1
0:0:0,099999994 --> 0:0:2,1399999
Nikolay: Hello, hello, this is Postgres FM.

2
0:0:2,48 --> 0:0:4,46
I'm Nik, PostgresAI.

3
0:0:4,9 --> 0:0:8,08
And as usual, here, Michael, pgMustard.

4
0:0:8,08 --> 0:0:8,86
Hi, Michael.

5
0:0:9,66 --> 0:0:10,44
Michael: Hi, Nik.

6
0:0:11,4 --> 0:0:14,42
Nikolay: I proposed this topic because I thought we have a gap

7
0:0:14,42 --> 0:0:14,92
here.

8
0:0:16,64 --> 0:0:20,279999
Before we started recording, you said we have already episode

9
0:0:20,46 --> 0:0:24,06
about transaction_timeout, but it's very related but it's not

10
0:0:24,06 --> 0:0:24,52
the same.

11
0:0:24,52 --> 0:0:28,4
So I wanted to dive deep into long-running transactions.

12
0:0:28,619999 --> 0:0:29,56
What do you think?

13
0:0:30,22 --> 0:0:33,94
Michael: Yeah, I was surprised we didn't have an episode on it

14
0:0:33,94 --> 0:0:35,04
specifically already.

15
0:0:35,46 --> 0:0:40,08
We had a first ever episode, didn't we, on slow queries and I

16
0:0:40,08 --> 0:0:43,02
guess long transactions as part of that.

17
0:0:43,52 --> 0:0:45,46
But we hadn't done a specific episode on it.

18
0:0:45,46 --> 0:0:49,94
And I guess as a long-running podcast maybe it's definitely time

19
0:0:49,94 --> 0:0:50,7
to revisit.

20
0:0:51,28 --> 0:0:51,78
Yeah,

21
0:0:52,54 --> 0:0:54,559998
Nikolay: sometimes idle in podcasting.

22
0:0:56,6 --> 0:1:0,14
So I would like to challenge us and think by the end of this

23
0:1:0,14 --> 0:1:5,04
recording, we record in advance as usual because some preparation

24
0:1:5,28 --> 0:1:9,12
is needed, post-processing is needed, thank you for doing this

25
0:1:9,24 --> 0:1:10,1
by the way.

26
0:1:10,479996 --> 0:1:14,3
So I would like to challenge us and think should this episode

27
0:1:14,3 --> 0:1:18,02
be called long-running transactions or somehow else?

28
0:1:18,46 --> 0:1:22,0
Because you already understand what I'm talking about, right?

29
0:1:22,6 --> 0:1:26,76
Michael: Only because of our pre-discussion I got a good hint

30
0:1:26,76 --> 0:1:31,56
from you. But is it more that the problems they cause that you

31
0:1:31,56 --> 0:1:35,38
want to talk about and therefore maybe it's only a subset of

32
0:1:35,38 --> 0:1:38,86
long-running transactions that are an issue.

33
0:1:39,18 --> 0:1:42,38
Nikolay: So listeners who listen to this already see the decision.

34
0:1:42,78 --> 0:1:48,26
Did we preserve a long-running transaction title originally or

35
0:1:48,26 --> 0:1:50,08
we decided to change it?

36
0:1:50,08 --> 0:1:52,62
But the problem in my head is simple.

37
0:1:53,36 --> 0:1:58,04
I see in monitoring tools, in blog posts, and everywhere, like

38
0:1:58,32 --> 0:2:3,36
everywhere actually, that long-running transactions are bad.

39
0:2:3,9 --> 0:2:7,2
And at the same time, I remember, actually it was an interesting

40
0:2:7,2 --> 0:2:13,2
moment, I had an experimental educational course where I explained

41
0:2:13,26 --> 0:2:17,7
various concepts of database health, and we touched this topic,

42
0:2:17,72 --> 0:2:22,76
how long-running transactions are harmful to autovacuum and I said

43
0:2:22,76 --> 0:2:25,84
you just open transaction and it will be blocking autovacuum and

44
0:2:25,84 --> 0:2:30,14
I said it's so easy to demonstrate let's open a couple of psql

45
0:2:30,14 --> 0:2:33,76
sessions or something like yeah psql vacuum verbose and we will

46
0:2:33,76 --> 0:2:37,86
see it has dead tuples but cannot clean up them because there

47
0:2:37,86 --> 0:2:39,66
is a long-running transaction in progress.

48
0:2:40,04 --> 0:2:44,14
And I just wrote begin and it didn't work.

49
0:2:44,76 --> 0:2:46,78
It Cleaned up tuples perfectly.

50
0:2:47,12 --> 0:2:51,54
I think I also tried again with begin and SELECT, doing some

51
0:2:51,54 --> 0:2:55,68
select and keeping transaction
open, being in idle transaction

52
0:2:55,68 --> 0:2:57,54
state according to pg_stat_activity.

53
0:2:57,84 --> 0:2:59,28
And again, it didn't work.

54
0:3:0,04 --> 0:3:0,98
And I looked stupid.

55
0:3:3,68 --> 0:3:8,36
And the reason I look stupid is
actually because we have actually

56
0:3:8,36 --> 0:3:11,18
2 problems and we constantly mix
them.

57
0:3:12,18 --> 0:3:16,16
And saying long running transactions
block vacuum is wrong.

58
0:3:16,68 --> 0:3:18,46
Not all of them block vacuum.

59
0:3:19,22 --> 0:3:20,64
Some of them block vacuum.

60
0:3:20,9 --> 0:3:27,88
And we have, in my mind, we should
improve tooling and materials

61
0:3:28,18 --> 0:3:32,92
and everywhere understanding and
start distinguishing 2 problems.

62
0:3:33,7 --> 0:3:36,42
1 is lock-related and 1 is vacuum-related.

63
0:3:37,2 --> 0:3:43,26
We should change monitoring tools
and snippets, everything, to

64
0:3:43,26 --> 0:3:44,72
hunt 2 problems separately.

65
0:3:46,18 --> 0:3:47,54
Diagnose them separately.

66
0:3:47,98 --> 0:3:51,42
Michael: Yeah, should we go back
to the basics though, in terms

67
0:3:51,42 --> 0:3:55,12
of, maybe let's first cover the
long-running transactions that

68
0:3:55,12 --> 0:4:0,86
are harmful, or like can be harmful,
and then reverse our way

69
0:4:0,86 --> 0:4:3,98
backwards to the ones that aren't
and therefore trying to distinguish

70
0:4:4,08 --> 0:4:4,98
between them.

71
0:4:5,66 --> 0:4:9,3
Nikolay: Some people say long transactions,
which I think is

72
0:4:9,34 --> 0:4:15,4
oversimplification because it may
be very long but fast, or it

73
0:4:15,4 --> 0:4:17,92
may be very short but very long
running.

74
0:4:17,92 --> 0:4:21,62
Michael: Wait, what does long mean
if it's not just about the

75
0:4:21,62 --> 0:4:22,12
time?

76
0:4:22,36 --> 0:4:23,44
Nikolay: A lot of text.

77
0:4:25,08 --> 0:4:26,26
Michael: Oh, sure.

78
0:4:26,4 --> 0:4:26,9
OK.

79
0:4:27,18 --> 0:4:29,12
I didn't even think about that.

80
0:4:30,42 --> 0:4:33,54
Nikolay: Like, I'm not a person
who tries to be pedantic.

81
0:4:35,38 --> 0:4:40,06
I gave up, for example, my scientific
mentor was distinguishing

82
0:4:41,74 --> 0:4:44,96
database systems and database and
saying you cannot say Postgres

83
0:4:44,96 --> 0:4:48,4
is database, you must say Postgres
is database system or database

84
0:4:48,4 --> 0:4:49,32
management system.

85
0:4:49,56 --> 0:4:50,66
I already gave up.

86
0:4:50,66 --> 0:4:52,48
You cannot fight with old people.

87
0:4:52,48 --> 0:4:55,9
They will say it's a good database,
a meaning database system.

88
0:4:55,9 --> 0:4:57,54
So it's okay to say long transactions.

89
0:4:57,74 --> 0:5:2,5
It's not okay for my texts I write
and like products we build.

90
0:5:2,78 --> 0:5:6,98
There we always pursue precision
and we say long-running transactions

91
0:5:7,66 --> 0:5:12,08
right but at a very high level
if you have long-running transactions

92
0:5:12,1 --> 0:5:17,72
there are 2 dangers at least actually
there are maybe a few more

93
0:5:17,72 --> 0:5:23,74
but 2 main dangers are some locks
are acquired.

94
0:5:24,72 --> 0:5:27,74
And as we know, like this is a
simple rule to remember, we talked

95
0:5:27,74 --> 0:5:31,72
about this so many times locks
are released only at the end of

96
0:5:31,72 --> 0:5:32,22
transaction.

97
0:5:32,72 --> 0:5:36,58
They cannot be released before
rollback or commit, that's it.

98
0:5:36,58 --> 0:5:40,28
So it means that transaction, if
it lasts long, accumulates a

99
0:5:40,28 --> 0:5:45,02
lot of locks and even access share
locks can be harmful, right?

100
0:5:45,02 --> 0:5:45,1
Sometimes.

101
0:5:45,1 --> 0:5:45,6
Yeah,

102
0:5:46,4 --> 0:5:49,74
Michael: even if the statement that acquired the lock was short,

103
0:5:49,74 --> 0:5:52,1
if it's followed by statements that are slower, it's not...

104
0:5:52,1 --> 0:5:52,2
Nikolay: The

105
0:5:52,2 --> 0:5:53,84
Michael: lock is acquired, that's it.

106
0:5:53,84 --> 0:5:54,34
Exactly.

107
0:5:54,64 --> 0:5:57,04
Nikolay: The lock is held until the very end of the transaction.

108
0:5:57,04 --> 0:5:59,58
It might block someone like DDL.

109
0:5:59,76 --> 0:6:4,08
DDL might block other SELECTs, as we also discussed many times.

110
0:6:4,08 --> 0:6:9,4
And this is a big surprise, and like this two-step effect, which

111
0:6:9,4 --> 0:6:11,52
can put the whole server down, actually.

112
0:6:12,1 --> 0:6:13,08
This is a big deal.

113
0:6:13,08 --> 0:6:17,72
So if you just selected something and keep the transaction open

114
0:6:17,72 --> 0:6:21,8
for long, you just, for example, should avoid changing schema.

115
0:6:22,7 --> 0:6:24,44
Or you have a dump happening.

116
0:6:25,16 --> 0:6:26,24
Very common situation.

117
0:6:26,68 --> 0:6:31,82
There's a dump happening or something like initialization of

118
0:6:31,82 --> 0:6:34,64
logical replica, which lasts a couple of hours, for example.

119
0:6:34,7 --> 0:6:37,2
And at the same time, we have deployment trying to change the

120
0:6:37,2 --> 0:6:40,44
DDL, trying to change schema, right?

121
0:6:40,44 --> 0:6:44,62
And yeah, this is a very dangerous situation because if you don't

122
0:6:44,62 --> 0:6:48,98
have lock_timeout, retries, we talked about it as well a lot.

123
0:6:49,18 --> 0:6:51,0
In these cases can cause downtime.

124
0:6:51,02 --> 0:6:54,52
So lock acquisition plus holding these locks for long.

125
0:6:54,52 --> 0:6:56,5
This is a danger number 1.

126
0:6:56,6 --> 0:7:1,06
And danger number 2 is how MVCC in Postgres is organized.

127
0:7:1,74 --> 0:7:6,0
MVCC is a multi-version concurrency control for like new folks.

128
0:7:6,0 --> 0:7:9,72
I know we have new folks all the time and specifically vacuum

129
0:7:9,72 --> 0:7:10,16
problem.

130
0:7:10,16 --> 0:7:16,4
So to clean up dead tuples promptly, server should be in good

131
0:7:16,4 --> 0:7:16,78
state.

132
0:7:16,78 --> 0:7:20,28
And if you keep the transaction long, in some cases, not in all,

133
0:7:20,28 --> 0:7:25,92
as we will discuss now, your long-running transaction might block,

134
0:7:25,92 --> 0:7:27,48
not might block, it will block.

135
0:7:27,52 --> 0:7:31,64
If, for example, you inserted a row and just keep the transaction

136
0:7:31,64 --> 0:7:34,74
open, This is a real transaction.

137
0:7:34,9 --> 0:7:37,8
It has real transaction ID, right?

138
0:7:38,22 --> 0:7:44,36
This is definitely will block vacuum and it will need to skip

139
0:7:44,58 --> 0:7:49,36
the tuples which became dead recently after your transaction

140
0:7:49,4 --> 0:7:49,8
started.

141
0:7:49,8 --> 0:7:53,3
Because, and this is global, this is biggest surprise here for

142
0:7:53,3 --> 0:7:56,66
people, as I see, to realize this is global.

143
0:7:56,72 --> 0:8:0,78
You insert it to 1 table and keep the transaction open.

144
0:8:1,26 --> 0:8:3,96
But vacuum cannot clean any table.

145
0:8:4,1 --> 0:8:7,76
It can clean only dead tuples which became dead before, because

146
0:8:7,76 --> 0:8:12,54
xmax, this is a hidden Column in every row, you can SELECT xmax

147
0:8:12,54 --> 0:8:13,64
from any table.

148
0:8:13,92 --> 0:8:18,04
Xmax for dead rows is filled with a transaction which deleted

149
0:8:18,04 --> 0:8:21,44
this row, with UPDATE or DELETE, it deleted this row version.

150
0:8:21,98 --> 0:8:25,16
And if this xmax is in the past compared to your transaction

151
0:8:25,24 --> 0:8:29,18
ID, which holds this snapshot, right?

152
0:8:30,06 --> 0:8:33,58
This will be possible to delete, but if it's later, Postgres

153
0:8:33,6 --> 0:8:36,22
thinks maybe your transaction will need it.

154
0:8:36,66 --> 0:8:39,38
Michael: Yeah, it could still, if that transaction wants to read

155
0:8:39,38 --> 0:8:43,58
that row, let's say the 1 that got updated or deleted, it needs

156
0:8:43,58 --> 0:8:45,92
to return the old version.

157
0:8:46,52 --> 0:8:46,74
Yeah.

158
0:8:46,74 --> 0:8:49,9
According to its snapshot, so it's still needed, or theoretically,

159
0:8:50,86 --> 0:8:51,64
still needed.

160
0:8:52,28 --> 0:8:53,46
It's a feature, right?

161
0:8:53,6 --> 0:8:56,82
Nikolay: It's a feature, but also we need to think about transaction

162
0:8:56,92 --> 0:8:58,52
isolation levels here.

163
0:8:58,98 --> 0:8:59,48
Michael: True.

164
0:9:0,06 --> 0:9:4,66
Nikolay: Because by default, so if you have a transaction ID,

165
0:9:5,06 --> 0:9:9,38
so basically in pg_stat_activity, you can see the column

166
0:9:9,38 --> 0:9:15,26
backend_xmin, which defines this horizon that your transaction needs,

167
0:9:15,4 --> 0:9:17,28
this snapshot, it defines this snapshot.

168
0:9:18,42 --> 0:9:22,82
And if you have repeatable read, which is not repeatable read,

169
0:9:22,82 --> 0:9:24,72
it's already higher isolation level.

170
0:9:24,72 --> 0:9:26,02
It's more like it's stricter.

171
0:9:26,64 --> 0:9:28,44
It's snapshot isolation, in fact.

172
0:9:28,44 --> 0:9:35,34
And it's not default, but it's used during things like dumping.

173
0:9:35,38 --> 0:9:37,36
pg_dump will use a repeatable read.

174
0:9:37,36 --> 0:9:39,26
It opens transaction in a repeatable read.

175
0:9:39,44 --> 0:9:42,14
And also logical replication initialization.

176
0:9:43,52 --> 0:9:47,02
And sometimes in multiple threads, multiple connections.

177
0:9:47,68 --> 0:9:51,26
So in this case, this snapshot needs to be held because this

178
0:9:51,26 --> 0:9:55,24
transaction which is running in repeatable read needs this snapshot

179
0:9:55,24 --> 0:9:55,94
for consistency.

180
0:9:56,26 --> 0:9:59,18
All Queries will need to read from this very snapshot.

181
0:9:59,48 --> 0:10:4,38
So all tables you read during dumping must be from the same moment,

182
0:10:4,4 --> 0:10:4,9
right?

183
0:10:6,18 --> 0:10:12,18
While in traditional, like, default transaction isolation, which

184
0:10:12,18 --> 0:10:15,46
is read committed, it's shifting.

185
0:10:15,94 --> 0:10:18,08
Snapshot is shifting, so it's not being held.

186
0:10:18,08 --> 0:10:21,92
And if you, for example, open transaction, SELECT from 1 table,

187
0:10:22,42 --> 0:10:24,96
SELECT from another table, SELECT from another table, you will

188
0:10:24,96 --> 0:10:28,82
see that this backend_xmin is shifting, right?

189
0:10:29,16 --> 0:10:32,8
Which means that our horizon is shifting and we're not blocking

190
0:10:33,82 --> 0:10:39,96
a vacuum like we could block if we are dumping or just using

191
0:10:39,96 --> 0:10:41,68
transaction at repeatable rate.

192
0:10:42,18 --> 0:10:47,58
Which means that not all, not all transaction, long-running transactions

193
0:10:47,64 --> 0:10:49,0
are harmful to vacuum.

194
0:10:49,82 --> 0:10:50,2
Right?

195
0:10:50,2 --> 0:10:51,14
Does it make sense?

196
0:10:51,82 --> 0:10:53,76
Michael: Yeah, at least from a Bloat perspective.

197
0:10:54,4 --> 0:10:58,48
So the, actually, probably from everything in terms of vacuum,

198
0:10:58,48 --> 0:11:2,2
I was thinking there might be an exception around like transaction

199
0:11:2,26 --> 0:11:5,98
ID wraparound, for example, if we had a very long running transaction,

200
0:11:5,98 --> 0:11:9,72
but if it didn't have a transaction ID, then actually that's

201
0:11:9,72 --> 0:11:12,84
not gonna be an issue for a transaction
ID wraparound either.

202
0:11:13,34 --> 0:11:16,98
Nikolay: That's an important distinction
because if we inserted

203
0:11:17,12 --> 0:11:22,08
or we ran XID current function,
or there's a new version, I'm

204
0:11:22,08 --> 0:11:23,86
constantly forgetting, I'm old.

205
0:11:24,38 --> 0:11:26,46
So you have allocated real XID.

206
0:11:27,18 --> 0:11:30,4
It's 1 story, but if you're just
reading and you don't have,

207
0:11:30,4 --> 0:11:34,08
your transaction doesn't have allocated
XID, it has only basically,

208
0:11:34,16 --> 0:11:37,64
there is concept of virtual XID,
which is optimization to save

209
0:11:37,64 --> 0:11:42,4
space because we have only 4 byte
transaction IDs, right?

210
0:11:42,4 --> 0:11:45,08
So it's like, it's not a lot these
days.

211
0:11:45,3 --> 0:11:49,3
And if you, like most of your queries,
most of your transactions

212
0:11:49,36 --> 0:11:53,5
are reading transactions, it would
be not wise to give them,

213
0:11:54,28 --> 0:11:56,54
to consume from this space, right?

214
0:11:56,54 --> 0:11:57,04
Yeah.

215
0:11:57,62 --> 0:12:2,6
So there is optimization with virtual
transaction ID and purely

216
0:12:3,32 --> 0:12:7,94
reading transactions, they only
consume from that space, right?

217
0:12:8,0 --> 0:12:9,26
Virtual transaction IDs.

218
0:12:9,4 --> 0:12:13,0
So we don't have it and which means
that you need to distinguish

219
0:12:13,08 --> 0:12:17,92
these topics and my point is that
not all long-running transactions

220
0:12:18,04 --> 0:12:18,74
are harmful.

221
0:12:19,18 --> 0:12:22,48
But at the same time, if you're
just reading, but this reading

222
0:12:22,48 --> 0:12:26,16
itself is still happening, you
are reading from a huge table,

223
0:12:26,16 --> 0:12:33,68
it lasts long, definitely backend_xmin
will be like some value

224
0:12:33,78 --> 0:12:35,94
not changing in pg_stat_activity.

225
0:12:36,9 --> 0:12:40,14
In this transaction, reading transaction
and read committed,

226
0:12:40,32 --> 0:12:41,34
it will be harmful.

227
0:12:41,52 --> 0:12:45,82
So I'm talking only about cases
when we read something And we

228
0:12:45,92 --> 0:12:49,44
either read something else constantly,
small reads, right?

229
0:12:49,44 --> 0:12:52,58
Or we read and sit in idle transaction
state, not doing anything

230
0:12:52,58 --> 0:12:54,12
and not closing this transaction.

231
0:12:55,08 --> 0:12:55,58
Michael: Yeah.

232
0:12:55,68 --> 0:12:58,04
Are you talking, is this a bit
of an edge case?

233
0:12:58,04 --> 0:13:0,48
Like who's actually doing this
a lot?

234
0:13:0,48 --> 0:13:1,98
Like when does it happen?

235
0:13:2,72 --> 0:13:3,58
Nikolay: What exactly?

236
0:13:4,22 --> 0:13:7,7
Michael: Like a small read and
it doesn't get committed.

237
0:13:7,72 --> 0:13:9,14
Nikolay: Oh it happens all the
time.

238
0:13:9,14 --> 0:13:11,68
The number 1 reason is humans.

239
0:13:13,04 --> 0:13:13,58
So they...

240
0:13:13,58 --> 0:13:15,36
Michael: Like opening an editor
or something.

241
0:13:15,64 --> 0:13:16,14
Nikolay: Yeah.

242
0:13:16,64 --> 0:13:22,32
In psql or even in more cases some
graphic interface, right?

243
0:13:22,7 --> 0:13:26,12
They begin some production, read
something and then they all

244
0:13:26,12 --> 0:13:28,4
maybe change something also, right?

245
0:13:28,94 --> 0:13:32,62
More like usually when you are
reading you are not opening transaction.

246
0:13:32,9 --> 0:13:36,04
Like when you're just exploring
stuff, you run some queries to

247
0:13:36,04 --> 0:13:37,66
understand the state of database.

248
0:13:38,68 --> 0:13:41,94
Usually you don't use begin-commit
transaction block.

249
0:13:42,18 --> 0:13:45,3
But when you try to write, sometimes
you do.

250
0:13:47,24 --> 0:13:51,64
And abandoned transaction from
humans, so common.

251
0:13:52,44 --> 0:13:53,4
It's very common.

252
0:13:53,4 --> 0:13:55,32
So just abandoned, forgot to close.

253
0:13:55,32 --> 0:13:59,02
In some cases, this is called turning off auto-commit, right?

254
0:13:59,18 --> 0:14:2,3
So you, in some editors, you can say, OK, I don't like auto-commit,

255
0:14:2,32 --> 0:14:3,4
I will turn it off.

256
0:14:3,4 --> 0:14:6,5
Then you start working but not committing, and then this is abandoned

257
0:14:6,5 --> 0:14:8,66
transaction, so it doesn't do anything.

258
0:14:9,28 --> 0:14:14,1
And if it holds this backend_xmin in this activity, this is

259
0:14:14,1 --> 0:14:15,88
what starts blocking autovacuum.

260
0:14:16,24 --> 0:14:18,62
And in first maybe few hours you won't notice.

261
0:14:19,12 --> 0:14:21,56
Michael: So yes, I understand the dangerous case.

262
0:14:21,56 --> 0:14:23,8
I'm talking about the non-dangerous case.

263
0:14:23,8 --> 0:14:27,1
The opening a transaction, begin, then a SELECT.

264
0:14:27,1 --> 0:14:30,88
You talked about just doing a read query, not doing an UPDATE.

265
0:14:31,64 --> 0:14:32,82
When would that happen?

266
0:14:33,48 --> 0:14:35,78
Nikolay: In the same situation, it might happen as well.

267
0:14:36,04 --> 0:14:40,06
I'm just like, I was trying to understand all the possible cases.

268
0:14:40,44 --> 0:14:41,13
In this case, it's not.

269
0:14:41,13 --> 0:14:43,7
Michael: Well, I think, yeah, but also, I just don't think it

270
0:14:43,7 --> 0:14:44,86
happens in the same way.

271
0:14:44,86 --> 0:14:48,96
Like, I understand doing begin, UPDATE, and then you've got the

272
0:14:48,96 --> 0:14:52,58
option to ROLLBACK if you're scared or you're traumatized maybe

273
0:14:52,58 --> 0:14:55,96
from doing an UPDATE once and then seeing you're expecting 1

274
0:14:55,96 --> 0:14:58,86
row updated and then you see 30,000 rows updated or something

275
0:14:58,86 --> 0:15:2,72
like that maybe You start using transactions explicitly so you

276
0:15:2,72 --> 0:15:3,48
can ROLLBACK to...

277
0:15:3,48 --> 0:15:5,14
Nikolay: I get what you mean, yeah.

278
0:15:5,66 --> 0:15:7,66
Michael: But who's doing that for SELECT queries?

279
0:15:7,66 --> 0:15:8,5
Like I don't...

280
0:15:8,68 --> 0:15:9,44
I don't know.

281
0:15:9,72 --> 0:15:12,04
Nikolay: Our purpose is to be complete, right?

282
0:15:12,04 --> 0:15:12,74
To cover all possible...

283
0:15:12,74 --> 0:15:13,58
Michael: No, I understand.

284
0:15:13,58 --> 0:15:16,36
So this is why I'm asking, Is this an edge case?

285
0:15:16,92 --> 0:15:20,64
Nikolay: Yeah, usually I think if I have a bunch of SELECTs and

286
0:15:20,64 --> 0:15:24,02
I combine them to transaction, I would probably use repeatable

287
0:15:24,02 --> 0:15:25,82
read, so all of them are consistent.

288
0:15:25,92 --> 0:15:27,26
That's why I combine them, right?

289
0:15:27,26 --> 0:15:30,14
This is like, basically like pg_dump does, right?

290
0:15:30,24 --> 0:15:33,94
If I don't, if I use transaction, but I don't use repeatable

291
0:15:33,94 --> 0:15:37,2
read, why do I use transaction in the first place?

292
0:15:37,54 --> 0:15:38,98
It's just a bunch of SELECTs.

293
0:15:39,92 --> 0:15:41,02
Maybe to...

294
0:15:41,52 --> 0:15:43,38
Just like what comes to my mind.

295
0:15:43,38 --> 0:15:47,12
Maybe to make sure schema won't be changed.

296
0:15:47,22 --> 0:15:48,24
To block DDL.

297
0:15:49,26 --> 0:15:51,04
Michael: Or to deal with the pooler maybe?

298
0:15:51,04 --> 0:15:53,64
Like to make sure you stay with the same.

299
0:15:54,84 --> 0:15:57,94
Nikolay: Yeah, maybe there might be some cases and yeah.

300
0:15:58,52 --> 0:16:3,9
Another very common situation when we see, so there's this term,

301
0:16:3,9 --> 0:16:6,0
xmin horizon, right?

302
0:16:6,26 --> 0:16:9,22
You can see it's in some places
in source code, you won't be

303
0:16:9,22 --> 0:16:12,28
able to see it, I think you won't
be able to see it in documentation.

304
0:16:12,4 --> 0:16:16,28
I checked long ago, I think it's
very like unofficial term.

305
0:16:16,38 --> 0:16:21,02
You can see it in some blog posts,
but again, I feel it's not

306
0:16:21,02 --> 0:16:22,96
fully understood topic, right?

307
0:16:22,96 --> 0:16:24,32
So what is xmin horizon?

308
0:16:25,24 --> 0:16:28,78
And I was trying to understand
like all aspects of it, how to

309
0:16:28,78 --> 0:16:29,54
properly measure.

310
0:16:29,54 --> 0:16:34,62
And my question is like, why is
not properly measured by monitoring

311
0:16:34,62 --> 0:16:34,92
tools?

312
0:16:34,92 --> 0:16:35,94
Because it's not.

313
0:16:36,6 --> 0:16:41,02
You like check all popular monitoring
tools, excluding PostgreSQL

314
0:16:41,18 --> 0:16:42,44
monitoring tool, right?

315
0:16:42,6 --> 0:16:45,44
And you won't be able to see like,
they like, they talk, okay,

316
0:16:45,44 --> 0:16:48,32
there is a long running transaction
lasting like this amount

317
0:16:48,32 --> 0:16:48,98
of time.

318
0:16:49,86 --> 0:16:54,96
Okay, have you excluded transactions
which are not harmful or

319
0:16:54,96 --> 0:16:55,84
you included them?

320
0:16:55,84 --> 0:16:57,72
What's the purpose of this chart?

321
0:16:57,72 --> 0:17:1,22
Are we talking about danger related
to locking or are we talking

322
0:17:1,22 --> 0:17:2,86
about danger related to vacuuming?

323
0:17:4,44 --> 0:17:4,94
Unclear.

324
0:17:5,34 --> 0:17:8,48
My point is, if we start distinguishing,
if we start thinking

325
0:17:8,48 --> 0:17:12,98
about xmin horizon, we need to
understand what it is, and how

326
0:17:12,98 --> 0:17:15,78
to measure it, how to properly
understand it.

327
0:17:15,78 --> 0:17:18,98
So I came up with the idea, we
have 4 sources.

328
0:17:19,74 --> 0:17:24,14
First is a pg_stat_activity,
backend_xmin, we just select

329
0:17:24,62 --> 0:17:25,52
the oldest.

330
0:17:26,66 --> 0:17:32,04
Actually, you can use function
age to backend_xmin, which will

331
0:17:32,04 --> 0:17:37,28
give you difference, just like
integer difference between your

332
0:17:37,28 --> 0:17:38,46
current function.

333
0:17:38,52 --> 0:17:42,76
Yeah, this function should understand
that we have wraparound

334
0:17:42,84 --> 0:17:44,2
of the space, right?

335
0:17:44,2 --> 0:17:48,04
So it will give you a number of
transactions, the agent in terms

336
0:17:48,04 --> 0:17:50,22
of number of transactions, not
in terms of seconds.

337
0:17:50,98 --> 0:17:54,5
The second very, very common situation,
super common.

338
0:17:54,56 --> 0:18:0,3
You created a logical slot and
either haven't started using it

339
0:18:0,3 --> 0:18:2,46
or abandoned or it's lagging.

340
0:18:3,34 --> 0:18:8,56
So if you check the pg_replication
slots, it also has a column

341
0:18:8,56 --> 0:18:13,18
that's called xmin there.

342
0:18:13,18 --> 0:18:13,46
And you also can use age.

343
0:18:13,46 --> 0:18:18,9
So it's another very popular reason
to have something lagging.

344
0:18:18,9 --> 0:18:22,84
And we need to distinguish logical
replication and physical replication.

345
0:18:22,86 --> 0:18:27,1
Because as I remember, if you have
physical replication slot

346
0:18:27,1 --> 0:18:30,06
but haven't started using it, I
think it's not harmful.

347
0:18:30,06 --> 0:18:31,16
I might be mistaken.

348
0:18:31,38 --> 0:18:33,4
Logical replication slots are definitely
harmful.

349
0:18:33,4 --> 0:18:35,88
Unused logical replication slots
are definitely harmful.

350
0:18:36,82 --> 0:18:41,44
Michael: Did this change in a recent
version where I think there

351
0:18:41,44 --> 0:18:46,4
might be some new setting where
by default they expire after

352
0:18:46,4 --> 0:18:47,72
a certain age of not being used.

353
0:18:47,72 --> 0:18:49,78
Nikolay: Yeah, there is such setting,
it's just a protection

354
0:18:49,78 --> 0:18:52,8
mechanism and I think we should
talk about protection measures

355
0:18:52,8 --> 0:18:53,3
separately.

356
0:18:53,6 --> 0:18:54,96
Michael: Yes, okay, cool.

357
0:18:54,96 --> 0:18:57,94
Nikolay: I'm just trying to say
that physical and logical replications,

358
0:18:58,28 --> 0:19:2,76
replication basically streams,
they are slightly different here.

359
0:19:2,78 --> 0:19:8,04
Definitely if you have logical,
if it's unused, it's harmful.

360
0:19:8,04 --> 0:19:9,78
If it's lagging, it's also harmful.

361
0:19:10,28 --> 0:19:11,08
Michael: Makes sense.

362
0:19:12,52 --> 0:19:16,18
Nikolay: It starts harming you
right when it's created.

363
0:19:16,36 --> 0:19:20,22
So you don't want logical replication
slot to be unused or lagging

364
0:19:20,22 --> 0:19:20,9
too much.

365
0:19:20,94 --> 0:19:24,56
Physical slightly different, and
I think, if I'm not mistaking,

366
0:19:25,2 --> 0:19:28,94
if you just create a slot but nothing
connected to it, it won't

367
0:19:28,94 --> 0:19:30,1
hold xmin horizon.

368
0:19:30,8 --> 0:19:34,94
Moreover, if you have physical
replication, but this replication

369
0:19:35,0 --> 0:19:40,34
is not using hot standby feedback,
on replica it's set to off,

370
0:19:40,34 --> 0:19:41,68
which is actually not common.

371
0:19:41,68 --> 0:19:43,82
Usually people prefer to set it
to on.

372
0:19:44,02 --> 0:19:47,54
Hot standby feedback sends, for
physical application, it sends

373
0:19:47,54 --> 0:19:52,36
to the primary the information
which xmin horizon is needed on

374
0:19:52,36 --> 0:19:55,28
replica, right, on standby.

375
0:19:55,76 --> 0:19:59,36
If it's not sending, it's better,
right, because we don't care.

376
0:19:59,54 --> 0:20:5,32
But this will give you, like, physical
replica which will start

377
0:20:5,32 --> 0:20:9,94
lagging every time it has a long-running
transaction on its own.

378
0:20:10,04 --> 0:20:15,16
While hot standby feedback will
protect replication lag, but if

379
0:20:15,16 --> 0:20:17,92
there is a long-running transaction
on replica, it will report

380
0:20:17,92 --> 0:20:20,92
to the primary, and we will have
the same effect as we had the

381
0:20:20,92 --> 0:20:24,82
long-running transaction on our
primary, which explains why this

382
0:20:24,82 --> 0:20:28,68
dilemma makes it really hard to
have long-running transactions

383
0:20:28,68 --> 0:20:32,74
for analytical purposes on regular
physical nodes, right?

384
0:20:32,9 --> 0:20:34,86
And you cannot offload it to replicas.

385
0:20:35,28 --> 0:20:38,96
Only if you can afford replicas
which are lagging for hours,

386
0:20:38,96 --> 0:20:39,64
for example.

387
0:20:39,9 --> 0:20:45,7
Anyway, we have logical and physical
replication which can also

388
0:20:45,7 --> 0:20:46,24
affect us.

389
0:20:46,24 --> 0:20:50,66
And we have fourth component, which
is prepared transactions.

390
0:20:51,82 --> 0:20:54,9
There is a pg_prepared_xacts transactions.

391
0:20:55,68 --> 0:21:1,38
System view you can see also I
don't remember column name maybe

392
0:21:1,38 --> 0:21:3,46
transaction or something, I don't
remember.

393
0:21:3,9 --> 0:21:8,94
But you can also get, if you use
2PC, right, prepared transactions,

394
0:21:10,84 --> 0:21:12,78
in this case you have them, right?

395
0:21:12,78 --> 0:21:17,54
In this case some of them might
be affecting, they may affect

396
0:21:18,14 --> 0:21:18,64
xmin horizon.

397
0:21:19,4 --> 0:21:27,24
So I found 4 sources for our analysis
to define which, so we

398
0:21:27,26 --> 0:21:30,64
dig information from all of them,
take ages And then just take

399
0:21:30,64 --> 0:21:31,72
the greatest of ages.

400
0:21:31,72 --> 0:21:33,7
This is such function greatest,
right?

401
0:21:34,74 --> 0:21:38,5
And this will be our true age of
our xmin horizon.

402
0:21:39,14 --> 0:21:40,94
What is big, what is small?

403
0:21:41,06 --> 0:21:42,26
This is hard to understand.

404
0:21:43,66 --> 0:21:45,54
When I think about it, what is
big?

405
0:21:45,54 --> 0:21:49,64
Is 1,000 too much or 1 million
too much or 1 billion too much?

406
0:21:49,64 --> 0:21:55,36
Billion is definitely too much
because we, we, our, our, yeah,

407
0:21:55,36 --> 0:21:55,92
yeah, yeah.

408
0:21:55,92 --> 0:22:2,22
We have 2.1 is basically our, the
whole capacity for XID wrap

409
0:22:2,22 --> 0:22:3,04
around, right.

410
0:22:3,82 --> 0:22:7,32
But we also, there's also MultiXact
mechanism additionally,

411
0:22:7,38 --> 0:22:8,76
but let's not go there.

412
0:22:9,0 --> 0:22:11,26
Billion is definitely too much.

413
0:22:11,32 --> 0:22:12,94
Is 1,000,000 too much?

414
0:22:13,62 --> 0:22:17,02
In heavily loaded systems, I like
to think about it like thinking

415
0:22:17,44 --> 0:22:21,0
about what tables we have and how
many rows they have.

416
0:22:21,54 --> 0:22:25,66
And how many rows we consider,
how many dead tuples.

417
0:22:26,0 --> 0:22:29,48
So dead row versions, how many
of them we consider too much.

418
0:22:30,06 --> 0:22:34,44
Usually we prefer for OLTP heavy
loaded systems, we prefer

419
0:22:34,64 --> 0:22:36,92
to make autovacuum be very aggressive.

420
0:22:38,64 --> 0:22:41,68
Default settings for autovacuum,
vacuum scale factor, autovacuum

421
0:22:41,68 --> 0:22:45,06
analyze scale factor, they are like 10 and

422
0:22:45,06 --> 0:22:48,22
20% is on 2 common default numbers.

423
0:22:48,26 --> 0:22:49,2
It's too much.

424
0:22:49,34 --> 0:22:54,5
If we have 10 hundred billion rows,
10% is way too much.

425
0:22:54,52 --> 0:23:0,18
So we usually set, okay, like 1%
so 0.01 or even half percent.

426
0:23:0,66 --> 0:23:6,1
Even sometimes even go even lower
and make it delete the tuples

427
0:23:6,1 --> 0:23:11,4
more often now you already have
some hint. Okay, you think I have

428
0:23:11,4 --> 0:23:16,4
like 10 million row table and I
have 1% scale factor. It means

429
0:23:16,4 --> 0:23:20,94
that 100,000 rows is when the
autovacuum should start cleaning

430
0:23:20,94 --> 0:23:21,44
up.

431
0:23:22,36 --> 0:23:26,54
And then, I think, okay, xmin horizon
is lagging that much.

432
0:23:26,64 --> 0:23:27,14
Okay.

433
0:23:27,44 --> 0:23:29,56
Is it lagging 1,000?

434
0:23:30,06 --> 0:23:30,74
It's okay.

435
0:23:31,88 --> 0:23:33,42
We don't care.

436
0:23:33,68 --> 0:23:40,68
Our basically actual horizon when
autovacuum starts, we expect

437
0:23:40,68 --> 0:23:43,52
work from it, 100,000, roughly.

438
0:23:43,52 --> 0:23:46,86
But the problem is that we have
different tables, small, big,

439
0:23:46,86 --> 0:23:47,08
right?

440
0:23:47,08 --> 0:23:50,74
So we need to think about core
tables and like nature of our

441
0:23:50,74 --> 0:23:55,94
database to think how much of xmin horizon,
like xmin h is too

442
0:23:55,94 --> 0:23:57,2
much for us, right?

443
0:23:57,52 --> 0:23:59,94
Michael: I think there's 2, you're
talking almost like a bottoms-up

444
0:24:0,06 --> 0:24:0,72
approach, right?

445
0:24:0,72 --> 0:24:2,9
Working out how much bloat we can
tolerate.

446
0:24:2,9 --> 0:24:5,74
Nikolay: But yeah, how much dead
tuples we can tolerate is not

447
0:24:5,74 --> 0:24:6,3
yet bloat.

448
0:24:6,3 --> 0:24:10,32
As we remember, we allow too many
dead tuples so autovacuum is

449
0:24:10,32 --> 0:24:14,24
converter of the tuples to bloat,
it will come and convert our

450
0:24:14,24 --> 0:24:17,38
millions of dead tuples to bloat,
but it's not bloat yet.

451
0:24:17,84 --> 0:24:22,54
At least in my terminology, I wish
we had official terminology

452
0:24:22,58 --> 0:24:23,26
for bloat.

453
0:24:23,3 --> 0:24:26,12
Like for dead tuples it's clear,
but for bloat, we don't have

454
0:24:26,12 --> 0:24:27,08
it as well.

455
0:24:29,7 --> 0:24:32,56
Michael: The more dead tuples we
leave around, the worse bloat

456
0:24:32,56 --> 0:24:35,86
issues can get, like in terms of
like fragmented indexes for

457
0:24:35,86 --> 0:24:40,6
example, in terms of hot updates,
if we've got dead tuples that

458
0:24:40,6 --> 0:24:40,76
we

459
0:24:40,76 --> 0:24:41,54
Nikolay: can't remove.

460
0:24:43,36 --> 0:24:46,44
Hot updates might benefit from
bloat because they have space.

461
0:24:46,72 --> 0:24:49,04
Michael: No, because it's not space
yet if you can't reclaim

462
0:24:49,04 --> 0:24:49,54
it.

463
0:24:49,54 --> 0:24:51,98
If it's not being processed by
the network.

464
0:24:51,98 --> 0:24:54,44
Nikolay: But then eventually a
vacuum will happen.

465
0:24:54,44 --> 0:24:54,6
Michael: Eventually.

466
0:24:54,6 --> 0:24:54,9199
And it

467
0:24:54,9199 --> 0:24:57,84
Nikolay: will leave a lot of gaps
and these gaps might be reused.

468
0:24:58,74 --> 0:24:59,56
Michael: Maybe, yeah.

469
0:24:59,72 --> 0:25:2,84
Nikolay: There are opposite direction
of effects here.

470
0:25:3,52 --> 0:25:6,66
Michael: But the point I was trying
to get to was often people

471
0:25:6,66 --> 0:25:10,08
are monitoring this in order to
prevent transaction ID wraparound

472
0:25:10,08 --> 0:25:14,94
so the top-down approach could
be how many how much time okay

473
0:25:15,04 --> 0:25:18,3
1,000,000 might not be that bad in
terms of getting towards 2,000,000,000,

474
0:25:18,42 --> 0:25:22,16
but if 2,000,000,000 could accumulate
within an hour or 2, if you've

475
0:25:22,16 --> 0:25:25,96
got like a ridiculously, so actually
you might want an earlier

476
0:25:25,96 --> 0:25:30,32
alert or a lower threshold because
as soon as you find out that

477
0:25:30,32 --> 0:25:33,24
it's going above a healthy number
or above like a not normal

478
0:25:33,24 --> 0:25:36,0
number for your system you've only
got, you're on a stopwatch,

479
0:25:36,0 --> 0:25:38,3
you've only got 2 hours, you've
got to fix this problem.

480
0:25:38,3 --> 0:25:41,4
So it might be that you're thinking
in terms of top down as well.

481
0:25:41,82 --> 0:25:45,98
Nikolay: Yeah, for example if I
see age of this is 100 million

482
0:25:46,0 --> 0:25:50,38
it's already wow, it's a lot, It's
already like emergency.

483
0:25:50,38 --> 0:25:50,58
You

484
0:25:50,58 --> 0:25:51,6
Michael: might only, yeah, exactly.

485
0:25:51,6 --> 0:25:53,6
You might only have a couple of
hours at that point.

486
0:25:53,6 --> 0:25:53,86
Nikolay: Yeah.

487
0:25:53,86 --> 0:25:55,06
Speed is interesting.

488
0:25:55,08 --> 0:26:1,36
I wish we had also good, again,
monitoring systems, like proper

489
0:26:2,14 --> 0:26:7,66
XID consumption, XID growth
rate metric to be covered

490
0:26:7,66 --> 0:26:8,16
properly.

491
0:26:8,48 --> 0:26:13,04
If you think, okay, this is an xmin
horizon, some numbers, okay,

492
0:26:13,04 --> 0:26:16,32
I know it's transaction, but you're
right, we like, it's hard

493
0:26:16,32 --> 0:26:18,42
to map it to real time.

494
0:26:18,76 --> 0:26:23,86
But here I see, okay, we consume
2 billion transactions per week.

495
0:26:24,72 --> 0:26:27,1
It's quite a loaded database, in
fact, right?

496
0:26:27,44 --> 0:26:28,58
Real transaction ideas.

497
0:26:28,58 --> 0:26:32,3
In this case, I see peaks when
there are busy hours, I understand

498
0:26:32,56 --> 0:26:34,26
my seed growth rate.

499
0:26:34,46 --> 0:26:35,79
And this is helpful for me.

500
0:26:35,79 --> 0:26:39,66
I understand, okay, this is, I
can already start playing with

501
0:26:39,66 --> 0:26:40,08
this.

502
0:26:40,08 --> 0:26:42,48
I think we could go even deeper.

503
0:26:42,84 --> 0:26:47,02
I just don't see proper analysis
tools in this area developed.

504
0:26:47,14 --> 0:26:52,08
Like this is like my, like thoughts
and from people I work with

505
0:26:52,08 --> 0:26:56,12
and talk to, and I see huge potential
here.

506
0:26:56,18 --> 0:27:0,04
So it could be much better visualized,
presented for analysis.

507
0:27:0,78 --> 0:27:6,66
Michael: And as you say, separated
out in terms of things that

508
0:27:6,66 --> 0:27:9,52
are risky in terms of locks and
things that are risky in terms

509
0:27:9,52 --> 0:27:9,88
of...

510
0:27:9,88 --> 0:27:11,01
Nikolay: Oh yeah, locks are completely
different.

511
0:27:11,01 --> 0:27:12,34
Michael: I'm just going to keep
saying block.

512
0:27:12,34 --> 0:27:12,74
Yeah.

513
0:27:12,74 --> 0:27:15,12
But that's what I mean, you're
talking about different alerts,

514
0:27:15,78 --> 0:27:17,0
different charts, different...

515
0:27:17,28 --> 0:27:17,7
Yes.

516
0:27:17,7 --> 0:27:22,06
Nikolay: So I would have alert,
and we actually do have alerts

517
0:27:22,3 --> 0:27:27,7
for xmin horizon based on understanding
this particular system

518
0:27:27,7 --> 0:27:32,14
with its growth rate and like table
sizes, right?

519
0:27:32,14 --> 0:27:34,22
And the behavior of vacuum, autovacuum.

520
0:27:34,9 --> 0:27:38,86
And we set it, for example, I think
100 million is a huge number.

521
0:27:38,86 --> 0:27:43,98
We should set it lower and already
start investigating what blocks

522
0:27:44,06 --> 0:27:45,76
our xmin horizon.

523
0:27:45,8 --> 0:27:47,66
And this is the right direction
to understand.

524
0:27:47,84 --> 0:27:51,22
And we also can check logs if you
enable autovacuum.

525
0:27:51,22 --> 0:27:59,24
We usually aim to enable, if not
all, but at least above 1 second

526
0:27:59,24 --> 0:28:1,12
autovacuum processing.

527
0:28:1,84 --> 0:28:4,64
So log_autovacuum_min_duration,
1 second.

528
0:28:4,72 --> 0:28:6,32
Sometimes we set to 0.

529
0:28:7,26 --> 0:28:11,04
There is some risk that you might
see a lot of and have observer

530
0:28:11,04 --> 0:28:14,4
effect, but usually like at least
1 second is beneficial.

531
0:28:14,5 --> 0:28:18,48
So you start Again, I think all
default was 10 minutes.

532
0:28:18,48 --> 0:28:19,74
It's ridiculously huge.

533
0:28:19,74 --> 0:28:21,48
Like you don't see a lot.

534
0:28:21,66 --> 0:28:24,06
And I think it was, it was decreased
recently.

535
0:28:24,08 --> 0:28:24,34
Right?

536
0:28:24,34 --> 0:28:25,06
Oh, cool.

537
0:28:25,38 --> 0:28:25,8
Yeah.

538
0:28:25,8 --> 0:28:31,12
So what I'm talking about here,
if you have a logging for autovacuum

539
0:28:31,4 --> 0:28:35,58
behavior, it reports like xmin horizon,
it reports it, right?

540
0:28:35,86 --> 0:28:40,12
Transaction ID was this, xmin horizon
was that, it's this number

541
0:28:40,12 --> 0:28:40,62
behind.

542
0:28:41,0 --> 0:28:44,34
So you might extract information
from those logs.

543
0:28:44,68 --> 0:28:48,24
And also it reports how many dead
tuples it would remove, but

544
0:28:48,24 --> 0:28:50,64
couldn't because xmin horizon is
blocking.

545
0:28:51,42 --> 0:28:55,9
So autovacuum messages and logs
are extremely helpful for this

546
0:28:55,9 --> 0:28:58,12
type of risk analysis.

547
0:28:58,86 --> 0:29:2,64
Michael: I think it might be just,
I think it's still 10 minutes.

548
0:29:3,52 --> 0:29:4,64
Nikolay: Maybe, maybe.

549
0:29:4,9 --> 0:29:5,4
Michael: Wow.

550
0:29:5,98 --> 0:29:6,44
All right.

551
0:29:6,44 --> 0:29:10,36
Nikolay: So this is about xmin and the vacuum behavior xmin horizon

552
0:29:10,38 --> 0:29:15,66
and this concept again Like I think documentation could be Fair

553
0:29:15,66 --> 0:29:22,74
that these concepts exist Maybe like 1 should propose To define

554
0:29:22,74 --> 0:29:26,02
this, because everyone suffers from bloat.

555
0:29:27,74 --> 0:29:30,62
And I think, in my opinion, everyone needs to understand, working

556
0:29:30,62 --> 0:29:33,96
with Postgres deeply enough, it needs to understand the concept

557
0:29:33,96 --> 0:29:37,62
of xmin horizon to start distinguishing it from long-running

558
0:29:37,76 --> 0:29:41,82
transactions, right and The second problem is like first problem

559
0:29:41,88 --> 0:29:47,22
first problem historically lock acquisition what to say about

560
0:29:47,22 --> 0:29:50,2
it, so it's Yeah,

561
0:29:50,4 --> 0:29:51,66
Michael: I think we should point people.

562
0:29:51,66 --> 0:29:54,1
I think we did a good episode on that specifically.

563
0:29:54,6 --> 0:29:58,1
But yeah, maybe that's a good excuse to switch over to mitigations

564
0:29:58,38 --> 0:29:58,96
though, because...

565
0:29:58,96 --> 0:29:59,46
Nikolay: Mitigation.

566
0:29:59,6799 --> 0:30:4,52
Yeah, 1 point I wanted to mention, there are some exotic problems

567
0:30:4,52 --> 0:30:6,82
also, which might happen from long-running transactions.

568
0:30:6,82 --> 0:30:7,58
Michael: Oh yeah, sure.

569
0:30:7,58 --> 0:30:13,2
Nikolay: I point to my sub transactions Post explaining 4 problems

570
0:30:13,2 --> 0:30:17,34
with sub transactions and 1 of them which happened to GitLab

571
0:30:18,26 --> 0:30:20,36
And we had episode I think about it, right?

572
0:30:20,64 --> 0:30:22,22
So it was the most interesting.

573
0:30:22,64 --> 0:30:25,02
It was happening when you have long-running transactions.

574
0:30:25,08 --> 0:30:29,08
I think when you actually hold this xmin horizon on the primary,

575
0:30:29,48 --> 0:30:32,18
hold snapshot, or maybe locks.

576
0:30:32,18 --> 0:30:33,96
This is a question actually I need to do.

577
0:30:33,96 --> 0:30:38,72
I have some homework to do to return to that material and understand

578
0:30:39,12 --> 0:30:40,84
what of 2 problems happened there.

579
0:30:40,84 --> 0:30:45,06
So long-running transactions on the primary and high TPS replica

580
0:30:45,06 --> 0:30:47,42
started to suffer at some point because of sub-transactions.

581
0:30:49,34 --> 0:30:53,94
Once you close long-running transactions on the primary, suffering

582
0:30:54,02 --> 0:30:54,52
ends.

583
0:30:54,96 --> 0:30:59,58
So this is exotic already, exotic problems where I need to understand

584
0:30:59,58 --> 0:31:2,62
is it lock-related or snapshot-related.

585
0:31:3,58 --> 0:31:6,98
Michael: I also think replicas are an interesting thing in general

586
0:31:6,98 --> 0:31:8,5
in terms of long running transactions.

587
0:31:8,52 --> 0:31:11,04
So long running transactions on the primary and whether they

588
0:31:11,04 --> 0:31:14,24
affect the replica and then also long running transactions on

589
0:31:14,24 --> 0:31:18,4
the replicas and even when they can affect the primary.

590
0:31:18,4 --> 0:31:21,24
Nikolay: ...a replica if hot standby feedback

591
0:31:21,28 --> 0:31:23,1
is turned on, as we discussed.

592
0:31:23,68 --> 0:31:23,77
Exactly.

593
0:31:23,77 --> 0:31:24,84
Can report to...

594
0:31:24,96 --> 0:31:29,28
They will report to primary, can have same effect.

595
0:31:30,14 --> 0:31:30,64
Michael: Yeah.

596
0:31:30,8201 --> 0:31:31,3201
So...

597
0:31:31,5 --> 0:31:34,12
So I think that's not always something people are aware of.

598
0:31:34,12 --> 0:31:38,6
But yeah, in terms of mitigations though, it feels like timeouts

599
0:31:38,64 --> 0:31:42,26
for our friends in a lot of these cases is that, and monitoring,

600
0:31:42,26 --> 0:31:43,78
we've brought up monitoring a lot, right?

601
0:31:43,78 --> 0:31:49,36
Monitoring, alerting, even in extreme
cases, killing certain

602
0:31:49,36 --> 0:31:51,54
problematic long-running transactions.

603
0:31:53,04 --> 0:31:53,8
Nikolay: What did you want

604
0:31:53,8 --> 0:31:55,34
Michael: to talk about in terms
of mitigations?

605
0:31:56,58 --> 0:32:0,78
Nikolay: So, yeah, of course if
you can get rid of all the long

606
0:32:0,78 --> 0:32:3,72
running transactions, both dangers
don't affect your database

607
0:32:3,72 --> 0:32:4,84
health, which is great.

608
0:32:4,84 --> 0:32:11,0
But sometimes we need long-running
transactions, sometimes we

609
0:32:11,0 --> 0:32:15,42
need to accept some lag in replication
slots.

610
0:32:15,48 --> 0:32:19,3
By the way, what caused me to start
distinguishing something

611
0:32:19,3 --> 0:32:20,02
is not right.

612
0:32:20,02 --> 0:32:23,8
We talk about long-running transactions,
but the same problem,

613
0:32:24,28 --> 0:32:25,58
vacuum being blocked.

614
0:32:25,84 --> 0:32:28,94
vacuum not fully blocked, you understand,
so you cannot DELETE

615
0:32:28,94 --> 0:32:29,6
some data.

616
0:32:30,14 --> 0:32:32,64
It's not only about long-running
transactions, it's also about

617
0:32:32,64 --> 0:32:34,68
slots, it's also about prepared
transactions.

618
0:32:34,68 --> 0:32:35,38
What's happening?

619
0:32:35,38 --> 0:32:37,32
Okay, prepared transactions, they
are at least transactions,

620
0:32:37,36 --> 0:32:40,14
but slots, right, or replication,
like anyway.

621
0:32:40,52 --> 0:32:41,62
hot standby feedback.

622
0:32:42,1 --> 0:32:45,02
Okay, hot standby feedback is also
about transactions that just

623
0:32:45,02 --> 0:32:48,16
happen on standby, but just unused
logical slot.

624
0:32:48,28 --> 0:32:50,9
So it's a wrong explanation of
some problem.

625
0:32:50,9 --> 0:32:55,04
Problem is xmin horizon is frozen
it's not progressing right?

626
0:32:55,04 --> 0:32:56,04
Michael: Not progressing yeah.

627
0:32:56,04 --> 0:32:58,78
Nikolay: Maybe it's progressing
but too slow also might happen.

628
0:32:59,38 --> 0:33:1,78
Michael: Yeah so if it's progressing
slowly though it's just

629
0:33:1,78 --> 0:33:3,26
blocked at a new point, right?

630
0:33:3,26 --> 0:33:4,02
Like it's...

631
0:33:4,74 --> 0:33:7,96
Nikolay: Yeah, if it's progressing
too slow, your horizon of

632
0:33:7,96 --> 0:33:11,96
XID, not horizon, but actual XID
is already too far.

633
0:33:11,96 --> 0:33:16,58
So basically what we should look
at is Age of xmin horizon.

634
0:33:17,3 --> 0:33:19,5
Age of the old xmin needed.

635
0:33:20,46 --> 0:33:22,28
Yeah, this is what we, all we need.

636
0:33:22,28 --> 0:33:24,98
And if it's too far in the past,
this is a problem.

637
0:33:25,2 --> 0:33:27,88
And the reasons of this, as I said,
4 reasons.

638
0:33:28,04 --> 0:33:30,2
And mitigation is simple.

639
0:33:30,66 --> 0:33:33,16
Don't allow it to be too far in
the past.

640
0:33:33,16 --> 0:33:33,66
How?

641
0:33:34,02 --> 0:33:41,1
First of all, in OLTP I'm like
very highly recommend to be very

642
0:33:41,1 --> 0:33:43,92
aggressive in time with timeout
settings globally.

643
0:33:44,06 --> 0:33:48,06
I know documentation says that
it's like it's controversial and

644
0:33:48,06 --> 0:33:54,16
so on like I know like If HTTP
server setting can afford 30 seconds

645
0:33:54,16 --> 0:34:0,48
or 60 seconds as maximum for OLTP requests, why do we, database,

646
0:34:0,66 --> 0:34:5,28
cannot afford at least the same
for all regular traffic.

647
0:34:5,42 --> 0:34:8,94
Of course, for CREATE INDEX CONCURRENTLY,
REINDEX CONCURRENTLY,

648
0:34:9,16 --> 0:34:11,06
we need to set it to 0.

649
0:34:11,68 --> 0:34:12,54
What to set?

650
0:34:12,54 --> 0:34:15,92
statement_timeout and 2 other things,
right?

651
0:34:15,92 --> 0:34:19,16
If you think about transaction, it might be like something is

652
0:34:19,16 --> 0:34:22,16
running, I don't transaction, something is running statement,

653
0:34:22,44 --> 0:34:23,2
I don't transaction.

654
0:34:23,2 --> 0:34:25,92
So it's either statement or I don't transaction and there is

655
0:34:25,92 --> 0:34:26,78
a whole transaction.

656
0:34:26,92 --> 0:34:32,78
So basically we have whole and 2 phases, like they switch after

657
0:34:32,78 --> 0:34:33,74
each other, right?

658
0:34:33,74 --> 0:34:38,32
It means we can limit the whole transaction or statement or item

659
0:34:38,32 --> 0:34:39,34
transaction state.

660
0:34:39,34 --> 0:34:42,38
And for all 3, we have setting, right?

661
0:34:42,7 --> 0:34:44,12
And transaction_timeout.

662
0:34:45,06 --> 0:34:48,12
Michael: Only since recently though, only since version 7.

663
0:34:48,12 --> 0:34:50,38
Nikolay: We had an episode about transaction_timeout because

664
0:34:50,38 --> 0:34:51,46
it was my idea.

665
0:34:51,46 --> 0:34:52,6
I was super surprised.

666
0:34:52,6 --> 0:34:56,0
I was walking around thinking, why it doesn't exist?

667
0:34:56,0 --> 0:34:59,34
Like, maybe I don't understand Postgres enough after 15 years

668
0:35:0,56 --> 0:35:0,56
of experience.

669
0:35:0,76 --> 0:35:3,68
Then I told it to Andrei Borodin and he implemented.

670
0:35:4,06 --> 0:35:4,46
Great.

671
0:35:4,46 --> 0:35:6,62
In Postgres 17 it exists.

672
0:35:6,7 --> 0:35:9,62
So in all new Postgres we have transaction_timeout so we can

673
0:35:9,62 --> 0:35:12,72
finally limit the whole thing because before we could limit only

674
0:35:12,72 --> 0:35:15,78
statement and idle_in_transaction_session_timeout.

675
0:35:16,06 --> 0:35:16,12
Yes.

676
0:35:16,12 --> 0:35:20,6
And if you have a series of brief statements and brief waiting

677
0:35:20,6 --> 0:35:25,26
periods between statements, you could have hour or day-long transaction

678
0:35:25,38 --> 0:35:25,88
easily.

679
0:35:26,12 --> 0:35:27,88
Now you can limit the whole thing.

680
0:35:28,26 --> 0:35:29,94
Limiting the whole thing is helpful.

681
0:35:30,58 --> 0:35:34,24
Unfortunately, we cannot limit what we actually would like to

682
0:35:34,24 --> 0:35:35,36
limit is xmin horizon.

683
0:35:36,9 --> 0:35:40,08
Maybe we should have, I just, I just came, this idea came to

684
0:35:40,08 --> 0:35:41,14
my mind just now.

685
0:35:41,14 --> 0:35:42,66
Honestly, I need that info.

686
0:35:42,98 --> 0:35:47,14
We could have like probably like xmin horizon maximum, after which

687
0:35:47,36 --> 0:35:51,64
we start abandoning, like we're canceling transaction, abandoning

688
0:35:51,76 --> 0:35:56,04
prepared transactions, like deleting them, and maybe killing

689
0:35:56,04 --> 0:35:56,82
some application.

690
0:35:57,34 --> 0:35:57,94
Yeah, yeah.

691
0:35:58,26 --> 0:35:59,17
Yeah, why not?

692
0:35:59,17 --> 0:36:2,2
Like 100 million protection, ultimate protection mechanism.

693
0:36:2,32 --> 0:36:5,5
And in this case, you don't need to map what's happening with

694
0:36:5,5 --> 0:36:9,52
your like server in terms of seconds and actual problem, like

695
0:36:9,52 --> 0:36:13,9
how many transactions you don't want to lag, right, for your

696
0:36:13,9 --> 0:36:14,74
garbage collection.

697
0:36:15,06 --> 0:36:17,14
Garbage collection is a vacuum, right?

698
0:36:17,78 --> 0:36:21,14
So maybe we should have those settings, maybe too much settings

699
0:36:21,14 --> 0:36:23,04
at the same time, but it will be flexible.

700
0:36:23,1 --> 0:36:26,4
I would say 100 million is my maximum, or 10 million is my maximum,

701
0:36:26,4 --> 0:36:27,62
regardless of cost.

702
0:36:27,62 --> 0:36:30,38
Michael: Even if Postgres shipped
with conservative defaults

703
0:36:30,38 --> 0:36:31,66
there, it would be fine.

704
0:36:32,1 --> 0:36:35,76
Even if it was like 100,000,000 as
a default, it would be fine

705
0:36:35,76 --> 0:36:38,58
for like cases, wouldn't it?

706
0:36:39,14 --> 0:36:41,6
Nikolay: I'm curious, maybe this
idea was discussed, maybe it's

707
0:36:41,6 --> 0:36:42,58
not a good idea.

708
0:36:42,66 --> 0:36:43,86
From my mind, it just came.

709
0:36:43,86 --> 0:36:44,54
It's interesting.

710
0:36:44,54 --> 0:36:48,56
If you have opinion listening to
us, leave this opinion as a

711
0:36:48,56 --> 0:36:53,76
comment under YouTube or somewhere
like on Twitter or LinkedIn,

712
0:36:53,76 --> 0:36:54,6
I don't know.

713
0:36:54,64 --> 0:36:55,82
Anyway, this is interesting.

714
0:36:56,74 --> 0:36:59,82
But what we can do, we can protect
with transaction_timeout,

715
0:36:59,82 --> 0:37:2,22
statement_timeout, idle_in_transaction_timeout.

716
0:37:2,8 --> 0:37:3,5
What else?

717
0:37:3,6 --> 0:37:8,1
We should, we must actually, monitor
lags of replication slots,

718
0:37:8,4 --> 0:37:11,52
especially logical replication,
but physical as well if we use

719
0:37:11,52 --> 0:37:12,54
hot standby feedback.

720
0:37:12,88 --> 0:37:15,04
And we should not allow lagging
too much.

721
0:37:15,22 --> 0:37:18,66
And finally, we should be very
careful with 2PC and abandoned

722
0:37:18,74 --> 0:37:19,78
prepared transactions.

723
0:37:20,54 --> 0:37:21,22
That's it.

724
0:37:21,22 --> 0:37:22,98
So only these 4 areas.

725
0:37:23,86 --> 0:37:25,92
Michael: Yeah, when you say monitoring,
you mean alerting as

726
0:37:25,92 --> 0:37:26,6
well, right?

727
0:37:26,76 --> 0:37:29,98
Nikolay: Yeah, so if some replication
slot is unused, logical

728
0:37:29,98 --> 0:37:33,84
replication slot, or it's lagging
too much, Again, this is interesting.

729
0:37:33,96 --> 0:37:36,98
It will be also either in bytes
or in seconds.

730
0:37:37,38 --> 0:37:41,6
We should think about seconds,
how much, how we need to map it

731
0:37:41,6 --> 0:37:45,06
to transactions, understanding
our seed growth rate.

732
0:37:45,06 --> 0:37:48,58
So there are different metrics
in play here.

733
0:37:48,58 --> 0:37:52,62
So to understand health properly,
it takes time.

734
0:37:52,76 --> 0:37:58,74
And if you have cattle, not pets,
so you should have simple rules.

735
0:37:59,34 --> 0:38:3,42
If some slot is lagging too much,
By the way, slot lagging too

736
0:38:3,42 --> 0:38:4,84
much, it's in bytes, right?

737
0:38:4,84 --> 0:38:7,5
We cannot limit it in seconds of
lag.

738
0:38:8,86 --> 0:38:12,58
There are bytes, seconds, and then
XID growth, XID like

739
0:38:12,7 --> 0:38:13,68
number of transactions.

740
0:38:13,86 --> 0:38:17,14
So 3 metrics, There is a triangle
here.

741
0:38:17,72 --> 0:38:18,22
Interesting.

742
0:38:18,38 --> 0:38:21,5
Anyway, this is to protect from
affecting vacuum.

743
0:38:21,5 --> 0:38:26,98
You can also reactively just monitor
what the vacuum reports

744
0:38:26,98 --> 0:38:27,66
in logs.

745
0:38:28,36 --> 0:38:31,28
And it will start to complain clearly,
right?

746
0:38:31,28 --> 0:38:35,42
So you can set threshold based
on logs and if you have some good

747
0:38:35,42 --> 0:38:40,22
log analysis system, you could
set up alerts from there.

748
0:38:40,84 --> 0:38:45,46
And as for logs, for me, it's very
completely different topic.

749
0:38:46,1 --> 0:38:49,18
Like yeah, long running transactions
contribute to it, but problems

750
0:38:49,18 --> 0:38:50,44
might happen very fast.

751
0:38:50,82 --> 0:38:54,9
Unlike xmin horizon being blocked,
you might not notice it at

752
0:38:54,9 --> 0:38:55,24
all.

753
0:38:55,24 --> 0:38:59,24
Even you, okay, you ran, you blocked
it for many, many hours.

754
0:38:59,54 --> 0:39:2,92
Maybe it was not, maybe it was
needed at just one time.

755
0:39:3,06 --> 0:39:4,78
It won't affect you so much.

756
0:39:6,58 --> 0:39:11,96
The negative effects, they are
postponed and they grow like it's

757
0:39:11,96 --> 0:39:12,82
not a cliff.

758
0:39:13,26 --> 0:39:15,68
It's like it's slow degradation.

759
0:39:16,42 --> 0:39:20,64
Unlike situation with locks, with
locks it might be very fast.

760
0:39:21,9 --> 0:39:29,0
So if you have, as I said, this
classic example is you have the

761
0:39:29,14 --> 0:39:34,24
weakest heavyweight lock possible,
accessory lock on the table.

762
0:39:34,92 --> 0:39:38,4
You block DDL, DDL blocks other
SELECTs, boom.

763
0:39:38,76 --> 0:39:39,78
What to do here?

764
0:39:39,96 --> 0:39:44,14
We need to use proper DDL deployments.

765
0:39:44,24 --> 0:39:46,72
We need a low lock_timeout and
retries.

766
0:39:46,88 --> 0:39:49,12
This is number one thing I would
recommend.

767
0:39:49,9 --> 0:39:53,74
And number two thing is to craft
schema changes properly to avoid

768
0:39:54,66 --> 0:39:58,5
long running exclusive locks, first
of all, but also accessory

769
0:39:58,5 --> 0:40:0,56
locks as well at some point.

770
0:40:0,6 --> 0:40:5,34
They are not harmful unless you
have conflicting sessions and

771
0:40:5,66 --> 0:40:8,1
to detect conflicts sooner is helpful.

772
0:40:8,5 --> 0:40:12,22
Another measure which is actually
a common problem with Postgres,

773
0:40:12,62 --> 0:40:18,38
if we have a vacuum which blocks
us, vacuum always blocks us

774
0:40:19,02 --> 0:40:20,14
when we do DDL.

775
0:40:20,54 --> 0:40:24,24
We cannot acquire access exclusive
lock on the table to change

776
0:40:24,24 --> 0:40:26,4
its schema if vacuum is happening.

777
0:40:26,74 --> 0:40:30,06
But fortunately in most cases vacuum
is running in normal state

778
0:40:30,06 --> 0:40:32,98
and it will kill itself after I
think a second.

779
0:40:33,56 --> 0:40:36,24
Michael: Do you mean autovacuum
or do you mean anti-wraparound?

780
0:40:36,58 --> 0:40:37,9
Autovacuum.

781
0:40:37,9 --> 0:40:39,88
Nikolay: I'm talking about autovacuum, good correction.

782
0:40:41,24 --> 0:40:44,54
But if it's running in transaction
ID wraparound prevention mode,

783
0:40:44,54 --> 0:40:49,34
which happens often if you have
high XID growth, related topics,

784
0:40:49,34 --> 0:40:49,84
right?

785
0:40:49,94 --> 0:40:54,28
If we'll like insert a lot, not
in batches, but single rows,

786
0:40:54,28 --> 0:40:56,5
for example, we consume XID so
much.

787
0:40:56,94 --> 0:41:0,3
Save points can contribute to XID
consumption and XID growth,

788
0:41:0,72 --> 0:41:2,88
XID growth rates a lot.

789
0:41:3,06 --> 0:41:8,08
In this case, we probably have
a problem like if XID growth

790
0:41:8,08 --> 0:41:11,64
is high, autovacuum transaction
ID wraparound happens more

791
0:41:11,64 --> 0:41:12,14
often.

792
0:41:12,26 --> 0:41:15,68
It might happen, for example, every
two days in a very busy systems.

793
0:41:15,94 --> 0:41:19,94
In this case, it happens every
two days, but we aim to have deployments

794
0:41:20,18 --> 0:41:21,28
five times per day.

795
0:41:21,28 --> 0:41:23,9
We are like very rapidly developed
startup.

796
0:41:24,12 --> 0:41:25,76
In this case, this might clash.

797
0:41:25,76 --> 0:41:29,18
Like you have an autovacuum, transaction
ID wraparound prevention, and

798
0:41:29,18 --> 0:41:30,74
you try to deploy DDL.

799
0:41:30,74 --> 0:41:36,44
And you, even if you have retries,
you must have low lock_timeout

800
0:41:36,44 --> 0:41:37,16
and retries.

801
0:41:37,36 --> 0:41:41,84
You try to deploy it, but it runs
on your terabyte size table

802
0:41:41,84 --> 0:41:42,94
and blocks you.

803
0:41:43,26 --> 0:41:44,44
This is super annoying.

804
0:41:44,44 --> 0:41:48,3
You cannot deploy something you
need to deploy right now.

805
0:41:48,9 --> 0:41:50,3
What to do about it?

806
0:41:51,5 --> 0:41:52,7
Question is open.

807
0:41:53,56 --> 0:41:57,5
Michael: Well, you've talked a
lot about partitioning, so the

808
0:41:57,5 --> 0:41:59,84
tables are smaller and the vacuums
are quicker.

809
0:42:0,12 --> 0:42:3,64
But I actually also think, at the
time I thought it was a good

810
0:42:3,64 --> 0:42:7,3
thing, but I thought the threshold
where an anti-wraparound vacuum

811
0:42:7,3 --> 0:42:11,1
kicks in is relatively low in the
grand scheme of things in terms

812
0:42:11,1 --> 0:42:11,88
of the risk.

813
0:42:12,26 --> 0:42:15,26
I would seriously consider, if
it's happening every 2 days, that

814
0:42:15,26 --> 0:42:16,72
seems like a bad smell to me.

815
0:42:16,72 --> 0:42:18,28
That feels like something that
shouldn't be.

816
0:42:18,28 --> 0:42:21,78
Like, let's make sure normal vacuum
happens enough so that it

817
0:42:21,78 --> 0:42:24,28
doesn't have to run in anti-wraparound
mode.

818
0:42:24,28 --> 0:42:25,22
That would be my.

819
0:42:25,24 --> 0:42:25,74
Nikolay: Yes.

820
0:42:26,68 --> 0:42:27,9
I'm not against this.

821
0:42:27,9 --> 0:42:30,24
Like, it's good that it's happening
often.

822
0:42:30,24 --> 0:42:31,3
And freezing is good.

823
0:42:31,3 --> 0:42:32,12
Like, once.

824
0:42:32,32 --> 0:42:35,68
So partitioning is helpful not
only because tables are smaller,

825
0:42:35,68 --> 0:42:39,14
but more pages are all frozen and
we don't touch them anymore

826
0:42:39,16 --> 0:42:40,94
because they are like our archive.

827
0:42:41,28 --> 0:42:44,78
This is super, it makes things
much more efficient compared to

828
0:42:44,86 --> 0:42:49,02
huge unpartitioned table where
anytime page can receive new tuple

829
0:42:49,74 --> 0:42:52,9
and it's not all frozen anymore
in visibility map, right?

830
0:42:52,9 --> 0:42:55,7
Visibility map has 2 bits, all
frozen and all visible.

831
0:42:55,96 --> 0:42:59,48
And boom, we need to take care
of, autovacuum needs to take care

832
0:42:59,54 --> 0:43:0,78
of this page again.

833
0:43:1,06 --> 0:43:4,94
So the problem is, for me, is not
that it's too often happening.

834
0:43:5,06 --> 0:43:7,24
The problem is like this threshold.

835
0:43:7,48 --> 0:43:11,68
I would sometimes, so it's okay
to have it low and make it very

836
0:43:11,68 --> 0:43:12,14
frequent.

837
0:43:12,14 --> 0:43:15,96
Good partitioning is great, but
sometimes I wish it would be

838
0:43:15,96 --> 0:43:20,52
easier to raise this bar temporarily
while I'm doing my changes

839
0:43:21,04 --> 0:43:22,4
and then raise it back.

840
0:43:22,72 --> 0:43:24,84
But as far as I remember, I might
be mistaken here.

841
0:43:24,84 --> 0:43:28,04
Last time I checked this maybe
a couple of years ago, this topic.

842
0:43:28,62 --> 0:43:29,84
It's not so easy.

843
0:43:31,1 --> 0:43:34,0
You cannot change it globally so
easy.

844
0:43:34,0 --> 0:43:35,58
It requires restart, I think.

845
0:43:35,9 --> 0:43:37,22
Maybe I'm mistaken again.

846
0:43:37,6 --> 0:43:40,34
And the table level, there is also
some caveat.

847
0:43:40,58 --> 0:43:44,16
I don't remember which, but I didn't
see a good way to...

848
0:43:44,64 --> 0:43:47,42
Okay, For each deployment, we want...

849
0:43:47,96 --> 0:43:51,1
And also interesting, you know,
like, you kill that vacuum which

850
0:43:51,1 --> 0:43:55,2
runs in transaction ID wrap around,
and if you don't acquire

851
0:43:55,2 --> 0:43:59,64
log on the table fast enough, it will kick in again.

852
0:44:0,4 --> 0:44:3,48
Especially, like, it might kick in if you have many workers and

853
0:44:3,48 --> 0:44:6,48
nap time is low, so it's an interesting problem as well.

854
0:44:7,66 --> 0:44:14,02
So raising this was not easy, as I remember.

855
0:44:14,44 --> 0:44:17,04
So I came up with 2 ideas.

856
0:44:17,04 --> 0:44:21,04
The first idea is if this is deployment, which is big deployment,

857
0:44:21,04 --> 0:44:24,02
we have like major upgrade or something, because it's a problem

858
0:44:24,02 --> 0:44:24,84
during major upgrades.

859
0:44:24,84 --> 0:44:28,14
We had it during major upgrades when not a vacuum running transaction

860
0:44:28,14 --> 0:44:31,5
ID wrap around was a problem for us or some major changes.

861
0:44:32,1 --> 0:44:38,54
If it's a rare change, we probably should forecast when the vacuum

862
0:44:39,18 --> 0:44:40,24
will start freezing.

863
0:44:41,04 --> 0:44:42,04
And do it before.

864
0:44:42,04 --> 0:44:42,84
Michael: I like that.

865
0:44:43,04 --> 0:44:43,5867
Nikolay: Yes, yes.

866
0:44:43,5867 --> 0:44:44,5667
Do it before.

867
0:44:44,5667 --> 0:44:47,64
Another thing is that probably we should just sometimes kill

868
0:44:47,64 --> 0:44:48,66
it ourselves.

869
0:44:49,66 --> 0:44:53,2
But if you want to kill it, you need to do it properly.

870
0:44:53,2 --> 0:44:57,78
You need to issue attempt to acquire lock on your table to change

871
0:44:58,62 --> 0:45:1,02
and kill it immediately.

872
0:45:2,38 --> 0:45:5,52
Kill it to vacuum and your attempt to acquire a lock is successful

873
0:45:5,58 --> 0:45:9,34
and then good luck to try to prevent transaction ID wrap around

874
0:45:9,34 --> 0:45:10,74
while we're doing our job.

875
0:45:10,94 --> 0:45:11,64
This is like...

876
0:45:11,64 --> 0:45:14,52
Michael: Wait, is that so it doesn't start again?

877
0:45:15,02 --> 0:45:15,54
Nikolay: Yeah, yeah.

878
0:45:15,54 --> 0:45:16,1
So I...

879
0:45:16,1 --> 0:45:16,82
Yeah, okay.

880
0:45:17,14 --> 0:45:17,78
I won't.

881
0:45:17,78 --> 0:45:18,47
That's it.

882
0:45:18,47 --> 0:45:21,72
So the lock is mine and I'm doing my job.

883
0:45:22,06 --> 0:45:24,1
It's like, it feels slightly dirty, right?

884
0:45:24,1 --> 0:45:29,42
But it's, there's no, there is no harm if you also monitor XID

885
0:45:29,6 --> 0:45:31,74
and MultiXID or up around risks.

886
0:45:31,74 --> 0:45:36,74
If you know, you're only at 10% of capacity, 200 million, it's

887
0:45:36,74 --> 0:45:41,82
fine to go to 300, 400 million during like even big deployment.

888
0:45:42,18 --> 0:45:44,08
We will be fine afterwards, right?

889
0:45:46,98 --> 0:45:50,86
But this feels dirty, honestly, to do this trick.

890
0:45:50,86 --> 0:45:53,86
I did this trick manually a few times in my career because I

891
0:45:53,86 --> 0:45:57,62
needed to finish deployment and I knew that we are not in danger

892
0:45:57,62 --> 0:45:57,9
zone.

893
0:45:57,9 --> 0:46:0,56
So yeah, Yeah,

894
0:46:0,86 --> 0:46:2,7
Michael: that makes loads of sense to me.

895
0:46:2,9 --> 0:46:6,42
My preference, I think, still would be avoid it happening as

896
0:46:6,42 --> 0:46:9,84
often as that and actually trying to make sure things got frozen

897
0:46:10,38 --> 0:46:11,58
before it was needed.

898
0:46:11,58 --> 0:46:15,68
So it really is a fail safe rather than a normal like thing that

899
0:46:15,68 --> 0:46:16,96
happens in normal operation.

900
0:46:17,22 --> 0:46:18,58
But yeah, super interesting.

901
0:46:19,34 --> 0:46:23,62
We started the episode talking about what we should call it and

902
0:46:23,62 --> 0:46:26,82
I think maybe we should end it like with a, what do you think

903
0:46:27,18 --> 0:46:29,14
now, having talked about it?

904
0:46:29,14 --> 0:46:29,64
Nikolay: Oh.

905
0:46:32,22 --> 0:46:36,4
Yeah, but we try to keep titles very short, right?

906
0:46:37,12 --> 0:46:41,08
We didn't have xmin horizon episode.

907
0:46:41,74 --> 0:46:45,72
I tend to think this should be called xmin horizon episode, because

908
0:46:45,72 --> 0:46:49,12
this is where we spend time most and it's less about locks. It's

909
0:46:49,12 --> 0:46:53,64
less like and I think long running transaction is a wrong concept

910
0:46:53,9 --> 0:46:54,64
to chase

911
0:46:55,2 --> 0:46:57,44
Michael: Yeah, let's call it xmin horizon because I think that's

912
0:46:57,44 --> 0:46:57,62
gonna

913
0:46:57,62 --> 0:47:3,66
Nikolay: be, but my concern is that people think in terms of long-running

914
0:47:3,68 --> 0:47:4,8
transactions and...

915
0:47:6,26 --> 0:47:7,4
Michael: That's fine.

916
0:47:7,8 --> 0:47:10,64
I think long-running transactions is interesting and I actually

917
0:47:10,64 --> 0:47:14,36
don't mind that there's this edge case of long-running transactions

918
0:47:14,44 --> 0:47:15,48
that aren't harmful.

919
0:47:15,7 --> 0:47:18,74
Nikolay: Let's keep long-running transactions because it's easier

920
0:47:18,74 --> 0:47:20,38
to understand what we're talking about.

921
0:47:20,38 --> 0:47:21,72
This is what people expect.

922
0:47:21,74 --> 0:47:23,94
Let's keep the name long running transactions.

923
0:47:24,76 --> 0:47:27,56
Explain horizon. This concept is still, it's not fully clear in

924
0:47:27,56 --> 0:47:28,02
my head.

925
0:47:28,02 --> 0:47:32,36
Like I it's clear, but I want it to become official in documentation.

926
0:47:32,5 --> 0:47:34,42
Maybe it should be explained in documentation.

927
0:47:35,58 --> 0:47:38,3
Michael: And then that opens us up so we can still do an episode

928
0:47:38,3 --> 0:47:39,44
on that at some point.

929
0:47:39,48 --> 0:47:39,98
Nikolay: Yeah.

930
0:47:40,08 --> 0:47:43,68
Maybe I missed some parts of it, how to properly monitor.

931
0:47:43,82 --> 0:47:47,36
I know some people who develop observability tooling listen to

932
0:47:47,36 --> 0:47:48,34
our episodes.

933
0:47:48,42 --> 0:47:49,12
Michael: Oh cool.

934
0:47:49,54 --> 0:47:52,18
Nikolay: Yeah, I hope it will be improved.

935
0:47:53,32 --> 0:47:53,56
Michael: Nice.

936
0:47:53,56 --> 0:47:54,88
Nikolay: It's like clarity.

937
0:47:55,12 --> 0:47:57,52
And I thank you for listening to this.

938
0:47:57,52 --> 0:47:59,1
I feel I went quite deep.

939
0:47:59,1 --> 0:48:1,4302
We went quite deep in this episode.

940
0:48:1,4302 --> 0:48:2,54
Oh, I personally

941
0:48:2,98 --> 0:48:3,92
Michael: enjoyed it.

942
0:48:3,92 --> 0:48:5,82
Yeah, I hope everyone else did too.

943
0:48:7,12 --> 0:48:7,96
Great, it was

944
0:48:7,96 --> 0:48:9,56
Nikolay: great to chat to you as usual.

945
0:48:9,78 --> 0:48:13,02
Michael: Likewise, Nik, take care and see you next time.