1
00:00:00,140 --> 00:00:02,540
Nikolay: Hello, hello, this is
Postgres.FM.

2
00:00:02,640 --> 00:00:08,460
I'm Nik, Postgres.AI, and as usual,
my co-host is Michael, pgMustard.

3
00:00:09,060 --> 00:00:10,620
Hi, Michael, how are you doing?

4
00:00:10,760 --> 00:00:12,180
Michael: I'm good, how are you?

5
00:00:12,340 --> 00:00:14,160
Nikolay: Great, everything's all
right.

6
00:00:14,160 --> 00:00:20,540
A lot of bugs to fix and incidents
to troubleshoot, to perform

7
00:00:20,540 --> 00:00:23,820
root cause analysis as we say,
RCA.

8
00:00:24,640 --> 00:00:27,080
Michael: It sounds related to our
topic today maybe.

9
00:00:27,380 --> 00:00:29,400
Nikolay: Oh yeah, maybe yes.

10
00:00:30,020 --> 00:00:36,420
So the topic I chose is, I'm still
not 100% sure how to name

11
00:00:36,420 --> 00:00:38,580
it properly, so let's decide together.

12
00:00:39,240 --> 00:00:42,760
But the situation is simple for
me, relatively.

13
00:00:43,660 --> 00:00:47,800
So we help a lot of startups, and
at some point I decided to

14
00:00:47,800 --> 00:00:49,660
focus only on startups.

15
00:00:50,540 --> 00:00:55,660
Having, like, raised a few startups
and helped many startups,

16
00:00:55,760 --> 00:01:01,060
I know how it feels to choose technology
and grow, grow, grow

17
00:01:01,060 --> 00:01:04,700
until some problems start hitting
you.

18
00:01:05,080 --> 00:01:09,020
This is exactly how it usually goes: people
choose RDS or CloudSQL or

19
00:01:09,020 --> 00:01:10,300
Supabase, anything.

20
00:01:11,200 --> 00:01:18,220
And they don't need to hire DBAs,
DBREs, and they grow quite

21
00:01:18,220 --> 00:01:23,260
well until a few terabytes of data
or 10,000 TPS, that kind

22
00:01:23,260 --> 00:01:24,560
of scale.

23
00:01:25,240 --> 00:01:29,740
And then problems pop up here and
there, and sometimes they come

24
00:01:30,280 --> 00:01:33,360
in batches, you know, like not
just 1 problem, but several.

25
00:01:34,400 --> 00:01:37,580
And here usually, for us it's good,
they come to us.

26
00:01:37,580 --> 00:01:40,960
I mean, Postgres.AI, we still have
a consulting wing, quite strong

27
00:01:40,960 --> 00:01:41,640
and growing.

28
00:01:42,500 --> 00:01:46,180
And we helped more than 20 startups
over the last year, which

29
00:01:46,260 --> 00:01:47,660
I'm very proud of.

30
00:01:49,440 --> 00:01:53,940
And I collected a lot of case studies,
so to speak.

31
00:01:54,440 --> 00:01:59,760
And I decided to have some classification
of problems that feel

32
00:01:59,760 --> 00:02:03,500
not good at a very high level, for
example, CTO level or even CEO,

33
00:02:03,820 --> 00:02:08,560
when you think they might start
thinking, is Postgres the right

34
00:02:08,560 --> 00:02:12,640
choice or it's giving us too much
headache?

35
00:02:13,480 --> 00:02:19,060
And it's not about like, oh, out
of disk space suddenly, or major

36
00:02:19,060 --> 00:02:21,820
upgrades requiring some maintenance
window.

37
00:02:21,820 --> 00:02:23,980
Although this also can cause some
headache.

38
00:02:24,780 --> 00:02:28,380
But it's more about problems like
where you don't know what to

39
00:02:28,380 --> 00:02:33,540
do, or you see it requires a lot
of effort to solve it properly.

40
00:02:34,020 --> 00:02:36,740
Michael: Yeah, I've had a sneak
peek of your list, so I like

41
00:02:36,740 --> 00:02:38,200
how you've described it.

42
00:02:38,200 --> 00:02:42,020
I also like the thought process
of whether it hits the CTO or

43
00:02:42,020 --> 00:02:46,180
the CEO, and I was thinking, let's
say you have a non-technical

44
00:02:46,480 --> 00:02:51,060
CEO, if they start hearing the
word Postgres too often it's probably

45
00:02:51,060 --> 00:02:52,040
a bad sign.

46
00:02:53,200 --> 00:02:56,580
Ideally you might mention it once
every few years when you do

47
00:02:56,580 --> 00:03:01,220
a major version upgrade, but then
nothing bad happens and they

48
00:03:01,220 --> 00:03:03,060
don't hear it again for a few years.

49
00:03:03,080 --> 00:03:05,700
But if they're hearing, you know,
if it's getting to the CEO

50
00:03:05,740 --> 00:03:10,440
that Postgres is causing problems
over and over again, the natural

51
00:03:10,440 --> 00:03:13,640
question is going to be, is there
an alternative?

52
00:03:13,740 --> 00:03:14,940
What could we do instead?

53
00:03:14,940 --> 00:03:16,580
Or, you know, is this a big problem?

54
00:03:16,880 --> 00:03:21,480
So I guess it's these kind of dangers,
not just to the startup,

55
00:03:21,480 --> 00:03:25,340
but also to Postgres' continued
use at that startup.

56
00:03:26,040 --> 00:03:29,160
Nikolay: Yeah, I like the word
dangerous here because when you

57
00:03:29,160 --> 00:03:34,020
deal with some of these problems,
it might feel dangerous to

58
00:03:34,040 --> 00:03:35,820
have Postgres for them.

59
00:03:37,360 --> 00:03:37,940
It's bad.

60
00:03:37,940 --> 00:03:41,860
Like I would like if things were
better.

61
00:03:42,980 --> 00:03:49,540
So I have a list of 10 items, and
we can discuss and the list

62
00:03:49,540 --> 00:03:53,360
is unordered and I'm going to post
it to my social networks so

63
00:03:53,360 --> 00:03:54,600
folks can discuss.

64
00:03:54,820 --> 00:04:00,120
And I, like, sincerely think that
this list is useful.

65
00:04:00,940 --> 00:04:05,820
If you're a startup, it's great
to just use this checklist

66
00:04:05,940 --> 00:04:13,260
to see how your cluster is doing
or clusters are doing and are

67
00:04:13,260 --> 00:04:13,920
you ready.

68
00:04:16,760 --> 00:04:21,440
So Postgres growth readiness checklist.

69
00:04:22,120 --> 00:04:25,140
And interesting that I didn't include
vertical and horizontal

70
00:04:25,760 --> 00:04:26,760
scaling there.

71
00:04:27,380 --> 00:04:29,760
I did it indirectly, we will touch
it.

72
00:04:29,960 --> 00:04:34,060
But obviously, like this is the
most discussed topic, the biggest

73
00:04:34,060 --> 00:04:38,940
danger, how Postgres scales, like
cluster, single primary, and

74
00:04:38,940 --> 00:04:41,920
multiple standbys, how far we can
go.

75
00:04:41,920 --> 00:04:45,740
We know we can go very far, very,
very far on a single cluster.

76
00:04:46,300 --> 00:04:49,760
At some point, microservices, or
maybe sharding, it's great.

77
00:04:50,200 --> 00:04:56,820
But we had a great episode with
Lev Kokotov of PgDog, and one

78
00:04:56,820 --> 00:05:00,720
of the items I have today
resonates with what

79
00:05:00,720 --> 00:05:03,220
he said during our episode.

80
00:05:03,820 --> 00:05:08,360
So anyway, let's exclude vertical
and horizontal scaling and

81
00:05:08,360 --> 00:05:12,840
talk about stuff which kind of
sounds boring.

82
00:05:13,940 --> 00:05:15,900
My first item is heavy lock contention.

83
00:05:16,560 --> 00:05:17,700
This is very popular.

84
00:05:17,720 --> 00:05:23,160
Maybe 50% of companies that come
to us have this issue.

85
00:05:23,480 --> 00:05:23,980
Somehow.

86
00:05:25,320 --> 00:05:29,480
So at some point I decided to start
saying everyone, if you have

87
00:05:29,540 --> 00:05:37,360
queue-like workloads, or additionally,
and/or, if you don't know

88
00:05:37,360 --> 00:05:42,800
how dangerous it is to change schema
in Postgres, just adding

89
00:05:42,800 --> 00:05:44,560
column can be a problem, right?

90
00:05:45,040 --> 00:05:46,940
We discussed it, I think, many
times.

91
00:05:48,160 --> 00:05:51,660
You are not ready to grow, and
at some point, sooner or later,

92
00:05:51,660 --> 00:05:52,740
it will hit you.

93
00:05:53,000 --> 00:05:55,620
And it will hit you as a spike
of active sessions.

94
00:05:56,120 --> 00:06:01,320
And we know some managed Postgres
platforms provoke you to have

95
00:06:01,320 --> 00:06:04,200
huge max_connections values.

96
00:06:04,200 --> 00:06:04,920
Michael: Max connections,

97
00:06:04,920 --> 00:06:05,420
Nikolay: yeah.

98
00:06:05,460 --> 00:06:07,760
RDS, like 5,000 or 2,500.

99
00:06:08,860 --> 00:06:10,180
Why do they do this?

100
00:06:10,760 --> 00:06:11,860
Easier for them.

101
00:06:12,180 --> 00:06:17,440
But it's dangerous because it creates a kind of performance cliff

102
00:06:17,440 --> 00:06:17,940
additionally.

103
00:06:19,020 --> 00:06:21,760
Michael: Yeah, it's another version of these cliffs isn't it?

104
00:06:21,760 --> 00:06:21,900
Nikolay: We

105
00:06:21,900 --> 00:06:23,320
Michael: had another good episode recently.

106
00:06:23,320 --> 00:06:26,300
Nikolay: Yeah, I plan to research this a little bit, probably

107
00:06:26,460 --> 00:06:29,500
we will publish something in this area to prove that it's not

108
00:06:29,500 --> 00:06:29,860
good.

109
00:06:29,860 --> 00:06:34,920
It's still not good even if you have Postgres 14 plus which has

110
00:06:34,920 --> 00:06:39,300
great optimizations for a large number of idle connections it's

111
00:06:39,300 --> 00:06:40,280
still not good.
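
An illustrative quick check of how a huge max_connections is actually being used (this is just a sketch, not a specific tool):

```sql
SHOW max_connections;

-- How many client connections are doing real work vs. sitting idle:
SELECT state, count(*)
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY count(*) DESC;
```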

112
00:06:42,200 --> 00:06:45,860
Michael: And there have been some improvements like I know a

113
00:06:45,860 --> 00:06:49,800
very good engineer who took Postgres down by adding a column

114
00:06:50,460 --> 00:06:52,060
with a default, I think it was.

115
00:06:52,060 --> 00:06:55,680
But it was many years, there's some improvements in recent years

116
00:06:55,680 --> 00:06:58,860
of some DDL changes that are less dangerous than they were.

117
00:06:58,860 --> 00:07:00,580
Nikolay: Yeah, there are several levels.

118
00:07:01,520 --> 00:07:02,320
Michael: Yes, yeah, of course.

119
00:07:02,320 --> 00:07:05,160
And if they get stuck in if they don't have a lock_timeout

120
00:07:05,160 --> 00:07:07,780
for example in fact yeah we're probably going to be pointing

121
00:07:07,780 --> 00:07:10,640
to episodes on every single 1 of these bullet points but we have

122
00:07:10,640 --> 00:07:14,220
we had 1 on 0 downtime migrations I think it's probably the best

123
00:07:14,220 --> 00:07:17,080
for that and we had a separate 1 on queues actually, didn't we?

124
00:07:17,080 --> 00:07:17,840
Nikolay: So yeah.

125
00:07:17,840 --> 00:07:18,280
Yeah.

126
00:07:18,280 --> 00:07:23,240
So definitely there are solutions here and you just need to proactively

127
00:07:23,940 --> 00:07:24,440
deploy.

128
00:07:24,520 --> 00:07:30,060
It's interesting that I see some companies grow quite far not

129
00:07:30,060 --> 00:07:33,220
noticing this problem, for example, with DDL.

130
00:07:33,680 --> 00:07:37,680
It becomes, it's like going to casino, like you can win, you

131
00:07:37,680 --> 00:07:38,420
can win.

132
00:07:38,480 --> 00:07:40,520
Sometimes, boom, you lose.

133
00:07:40,840 --> 00:07:46,360
Because if you deploy some DDL and you get blocked, you can block

134
00:07:46,360 --> 00:07:48,140
others and it can be a disaster.

135
00:07:48,340 --> 00:07:49,900
We discussed it several times.

136
00:07:50,240 --> 00:07:54,340
And if you had a hundred deployments successfully, it doesn't

137
00:07:54,340 --> 00:07:59,360
mean you will keep winning, right?

138
00:07:59,600 --> 00:08:01,620
So it's better to have.

139
00:08:02,160 --> 00:08:05,020
And it concerns me, I have a feeling we should implement this

140
00:08:05,020 --> 00:08:05,660
in Postgres.

141
00:08:06,220 --> 00:08:08,800
Like alter table concurrently or something like this.

142
00:08:08,800 --> 00:08:12,460
It should itself perform these retries with low lock_timeout.
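
For reference, what "retries with low lock_timeout" looks like when done by hand today; the timeout value and table name are illustrative:

```sql
-- Fail fast instead of queueing behind a long transaction and
-- blocking everyone else who needs the table.
SET lock_timeout = '2s';

ALTER TABLE orders ADD COLUMN note text;
-- If this errors with "canceling statement due to lock timeout",
-- the migration tool should sleep briefly and retry,
-- rather than waiting in the lock queue indefinitely.
```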

143
00:08:13,320 --> 00:08:14,680
Michael: Yeah, it's tricky, isn't it?

144
00:08:14,680 --> 00:08:18,660
But I agree, but then people still need to know that it existed

145
00:08:18,740 --> 00:08:23,940
to actually use it because I think the main issue here is people

146
00:08:23,940 --> 00:08:26,400
not realizing that it can be a problem.

147
00:08:27,180 --> 00:08:29,280
And the fact it probably hits users.

148
00:08:29,680 --> 00:08:31,420
Let's say you've got a statement_timeout.

149
00:08:32,440 --> 00:08:35,460
When are you actually going to notice that users have been waiting

150
00:08:35,460 --> 00:08:36,040
for it?

151
00:08:36,040 --> 00:08:38,600
Are you going to notice that spike
on your on your monitoring?

152
00:08:38,600 --> 00:08:42,040
I'm not sure like it depends how
many users actually got stuck

153
00:08:42,040 --> 00:08:44,320
waiting behind it and had slow
queries.

154
00:08:44,600 --> 00:08:47,160
So and it's going to be hard to
reproduce that you might not

155
00:08:47,160 --> 00:08:48,500
know why it was that.

156
00:08:48,600 --> 00:08:49,100
So

157
00:08:49,540 --> 00:08:53,400
Nikolay: log_lock_waits is off
so you don't see who blocked

158
00:08:53,400 --> 00:08:53,900
you.
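
Two standard ways to see who is blocking whom (the exact queries are just examples):

```sql
-- Log sessions that wait longer than deadlock_timeout (1s by default)
-- for a heavyweight lock, including the blocker's details:
ALTER SYSTEM SET log_lock_waits = on;
SELECT pg_reload_conf();

-- Ad hoc: which sessions are blocked right now, and by whom?
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       left(query, 60) AS query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```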

159
00:08:54,060 --> 00:08:58,420
And you might be auto vacuum running
in this aggressive mode

160
00:08:58,420 --> 00:09:05,960
or it can be another session long
running transaction which holds

161
00:09:06,260 --> 00:09:09,060
access share lock to a table and
you cannot alter it.

162
00:09:09,240 --> 00:09:10,940
And boom, you block others.

163
00:09:11,040 --> 00:09:15,560
So this is like a reaction chain.

164
00:09:16,920 --> 00:09:18,040
And yeah, it's not good.

165
00:09:18,040 --> 00:09:22,120
And queue-like workloads same, like
at some smaller scale, you don't

166
00:09:22,120 --> 00:09:23,320
see problems at all.

167
00:09:23,680 --> 00:09:26,700
Then you occasionally experience
them.

168
00:09:26,800 --> 00:09:29,920
But if you grow very fast, you
will start hitting these problems

169
00:09:29,920 --> 00:09:30,640
very badly.

170
00:09:31,080 --> 00:09:37,540
And they look like spikes of heavy
lock contention or just heavy

171
00:09:37,540 --> 00:09:41,540
lock and lock in Postgres terminology
is the same so just lock contention

172
00:09:42,280 --> 00:09:49,020
and yeah it doesn't look good so
and suggestion is so simple

173
00:09:49,440 --> 00:09:54,200
like, it's funny that we talk
a lot and people that come to

174
00:09:54,200 --> 00:09:58,480
us actually they mention they watch
podcast and I say like, okay,

175
00:09:58,480 --> 00:09:59,640
do you have queue-like workload?

176
00:09:59,920 --> 00:10:03,820
Just take care of indexes, take
care of bloat, like maybe partitioning,

177
00:10:04,340 --> 00:10:06,600
but most importantly, SKIP LOCKED.

178
00:10:07,580 --> 00:10:08,360
That's it.
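
For reference, a minimal sketch of the queue pattern being described; the jobs table and its columns here are illustrative, not from the episode:

```sql
-- Hypothetical queue table for a queue-like workload.
CREATE TABLE jobs (
  id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  payload    jsonb NOT NULL,
  status     text NOT NULL DEFAULT 'pending',
  created_at timestamptz NOT NULL DEFAULT now()
);

-- Partial index so workers only scan pending jobs.
CREATE INDEX ON jobs (created_at) WHERE status = 'pending';

-- Each worker grabs a small batch, skipping rows other workers have
-- already locked, so consumers never queue up behind each other.
WITH next_jobs AS (
  SELECT id
  FROM jobs
  WHERE status = 'pending'
  ORDER BY created_at
  LIMIT 10
  FOR UPDATE SKIP LOCKED
)
UPDATE jobs
SET status = 'processing'
FROM next_jobs
WHERE jobs.id = next_jobs.id
RETURNING jobs.id, jobs.payload;
```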

179
00:10:08,360 --> 00:10:13,060
This is a solution, but we spent
hours to discuss details.

180
00:10:14,280 --> 00:10:17,120
Because when you go to reality,
it's not easy to learn this,

181
00:10:17,120 --> 00:10:19,840
like there are objections sometimes,
but this is what we do,

182
00:10:19,840 --> 00:10:24,940
like we work with those objections
and help to implement, right?

183
00:10:25,280 --> 00:10:30,520
So yeah, but we, yeah, for everything
we had an episode.

184
00:10:30,940 --> 00:10:32,220
There are episodes for everything.

185
00:10:32,220 --> 00:10:34,180
So this was number 1, heavy lock
contention.

186
00:10:34,400 --> 00:10:36,360
And I chose the most popular reasons.

187
00:10:36,360 --> 00:10:37,780
Of course there are other reasons.

188
00:10:39,520 --> 00:10:47,740
But in my view, DDL and queue-like
workloads are the number one biggest,

189
00:10:47,740 --> 00:10:48,640
the biggest ones.

190
00:10:48,800 --> 00:10:52,120
Okay, next it's boring, super boring.

191
00:10:53,080 --> 00:10:55,060
Bloat control and index management.

192
00:10:55,120 --> 00:10:58,300
We had episodes about it, maybe
several actually.

193
00:10:59,160 --> 00:11:06,600
But since, again, managed Postgres
platforms don't give you tools.

194
00:11:07,540 --> 00:11:13,480
For example, RDS, they did a great
job in autovacuum tuning, but

195
00:11:13,480 --> 00:11:14,760
only half of it.

196
00:11:15,060 --> 00:11:20,040
They made it very aggressive in
terms of how much resources,

197
00:11:20,660 --> 00:11:23,340
like throttling, they gave a lot
of resources.

198
00:11:23,800 --> 00:11:26,360
But they don't adjust scale factors.

199
00:11:27,180 --> 00:11:32,160
So autovacuum visits
your tables not often, not often

200
00:11:32,160 --> 00:11:33,160
enough for OLTP.
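
As a hedged illustration of what adjusting scale factors means in practice (the table name and values are made up, not a universal recommendation):

```sql
-- Default autovacuum_vacuum_scale_factor is 0.2: a table must accumulate
-- roughly 20% dead tuples before autovacuum visits it, which is rarely
-- often enough for a large, hot OLTP table. Per-table override:
ALTER TABLE big_hot_table SET (
  autovacuum_vacuum_scale_factor  = 0.01,
  autovacuum_analyze_scale_factor = 0.01
);
```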

201
00:11:34,280 --> 00:11:37,500
So bloat can be accumulated and
so on, and they don't give you

202
00:11:37,660 --> 00:11:42,340
resources to understand the reasons
of bloat.

203
00:11:44,540 --> 00:11:49,080
I'm thinking about it and I think
it's tricky and it's also a

204
00:11:49,080 --> 00:11:53,040
problem of Postgres documentation
because it lacks clarity how

205
00:11:53,040 --> 00:11:58,400
we troubleshoot reasons of the
bloat because we always say long-running

206
00:11:58,480 --> 00:12:01,820
transaction. But not every transaction
is harmful.

207
00:12:02,020 --> 00:12:07,520
For example, in the default transaction
isolation level, Read Committed,

208
00:12:08,240 --> 00:12:12,160
transaction is not that harmful
if it consists of many small

209
00:12:12,260 --> 00:12:12,760
queries.

210
00:12:14,120 --> 00:12:17,540
If it's a single query, it holds
a snapshot, it's harmful.

211
00:12:17,660 --> 00:12:22,120
So I guess with observability we
should shift from long-running

212
00:12:22,120 --> 00:12:26,280
transactional language to xmin horizon
language fully and discuss

213
00:12:26,280 --> 00:12:26,780
that.
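
A rough sketch of what "xmin horizon language" means in practice: checking what is actually holding the horizon back, not just how long a transaction has been running (the queries themselves are only examples):

```sql
-- Backends holding the oldest snapshots on this node:
SELECT pid, state, xact_start, backend_xmin, left(query, 60) AS query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 5;

-- Replication slots and prepared transactions can hold it back too:
SELECT slot_name, active, xmin, catalog_xmin FROM pg_replication_slots;
SELECT gid, prepared FROM pg_prepared_xacts;
```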

214
00:12:26,920 --> 00:12:32,540
Anyway, like I can easily imagine
and I observe how people think,

215
00:12:32,540 --> 00:12:35,460
oh, like MongoDB doesn't have this
stuff.

216
00:12:35,740 --> 00:12:38,620
Or some other Database system,
they don't have the problem with

217
00:12:38,620 --> 00:12:39,120
bloat.

218
00:12:39,960 --> 00:12:41,760
Or indexes, indexes, oh.

219
00:12:41,760 --> 00:12:45,060
Actually with indexes, my true
belief is that degradation of

220
00:12:45,060 --> 00:12:48,040
index health is happening in other
systems as well.

221
00:12:48,040 --> 00:12:49,240
We also discussed it.

222
00:12:49,240 --> 00:12:51,100
So they need to be rebuilt.

223
00:12:51,500 --> 00:12:54,160
Michael: I was listening to a SQL
Server podcast just for fun

224
00:12:54,160 --> 00:12:58,140
the other day and they had the
exact same problem.

225
00:12:58,140 --> 00:13:02,080
But in the episode where we talked
about index maintenance, I

226
00:13:02,080 --> 00:13:06,260
think it came up that even if you're
really on top of autovacuum,

227
00:13:06,460 --> 00:13:10,680
even if you have it configured
really nicely, there can still

228
00:13:10,680 --> 00:13:12,660
be occasions where you get some
bloat.

229
00:13:12,660 --> 00:13:16,640
If you have like a spike or if
you have a large deletion or you

230
00:13:16,640 --> 00:13:20,440
have like a there's a few cases
where you can end up with sparsely

231
00:13:20,500 --> 00:13:23,860
populated indexes that can't self-heal
like if for example you've

232
00:13:23,860 --> 00:13:30,580
got, like, even a UUIDv7
index and then you have a section

233
00:13:30,580 --> 00:13:33,660
that maybe deletes some old data
and it's not partitioned, then

234
00:13:33,660 --> 00:13:35,200
you've got a gap in your index.

235
00:13:35,200 --> 00:13:38,980
So there's a bunch of reasons why
they can get bloated anyway,

236
00:13:39,520 --> 00:13:41,140
even if you're on top of autovacuum.

237
00:13:41,380 --> 00:13:44,820
So I think this is 1 of those ones
that, yes, autovacuum fixes

238
00:13:44,820 --> 00:13:48,280
most of the problems, but you probably
still want to have some

239
00:13:48,280 --> 00:13:50,040
plan for index maintenance anyway.

240
00:13:50,640 --> 00:13:55,240
Nikolay: Yeah, so there are certain
things that are not automated

241
00:13:55,380 --> 00:14:00,940
by Postgres itself or by Kubernetes
operators or by, Well, some

242
00:14:00,940 --> 00:14:04,060
of them automated some things,
but not everyone, not everything.

243
00:14:04,440 --> 00:14:07,620
Or managed service providers, even
upgrades.

244
00:14:08,540 --> 00:14:10,580
Also like lack of automation there.

245
00:14:10,940 --> 00:14:15,700
We can mention this lack of automation
of analyze, but fortunately

246
00:14:16,380 --> 00:14:19,920
future Postgres versions will be
definitely fine because dump,

247
00:14:19,920 --> 00:14:23,560
restore of statistics is implemented
finally and goes to Postgres

248
00:14:23,560 --> 00:14:25,900
18, which is super great news.

249
00:14:26,120 --> 00:14:29,540
Anyway, lack of automation might
feel like, oh, this is a constant

250
00:14:30,060 --> 00:14:32,680
headache, but it's solvable.

251
00:14:32,720 --> 00:14:33,460
It's solvable.

252
00:14:33,740 --> 00:14:36,240
Fortunately, it requires some effort,
but it's solvable.

253
00:14:36,340 --> 00:14:36,660
Okay.

254
00:14:36,660 --> 00:14:41,400
Next thing is, let's
talk about lightweight lock

255
00:14:41,400 --> 00:14:41,900
contention.

256
00:14:42,940 --> 00:14:45,780
So we talked about heavy lock contention
or just lock contention.

257
00:14:45,920 --> 00:14:51,040
Lightweight lock contention is
also, this feels like, like pain

258
00:14:51,040 --> 00:14:52,320
and of various kinds.

259
00:14:54,000 --> 00:14:57,260
So lightweight locks can be called
latches, it's in memory.

260
00:14:57,260 --> 00:15:01,920
So when some operations with buffer
pool happen, for example,

261
00:15:02,480 --> 00:15:06,740
there are lightweight locks
Postgres needs to acquire, or when

262
00:15:06,740 --> 00:15:10,160
working with WAL or various data
structures.

263
00:15:10,440 --> 00:15:12,220
Also can mention LockManager.

264
00:15:12,600 --> 00:15:18,200
So things like LWLock:LockManager
or buffer mapping

265
00:15:18,200 --> 00:15:21,960
or subtrans SLRU, multixact SLRU.

266
00:15:22,960 --> 00:15:30,020
When you hear this, for me, like
these terms, imagine like this

267
00:15:30,020 --> 00:15:36,280
font, like bloody, you know, like
red blood, blood drops, drops

268
00:15:36,280 --> 00:15:43,600
of blood, because I know
so like many projects like

269
00:15:43,980 --> 00:15:46,520
suffered big pain, like big incidents.

270
00:15:47,640 --> 00:15:52,180
So for me, these, these terms are
like bloody terms, you know,

271
00:15:53,060 --> 00:15:56,700
because, yeah, because
it was a lot of pain

272
00:15:57,100 --> 00:15:57,600
sometimes.

273
00:15:57,840 --> 00:16:01,380
For example, you know I'm a big
fan of sub-transactions, right?

274
00:16:02,080 --> 00:16:05,540
Just my natural advice is just
to eliminate them all.

275
00:16:05,540 --> 00:16:08,000
Well, over time, I'm softer.

276
00:16:08,100 --> 00:16:11,760
I say, okay, you just need to understand
them and use very carefully.

277
00:16:12,340 --> 00:16:15,600
But LockManager, a couple of years ago,
remember Jeremy Schneider

278
00:16:15,600 --> 00:16:16,240
posted like-

279
00:16:16,240 --> 00:16:17,500
Michael: Yeah, great post.

280
00:16:17,520 --> 00:16:20,640
Nikolay: Horror stories, and we
discussed it as well.

281
00:16:21,040 --> 00:16:26,620
So this kind of contention might
hit you and it feels like performance

282
00:16:26,680 --> 00:16:29,160
cliff usually so all good all good
boom

283
00:16:30,140 --> 00:16:33,880
Michael: Right, what is, or was,
is it changing in 18? But

284
00:16:33,880 --> 00:16:36,360
it was a hard-coded limit,
right?

285
00:16:37,120 --> 00:16:39,700
Nikolay: Postgres 16, you mean, for fast
path?

286
00:16:39,780 --> 00:16:45,920
Also SLRU sizes are now configurable,
I think in Postgres 17 already.

287
00:16:46,360 --> 00:16:50,600
Well, nice, good, but not always
enough.

288
00:16:51,300 --> 00:16:54,400
Because okay, you can buy some
time, but still there is a cliff

289
00:16:54,400 --> 00:16:57,320
and if you're not far from it,
again, boom.

290
00:16:57,940 --> 00:17:02,080
Or this, I recently saw it, like,
remember we discussed 4 million

291
00:17:02,080 --> 00:17:03,260
transactions per second.

292
00:17:04,860 --> 00:17:09,220
And we discussed that first
we found pg_stat_kcache was an

293
00:17:09,220 --> 00:17:11,020
issue, it was fixed, and then pg_stat_statements.

294
00:17:11,520 --> 00:17:14,440
Okay, pg_stat_statements, if the
transactions are super fast,

295
00:17:16,420 --> 00:17:17,760
it's bringing an observer effect.

296
00:17:17,760 --> 00:17:22,040
And we see it in newer Postgres
versions as LWLock:pg_stat_statements

297
00:17:24,020 --> 00:17:28,680
because finally code is covered
by proper probes, right?

298
00:17:28,680 --> 00:17:34,240
Not probes, like, it is wrapped and
it's visible in the wait event

299
00:17:34,240 --> 00:17:36,980
analysis observing just that activity.

300
00:17:37,440 --> 00:17:43,760
So I saw it recently at 1 customer,
I know like some layer of

301
00:17:43,780 --> 00:17:46,880
lightweight lock pg_stat_statements waits,
so we need to discuss what's

302
00:17:46,880 --> 00:17:47,380
happening.

303
00:17:47,860 --> 00:17:54,000
It happens only when you have a
lot of very fast queries, but

304
00:17:54,160 --> 00:17:55,900
it can be a problem as well.

305
00:17:56,400 --> 00:18:00,520
But yeah, and performance cliffs,
it requires some practice to

306
00:18:00,520 --> 00:18:02,160
understand where they are.

307
00:18:02,160 --> 00:18:06,540
It's hard because you need to understand
what kind of, like how

308
00:18:06,540 --> 00:18:10,140
to measure usage, how to understand
like situation risks.

309
00:18:11,200 --> 00:18:11,756
This requires some practice.

310
00:18:11,756 --> 00:18:13,620
Michael: I think this is 1 of the
hardest ones.

311
00:18:13,620 --> 00:18:17,060
I think this is 1 of the hardest
ones to see coming.

312
00:18:17,400 --> 00:18:22,740
Nikolay: After all our stories
with LWLock:LockManager, every time

313
00:18:22,740 --> 00:18:29,720
I see some query exceeds 1000 QPS,
queries per second, I'm already

314
00:18:29,720 --> 00:18:34,280
thinking, okay, this patient is
developing some chronic disease

315
00:18:34,280 --> 00:18:34,940
you know.

316
00:18:35,740 --> 00:18:38,520
Michael: Okay that's another that's
1 I haven't heard we've done

317
00:18:38,520 --> 00:18:41,680
several rules of thumb before but
that's that's another good

318
00:18:41,680 --> 00:18:45,600
1 so a thousand queries per second
for a single query check.

319
00:18:45,600 --> 00:18:49,020
Nikolay: It's also very relative to
how many vCPUs we have.

320
00:18:49,020 --> 00:18:51,920
If we have less, it can hit faster.

321
00:18:52,340 --> 00:18:55,600
Although we couldn't reproduce
exactly the same nature as we

322
00:18:55,600 --> 00:19:04,340
see on huge machines like 196 cores,
we couldn't reproduce that

323
00:19:04,340 --> 00:19:07,080
nature on 8 core machines at all.

324
00:19:08,300 --> 00:19:11,780
So yeah, it's for big boys only,
you know.

325
00:19:13,680 --> 00:19:17,980
This is, yeah, this is, like, or
maybe for adults.

326
00:19:18,700 --> 00:19:21,140
So young projects don't experience
these problems.

327
00:19:22,360 --> 00:19:24,620
Michael: That's a good point actually
the startups that have

328
00:19:24,620 --> 00:19:27,620
hit this that you've written about
and things have tended to

329
00:19:27,620 --> 00:19:31,580
be further along in their journeys
huge huge but yeah but still

330
00:19:31,640 --> 00:19:35,000
growing quickly and it's even a
bigger problem at that point

331
00:19:35,000 --> 00:19:36,900
but yeah good point should we move
on?

332
00:19:37,160 --> 00:19:42,740
Nikolay: Yeah so the next 1 is
our usual suspect right it's a

333
00:19:42,740 --> 00:19:48,060
wraparound of 4-byte transaction
ID and multixact ID.

334
00:19:49,020 --> 00:19:51,640
So many words already said about
this.

335
00:19:52,920 --> 00:19:56,880
It just bothers me that monitoring
doesn't cover, for example,

336
00:19:56,880 --> 00:19:59,180
usually doesn't cover multixact
IDs.

337
00:20:00,180 --> 00:20:02,800
And people still don't have alerts
and so on.

338
00:20:03,100 --> 00:20:05,040
So it's sad.

339
00:20:07,260 --> 00:20:10,220
Yeah, it's easy to create these
days.
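
The kind of alert query being referred to, covering both transaction ID and multixact ID age (thresholds are up to you; this is only a sketch):

```sql
-- Distance toward wraparound per database; the hard limit is ~2.1 billion.
SELECT datname,
       age(datfrozenxid)    AS xid_age,
       mxid_age(datminmxid) AS multixact_age
FROM pg_database
ORDER BY age(datfrozenxid) DESC;
-- Alert long before these values approach ~1 billion, well ahead of
-- autovacuum's emergency anti-wraparound thresholds.
```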

340
00:20:10,960 --> 00:20:14,260
Michael: I get the impression though,
I mean, there were a few

341
00:20:14,260 --> 00:20:17,940
high profile incidents that got
blogged about.

342
00:20:17,940 --> 00:20:20,120
I think, yes, yeah, exactly.

343
00:20:20,600 --> 00:20:25,240
And I feel like I haven't seen
1 in a long while and I know there

344
00:20:25,240 --> 00:20:29,820
are a lot of projects that are
having to, you know, I think Adyen

345
00:20:29,820 --> 00:20:32,540
have spoken about it. If they weren't
on top of this,

346
00:20:32,540 --> 00:20:34,960
it would only be a matter of hours
before they'd hit wrap around,

347
00:20:34,960 --> 00:20:36,860
you know it's that kind of volume.

348
00:20:37,200 --> 00:20:39,840
So they're really really having
to monitor and stay on top of

349
00:20:39,840 --> 00:20:40,880
it all the time.

350
00:20:41,520 --> 00:20:44,840
But I haven't heard of anybody
actually hitting this for quite

351
00:20:44,840 --> 00:20:45,460
a while.

352
00:20:45,740 --> 00:20:48,760
Do you think, I wondered if for
example there were some changes

353
00:20:48,760 --> 00:20:53,620
to I think it was autovacuum being
like I think it kicks in

354
00:20:53,620 --> 00:20:58,660
to do an anti-wraparound vacuum
differently, or it might be

355
00:20:58,660 --> 00:21:01,960
a lighter type of vacuum that it
runs now. I think I remember

356
00:21:01,960 --> 00:21:04,460
Peter Geoghegan posting about it, something
like that.

357
00:21:04,660 --> 00:21:05,740
Do you remember a change in

358
00:21:05,740 --> 00:21:05,900
Nikolay: that area?

359
00:21:05,900 --> 00:21:07,400
I don't remember, honestly.

360
00:21:08,140 --> 00:21:10,020
I just know this is still a problem.

361
00:21:11,940 --> 00:21:16,300
Again, at CTO level, it feels like,
how come Postgres still has

362
00:21:16,300 --> 00:21:20,380
4 byte transaction IDs and what
kind of risks I need to take

363
00:21:20,380 --> 00:21:21,080
into account.

364
00:21:21,980 --> 00:21:26,120
But you are right, managed Postgres
providers do quite a good

365
00:21:26,120 --> 00:21:26,880
job here.

366
00:21:27,700 --> 00:21:28,920
They take care.

367
00:21:30,180 --> 00:21:35,000
I had a guest at Postgres TV, Hannu
Krosing, who talked about

368
00:21:35,000 --> 00:21:40,280
how to escape from it in a non-traditional
and in his opinion,

369
00:21:40,280 --> 00:21:44,100
and actually my opinion as well,
in a better way, without single

370
00:21:44,100 --> 00:21:44,880
user mode.

371
00:21:45,900 --> 00:21:50,520
And since he is a part of CloudSQL
team, so it also shows how

372
00:21:50,540 --> 00:21:56,340
much effort managed Postgres
providers put into this area, realizing

373
00:21:56,380 --> 00:21:57,680
this is a huge risk.

374
00:21:58,280 --> 00:22:01,500
Michael: Yeah, and it's not even,
even if it's a small risk,

375
00:22:02,160 --> 00:22:05,400
the impact when it happens is not
small.

376
00:22:05,820 --> 00:22:07,440
So it's 1 of those ones where-
Absolutely

377
00:22:07,440 --> 00:22:08,080
Nikolay: good correction.

378
00:22:08,300 --> 00:22:10,380
It's low risk, high impact, exactly.

379
00:22:10,840 --> 00:22:11,500
Michael: Yes, yes.

380
00:22:12,780 --> 00:22:18,700
So I think the cases that were
blogged about were hours and hours

381
00:22:18,700 --> 00:22:22,400
possibly even getting, was it even
a day or 2 of downtime for

382
00:22:22,400 --> 00:22:23,200
those organizations?

383
00:22:23,420 --> 00:22:27,600
And that was, that is then, I mean,
you're talking about dangers,

384
00:22:27,600 --> 00:22:27,900
right?

385
00:22:27,900 --> 00:22:29,640
Nikolay: That's- Global downtime,
whole Database is down.

386
00:22:29,640 --> 00:22:30,140
Michael: Exactly.

387
00:22:31,300 --> 00:22:34,040
People, you're gonna lose some
customers over that, right?

388
00:22:34,540 --> 00:22:38,200
Nikolay: Yeah, unlike the next
item, a 4-byte integer primary

389
00:22:38,200 --> 00:22:40,400
key is still a thing, you know.

390
00:22:40,640 --> 00:22:45,960
I was surprised to have recently
this case, which was overlooked

391
00:22:46,240 --> 00:22:47,340
by our tooling.

392
00:22:48,720 --> 00:22:49,340
Michael: Oh, really?

393
00:22:49,340 --> 00:22:51,420
Nikolay: I couldn't like, how come?

394
00:22:51,420 --> 00:22:55,140
Yeah, because it was a non-traditional
way to have this.

395
00:22:55,140 --> 00:22:55,820
Michael: Go on.

396
00:22:56,540 --> 00:22:59,540
Nikolay: Well, it was first of
all, the sequence, which was used

397
00:22:59,540 --> 00:23:00,640
by multiple tables.

398
00:23:03,720 --> 00:23:06,420
Yeah, 1 for all of them.

399
00:23:06,420 --> 00:23:11,920
And somehow it was defined, so
our report in postgres-checkup didn't

400
00:23:11,920 --> 00:23:17,620
see it so when it came I was like
how come this like this is

401
00:23:17,980 --> 00:23:21,100
old friend or old enemy, not friend,
enemy.

402
00:23:21,180 --> 00:23:22,000
Michael: Old enemy.

403
00:23:22,660 --> 00:23:25,540
Nikolay: I haven't seen you for
like so many years.

404
00:23:26,520 --> 00:23:30,200
And you look differently, you know,
because multiple tables,

405
00:23:31,260 --> 00:23:33,360
but still, like, it's not fun.

406
00:23:33,580 --> 00:23:37,780
And this causes partial downtime
because some part of workload

407
00:23:37,800 --> 00:23:38,680
cannot work anymore.

408
00:23:38,680 --> 00:23:39,660
You cannot INSERT.

409
00:23:40,440 --> 00:23:40,940
Michael: Yeah.

410
00:23:41,120 --> 00:23:41,580
Nikolay: Yeah.

411
00:23:41,580 --> 00:23:48,340
So, by the way, I also learned
that if you just do in place ALTER

412
00:23:48,340 --> 00:23:53,260
TABLE for a huge table, it's not as
dumb as I thought.

413
00:23:53,500 --> 00:23:55,360
I checked source code, I was impressed.

414
00:23:55,400 --> 00:23:59,940
And this code is from 9 point something,
maybe even before.

415
00:24:00,100 --> 00:24:05,500
So if you ALTER TABLE, ALTER COLUMN
to change from int4 to int8,

416
00:24:05,660 --> 00:24:08,400
it actually performs a job like
similar to VACUUM FULL.

417
00:24:09,640 --> 00:24:10,780
Recreating indexes.
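
The statement in question, on a hypothetical table; it takes an ACCESS EXCLUSIVE lock and rewrites the table and its indexes, so on a big table it effectively means downtime:

```sql
-- Blocking approach: rewrites the table and rebuilds its indexes,
-- leaving a fresh, bloat-free result.
ALTER TABLE events
  ALTER COLUMN id TYPE bigint;
-- Any foreign keys or application-side types that assumed int4
-- have to be changed as well.
```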

418
00:24:11,040 --> 00:24:12,280
And you don't have bloat.

419
00:24:12,280 --> 00:24:16,100
I expected like 50% bloat, you
know.

420
00:24:16,360 --> 00:24:17,060
Michael: Oh, why?

421
00:24:17,680 --> 00:24:20,740
Nikolay: Because I thought it will
be, it will rewrite the whole

422
00:24:20,740 --> 00:24:21,100
table.

423
00:24:21,100 --> 00:24:21,980
I was mistaken.

424
00:24:23,360 --> 00:24:24,300
It's quite smart.

425
00:24:24,400 --> 00:24:27,440
Yeah, it's of course it's a blocking
operation, it causes downtime

426
00:24:27,440 --> 00:24:31,780
to perform it, but you end up having
quite clean state of table

427
00:24:31,780 --> 00:24:32,460
and indexes.

428
00:24:33,320 --> 00:24:35,340
Not quite, clean state, it's fresh.

429
00:24:35,900 --> 00:24:38,940
Michael: Yeah, so that is a table
rewrite, no?

430
00:24:39,640 --> 00:24:42,100
Nikolay: Yes, well, table, yes,
well, you are right.

431
00:24:42,100 --> 00:24:45,160
I was thinking about table rewrite
as a very dumb thing, like

432
00:24:45,160 --> 00:24:48,400
create more tuples and DELETE other
tuples.

433
00:24:48,480 --> 00:24:49,540
Michael: Got it, got it, got it.

434
00:24:49,540 --> 00:24:52,200
Nikolay: But there is a mechanism
of table rewrite in the code.

435
00:24:52,200 --> 00:24:55,540
Now I saw it finally, I'm still
learning, you know, sorry.

436
00:24:56,040 --> 00:24:59,140
Michael: You might end up with
some padding issues if you had

437
00:24:59,140 --> 00:25:01,240
it optimized well before, but yeah.

438
00:25:01,240 --> 00:25:05,220
Nikolay: Yeah, it also feels like
Postgres could implement some

439
00:25:05,220 --> 00:25:09,720
reshape like eventually because
there are building blocks in

440
00:25:09,720 --> 00:25:14,080
the code already, I see them like
to first like offline style

441
00:25:14,340 --> 00:25:18,960
to change COLUMN order and then
if you want it and then fully

442
00:25:18,960 --> 00:25:23,940
online style if pg_squeeze goes
to core to core right yeah yeah

443
00:25:23,940 --> 00:25:28,160
it would be great yeah I'm just
like connecting paths here and

444
00:25:28,200 --> 00:25:32,900
can be very powerful in like 3
to 5 years maybe But it's a lot

445
00:25:32,900 --> 00:25:33,780
of work additionally.

446
00:25:33,920 --> 00:25:38,580
So all those who are involved in
moving huge building blocks,

447
00:25:39,240 --> 00:25:40,380
I have huge respect.

448
00:25:41,740 --> 00:25:42,040
So okay.

449
00:25:42,040 --> 00:25:42,180
And I

450
00:25:42,180 --> 00:25:45,020
Michael: think this is, if you
know what you're doing, this one's

451
00:25:45,020 --> 00:25:46,520
easier to recover from.

452
00:25:47,120 --> 00:25:51,140
I assume like with the sequence,
for example, you can handle

453
00:25:51,140 --> 00:25:55,520
it multiple ways, but you can set
the sequence to like negative

454
00:25:55,560 --> 00:25:58,060
2 billion and normally you've got
a good start.

455
00:25:58,380 --> 00:26:01,277
Nikolay: Everyone thinks they're
smart, and this is the first

456
00:26:01,277 --> 00:26:04,080
thing, this is the first thing
I hear always when we discuss

457
00:26:04,080 --> 00:26:04,580
this.

458
00:26:04,840 --> 00:26:07,360
This was in the past, like let's
use negative values.

459
00:26:07,360 --> 00:26:10,140
Of course, if you can use negative
values, do it.

460
00:26:10,160 --> 00:26:14,240
Because we know Postgres integers
are signed integers.

461
00:26:14,440 --> 00:26:19,700
So we use only half of capacity
of, 4 byte capacity, half of

462
00:26:19,700 --> 00:26:22,320
it is 2.1 billion, roughly.

463
00:26:23,760 --> 00:26:28,500
So you have 2.1 billion more, but
not always it's possible to

464
00:26:28,500 --> 00:26:28,780
use.

465
00:26:28,780 --> 00:26:34,060
But this is old, old, old story
still making some people nervous

466
00:26:34,160 --> 00:26:38,060
and I think it's good to check
in advance.

467
00:26:39,380 --> 00:26:42,040
Michael: So much better, so much
better to have alerts when you're

468
00:26:42,040 --> 00:26:42,080
getting...

469
00:26:42,080 --> 00:26:46,300
Nikolay: I saved several companies,
big ones, from this, just

470
00:26:46,300 --> 00:26:47,220
by raising this.

471
00:26:47,540 --> 00:26:53,480
And I know in some companies, it
was like 1 year or a few years

472
00:26:53,480 --> 00:26:54,900
work to fix it.

473
00:26:56,420 --> 00:26:57,900
Michael: So what was the problem
before?

474
00:26:57,900 --> 00:27:00,640
Was it looking at columns instead
of looking at sequences?

475
00:27:01,260 --> 00:27:01,910
Or what was the-

476
00:27:01,910 --> 00:27:03,780
Nikolay: No, sequences are always
8 bytes.

477
00:27:04,300 --> 00:27:07,180
It was always so, like, if they
are 8 bytes.

478
00:27:07,760 --> 00:27:10,060
Problem with report, I don't remember
honestly.

479
00:27:10,240 --> 00:27:11,520
There was some problem with report.

480
00:27:11,520 --> 00:27:17,000
It was not standard way, not just
create table and you have default

481
00:27:17,040 --> 00:27:19,740
with sequence and you see it.

482
00:27:19,740 --> 00:27:22,980
Something else, some function,
I don't remember exactly.

483
00:27:23,100 --> 00:27:23,600
Okay.

484
00:27:23,600 --> 00:27:25,880
But usually our report catches
such things.

485
00:27:26,820 --> 00:27:32,000
Or you can just check yourself
if you have 4-byte primary keys,

486
00:27:32,060 --> 00:27:37,640
it's time to move to 8 bytes or
to UUID version 7, right?

487
00:27:37,900 --> 00:27:38,400
Maybe.
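
A rough self-check along those lines (not the postgres-checkup report itself), listing int4 primary key columns and how much of the positive int4 range each sequence has already used:

```sql
-- 4-byte (integer) primary key columns:
SELECT c.relname AS table_name, a.attname AS column_name
FROM pg_constraint con
JOIN pg_class c     ON c.oid = con.conrelid
JOIN pg_attribute a ON a.attrelid = c.oid AND a.attnum = ANY (con.conkey)
WHERE con.contype = 'p'
  AND a.atttypid = 'int4'::regtype;

-- How far along each sequence is toward 2,147,483,647:
SELECT schemaname, sequencename, last_value,
       round(100.0 * last_value / 2147483647, 2) AS pct_of_int4_max
FROM pg_sequences
WHERE last_value IS NOT NULL;
```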

488
00:27:39,140 --> 00:27:41,400
Okay, that's it about this.

489
00:27:41,520 --> 00:27:45,560
Then let's heat up the situation,
replication limits.

490
00:27:45,940 --> 00:27:50,260
So in the beginning I mentioned
vertical and horizontal scaling

491
00:27:51,460 --> 00:27:56,520
and usually people say there's
not enough CPU or something and

492
00:27:56,520 --> 00:27:57,160
disk I.O.

493
00:27:57,160 --> 00:28:03,860
And we need to scale and you can
scale read only workloads having

494
00:28:03,860 --> 00:28:05,540
more read only standbys.

495
00:28:06,180 --> 00:28:09,120
But it's hard to scale writes,
and this is true.

496
00:28:10,260 --> 00:28:15,860
But also true that at some point,
and Lev mentioned it, In our

497
00:28:15,860 --> 00:28:19,540
PgDog episode, he mentioned that
in Instacart they had it, right?

498
00:28:19,540 --> 00:28:24,320
So like at 200 megabytes per second
WAL generation, it's already

499
00:28:24,320 --> 00:28:25,020
over 100.

500
00:28:25,020 --> 00:28:28,440
I don't remember exactly what he
mentioned, but somewhere in

501
00:28:28,440 --> 00:28:29,020
that area.

502
00:28:29,020 --> 00:28:29,980
Michael: It was a lot.

503
00:28:31,100 --> 00:28:34,120
Nikolay: Well, just 10 WALs per
second, for example, it gives

504
00:28:34,120 --> 00:28:34,780
you 160.
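
(For clarity: 10 WAL segments per second at the default 16 MB segment size is 10 × 16 MB = 160 MB/s of WAL; with 64 MB segments, as on RDS, the same volume arrives in fewer, larger files.)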

505
00:28:35,420 --> 00:28:42,380
Well, RDS sets less because they
have 64 megabyte WALs.

506
00:28:42,540 --> 00:28:43,940
They raised it 4 times.

507
00:28:44,380 --> 00:28:49,400
Anyway, 100 or 200, 300 megabytes
per second, you start hitting

508
00:28:49,400 --> 00:28:53,240
the problems with single-threaded
processes.

509
00:28:53,520 --> 00:28:58,860
Actually, I wonder why we
don't monitor, like any Postgres

510
00:28:58,860 --> 00:29:01,740
monitoring should have, with low
level access.

511
00:29:01,960 --> 00:29:04,340
RDS doesn't have it, but low-level
access.

512
00:29:04,940 --> 00:29:11,920
We should see how much CPU, we
should see CPU usage for every

513
00:29:12,040 --> 00:29:15,320
important single-threaded Postgres
process.

514
00:29:16,420 --> 00:29:16,920
Right?

515
00:29:17,040 --> 00:29:22,400
WAL sender, WAL receiver, Logical
Replication Worker, Checkpointer

516
00:29:22,700 --> 00:29:23,600
maybe as well.

517
00:29:23,600 --> 00:29:24,020
That would

518
00:29:24,020 --> 00:29:24,640
Michael: be helpful.

519
00:29:24,800 --> 00:29:28,480
Nikolay: Of course, because it's
dangerous to grow at scale when

520
00:29:28,480 --> 00:29:32,380
you hit 100% of a single vCPU.

521
00:29:33,420 --> 00:29:36,380
And then you need to either vertically
or horizontally scale

522
00:29:36,380 --> 00:29:40,220
or start saving on WAL generation.

523
00:29:41,840 --> 00:29:44,720
Fortunately, in pg_stat_statements,
we have WAL metrics, 3

524
00:29:44,720 --> 00:29:46,680
columns since Postgres 13.

525
00:29:47,280 --> 00:29:51,160
But unfortunately, this is query
level.
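
For reference, the three query-level WAL columns (Postgres 13+) and a simple top-consumers query, as an illustration:

```sql
-- Which normalized queries generate the most WAL:
SELECT queryid,
       left(query, 60)           AS query,
       wal_records,
       wal_fpi,
       pg_size_pretty(wal_bytes) AS wal_bytes
FROM pg_stat_statements
ORDER BY wal_bytes DESC
LIMIT 10;
```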

526
00:29:51,660 --> 00:29:56,040
What we need, we need also table
level to understand which tables

527
00:29:56,040 --> 00:29:58,120
are responsible for a lot of WAL.

528
00:29:58,700 --> 00:30:01,040
And pg_stat_activity lacks it.

529
00:30:01,320 --> 00:30:03,040
I think it's a good idea to implement.

530
00:30:03,080 --> 00:30:05,260
If someone wants hacking, this
is a great idea.

531
00:30:05,280 --> 00:30:11,100
Add 3 more columns, WAL-based,
WAL-related metrics to pg_stat_all_tables,

532
00:30:11,480 --> 00:30:14,940
pg_stat_sys_tables, and
user tables.

533
00:30:16,500 --> 00:30:17,440
It would be great.

534
00:30:17,520 --> 00:30:22,360
Also, maybe pg_stat_databases, like
a global view of things, how

535
00:30:22,360 --> 00:30:23,100
much WAL.

536
00:30:23,560 --> 00:30:26,400
Michael: Yeah, I've got a vague
memory there were a couple of

537
00:30:26,400 --> 00:30:29,660
WAL-related new views, new system
views

538
00:30:29,720 --> 00:30:30,220
Nikolay: introduced.

539
00:30:30,800 --> 00:30:33,620
Oh yes, but it's about, Yeah, it's
about...

540
00:30:35,080 --> 00:30:36,300
Are you talking about the pg_stat_io?

541
00:30:37,540 --> 00:30:38,040
No.

542
00:30:38,940 --> 00:30:43,360
WAL-related in Postgres 13, it
went to EXPLAIN and it went to

543
00:30:43,520 --> 00:30:44,660
pg_stat_statements.

544
00:30:45,060 --> 00:30:46,020
This is what happened.

545
00:30:46,860 --> 00:30:49,660
Anyway, this is not an easy problem
to solve.

546
00:30:50,860 --> 00:30:52,940
It's easy to check if you have
access.

547
00:30:53,600 --> 00:30:56,100
Unfortunately, if you're on managed
Postgres, you don't have

548
00:30:56,100 --> 00:30:56,600
access.

549
00:30:57,080 --> 00:30:58,480
They need to check it.

550
00:30:58,700 --> 00:31:01,040
What's happening, especially on
standbys.

551
00:31:02,120 --> 00:31:06,700
And also, it makes sense to tune
compression properly because

552
00:31:06,700 --> 00:31:08,500
compression can eat your CPU.

553
00:31:08,600 --> 00:31:10,320
Remember we discussed WAL compression.

554
00:31:10,320 --> 00:31:12,180
I always said, let's turn it on.

555
00:31:12,440 --> 00:31:17,680
Now I think let's turn it on unless
you have this problem.

556
00:31:18,420 --> 00:31:19,900
In this case, WAL sender.

557
00:31:21,540 --> 00:31:27,880
Yeah, you need to check how much
of that is WAL compression.

558
00:31:27,900 --> 00:31:33,620
And also we have new compression
algorithms implemented in fresh

559
00:31:33,620 --> 00:31:34,540
Postgres versions.

560
00:31:35,740 --> 00:31:37,940
So, yeah.

561
00:31:38,320 --> 00:31:39,900
And that is Zstd.

562
00:31:42,440 --> 00:31:44,960
So we can choose a better
one.

563
00:31:44,960 --> 00:31:49,020
For example, LZ4 should be
providing similar, as I remember.

564
00:31:49,020 --> 00:31:50,340
I saw some benchmarks.

565
00:31:50,600 --> 00:31:52,320
I didn't do it myself yet.

566
00:31:53,300 --> 00:31:59,120
So it should be similar to
pglz at default compression

567
00:31:59,280 --> 00:32:02,640
in terms of compression ratio,
but it takes much less CPU, like

568
00:32:02,640 --> 00:32:04,280
2 to 3 times less CPU.

569
00:32:04,280 --> 00:32:05,820
So it's worth choosing that.
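
A sketch of the setting in question; since Postgres 15, wal_compression accepts an algorithm name rather than just on/off (the choice of lz4 here is only an example, and the server must be built with the corresponding library):

```sql
ALTER SYSTEM SET wal_compression = 'lz4';  -- or 'zstd', 'pglz', 'on', 'off'
SELECT pg_reload_conf();
SHOW wal_compression;
```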

570
00:32:06,760 --> 00:32:10,120
Michael: I looked at it briefly
just for our own use for planned

571
00:32:10,120 --> 00:32:10,620
storage.

572
00:32:10,640 --> 00:32:13,920
And it was, we even got better
performance as well.

573
00:32:13,920 --> 00:32:16,740
So it won on both for us.

574
00:32:16,840 --> 00:32:20,900
Nikolay: Well, you probably saw
what I saw, Small Datum blog,

575
00:32:20,900 --> 00:32:21,400
right?

576
00:32:21,680 --> 00:32:22,180
No?

577
00:32:22,280 --> 00:32:23,300
OK, maybe not.

578
00:32:23,300 --> 00:32:26,640
Let's move on, because there are
more things to discuss.

579
00:32:27,700 --> 00:32:27,900
Michael: Nice.

580
00:32:27,900 --> 00:32:28,740
Nikolay: Design limits.

581
00:32:28,860 --> 00:32:34,540
So some people already think what
they will do when their table

582
00:32:34,540 --> 00:32:36,320
reaches 32 terabytes.

583
00:32:38,300 --> 00:32:42,440
Michael: Yeah, I guess this and
the last 1 both feel like adult

584
00:32:42,440 --> 00:32:43,820
problems again, right?

585
00:32:43,820 --> 00:32:48,100
Like, There aren't too many small
startups hitting the Instacart

586
00:32:48,280 --> 00:32:52,540
level of WAL generation or 32
terabytes of data.

587
00:32:52,540 --> 00:32:55,820
Nikolay: So yeah, it's a really
big clusters.

588
00:32:59,760 --> 00:33:04,060
But we can start thinking about
them earlier and be better prepared.

589
00:33:05,020 --> 00:33:07,860
Maybe sometimes not spending too
much time, because you know

590
00:33:07,860 --> 00:33:11,320
like if you spend too much time
thinking about how you will bring

591
00:33:11,320 --> 00:33:14,560
statistics to new cluster after
a major upgrade, but it's already

592
00:33:14,560 --> 00:33:19,600
implemented in 18, so just a couple
of years more and everyone

593
00:33:19,600 --> 00:33:20,940
forgets about this problem.

594
00:33:21,260 --> 00:33:23,420
Michael: It's the kind of thing
that's useful to have like a,

595
00:33:23,420 --> 00:33:26,140
if you've got like a to-do list
you're going to be using in a

596
00:33:26,140 --> 00:33:28,620
couple of years time or a calendar
that you know you're going

597
00:33:28,620 --> 00:33:32,840
to get an alert for, Just put a
reminder in for like a year or

598
00:33:32,840 --> 00:33:34,240
two's time, just check.

599
00:33:35,200 --> 00:33:38,080
Nikolay: We also mentioned a few
times during last episodes,

600
00:33:38,300 --> 00:33:40,880
latest episodes, this bothers me
a lot.

601
00:33:41,240 --> 00:33:44,540
First I learned CloudSQL has this
limit, then I learned RDS

602
00:33:44,540 --> 00:33:45,640
also has this limit.

603
00:33:45,700 --> 00:33:49,940
64 terabytes per whole database
what's happening here it's already

604
00:33:50,200 --> 00:33:51,260
not huge database

605
00:33:52,200 --> 00:33:54,960
Michael: but again in a couple
of years who knows they might

606
00:33:54,960 --> 00:33:56,040
have increased that

607
00:33:56,040 --> 00:33:59,220
Nikolay: yeah well I think it's
solvable of course I guess it's

608
00:33:59,220 --> 00:34:05,900
this is limit of single EBS volume
or disk on Google Cloud, PD,

609
00:34:05,900 --> 00:34:08,740
SSD, or how they call it.

610
00:34:09,020 --> 00:34:13,540
So yeah, it's solvable, I think,
right?

611
00:34:13,600 --> 00:34:14,980
With some tricks.

612
00:34:15,920 --> 00:34:20,260
But these days, it doesn't feel
huge.

613
00:34:20,280 --> 00:34:22,940
64, it feels big database, right?

614
00:34:23,300 --> 00:34:24,140
But when we say

615
00:34:24,140 --> 00:34:25,380
Michael: to most of us,

616
00:34:26,460 --> 00:34:29,680
Nikolay: but we have stories, 100
plus terabyte databases, all of them

617
00:34:29,680 --> 00:34:31,200
are self-managed.

618
00:34:31,880 --> 00:34:34,360
I think everyone has

619
00:34:34,540 --> 00:34:35,880
Michael: all of them were sharded.

620
00:34:36,020 --> 00:34:37,360
All of those were sharded.

621
00:34:37,740 --> 00:34:42,280
Nikolay: Yeah, 100%, yeah, we had
a great episode, 100 terabytes,

622
00:34:43,320 --> 00:34:47,140
where it was Adyen, who else?

623
00:34:47,680 --> 00:34:49,240
Michael: We had Notion and we had
Figma.

624
00:34:49,540 --> 00:34:50,440
Nikolay: Figma, right.

625
00:34:50,500 --> 00:34:53,900
And Notion and Figma, they are
on RDS, but it's a shard, a single

626
00:34:53,900 --> 00:34:56,180
cluster, it's impossible on RDS,
right?

627
00:34:56,880 --> 00:35:00,960
And I think Adyen has 100 plus terabytes.

628
00:35:01,460 --> 00:35:03,920
No, they have 100 plus terabytes,
but it's self-managed.

629
00:35:04,620 --> 00:35:04,740
Yes.

630
00:35:04,740 --> 00:35:07,120
Because on RDS it's impossible,
not supported.

631
00:35:07,800 --> 00:35:09,640
Michael: Well, and they shard it
also.

632
00:35:10,440 --> 00:35:11,340
Nikolay: Yeah, yeah, yeah.

633
00:35:11,480 --> 00:35:16,020
But yeah, large companies like
that, they're always like some

634
00:35:16,080 --> 00:35:17,860
parts are sharded, some are not.

635
00:35:18,280 --> 00:35:22,440
So anyway, when you have 10 to
20 terabytes, it's time to think

636
00:35:22,440 --> 00:35:26,980
if you are on RDS or CloudSQL,
is it like how you will grow

637
00:35:26,980 --> 00:35:27,480
5X?

638
00:35:27,980 --> 00:35:31,500
Because if you're 20 terabytes
to grow 5X, 100, it's already

639
00:35:31,920 --> 00:35:33,960
not possible with a single cluster,
right?

640
00:35:33,960 --> 00:35:37,660
Another reason to think about splitting
somehow.

641
00:35:38,800 --> 00:35:39,600
So, okay.

642
00:35:40,680 --> 00:35:43,280
Then a few more items, like, so
data loss.

643
00:35:43,520 --> 00:35:45,300
Data loss is a big deal.

644
00:35:45,900 --> 00:35:53,220
If you poorly designed backups
or HA solutions, Yeah, it can

645
00:35:53,220 --> 00:35:53,720
be.

646
00:35:54,020 --> 00:35:58,380
Let's join this with poor HA choice
leading to failures like

647
00:35:58,380 --> 00:35:59,140
split brains.

648
00:35:59,280 --> 00:36:00,900
So data loss, split brain.

649
00:36:01,060 --> 00:36:03,240
Actually I thought we had a discussion.

650
00:36:03,740 --> 00:36:06,980
There is ongoing discussion in
the project called CloudNativePG

651
00:36:06,980 --> 00:36:11,700
where I raised the topic of
split-brain and demonstrated

652
00:36:11,880 --> 00:36:14,280
how to do it a couple of weeks
ago.

653
00:36:15,720 --> 00:36:21,200
And good news as I see, they decided
to implement something to

654
00:36:21,200 --> 00:36:27,280
move in a direction similar to
Patroni because when network partition

655
00:36:27,280 --> 00:36:33,000
happens and the primary is basically
alone, it's bad because

656
00:36:33,000 --> 00:36:34,200
it remains active.

657
00:36:35,160 --> 00:36:39,260
And as I demonstrated, some parts
of application might still

658
00:36:39,520 --> 00:36:40,580
talk into it.

659
00:36:40,640 --> 00:36:42,020
This is classical split-brain.

660
00:36:42,380 --> 00:36:48,080
And I saw, based on discussion,
I saw it triggered, I never thought

661
00:36:48,080 --> 00:36:48,960
deeply, actually.

662
00:36:49,300 --> 00:36:53,880
But is split-brain just a variant
of data loss?

663
00:36:56,780 --> 00:36:59,860
Michael: Well, I guess you technically
might not have lost the

664
00:36:59,860 --> 00:37:00,300
data.

665
00:37:00,300 --> 00:37:02,420
It's still there, you just have
2 versions.

666
00:37:03,240 --> 00:37:03,480
Which 1

667
00:37:03,480 --> 00:37:03,957
Nikolay: is correct?

668
00:37:03,957 --> 00:37:04,190
Michael: It's worse

669
00:37:04,190 --> 00:37:04,540
Nikolay: than data loss.

670
00:37:04,540 --> 00:37:05,980
It's worse than data loss.

671
00:37:05,980 --> 00:37:06,720
It's worse.

672
00:37:06,780 --> 00:37:09,840
Because now you have 2 versions
of reality.

673
00:37:11,040 --> 00:37:11,980
And it's bad.

674
00:37:12,180 --> 00:37:15,660
With data loss, you can apologize
and ask to like, Bring your

675
00:37:15,660 --> 00:37:17,140
data back again, please.

676
00:37:17,900 --> 00:37:21,100
And in some cases we allowed some
data loss.

677
00:37:21,100 --> 00:37:24,940
Of course, data loss is really
a sad thing to have.

678
00:37:25,440 --> 00:37:27,680
But sometimes we have it like...

679
00:37:28,380 --> 00:37:30,460
Officially, some data loss might
happen.

680
00:37:30,620 --> 00:37:31,960
The risk is very low.

681
00:37:32,160 --> 00:37:36,520
And at maximum this number of bytes,
for example, but it might happen.

682
00:37:36,600 --> 00:37:38,100
With split-brain it's worse.

683
00:37:38,300 --> 00:37:44,280
You need to spend a lot of
effort to merge realities into

684
00:37:44,280 --> 00:37:44,780
1.

685
00:37:46,060 --> 00:37:49,400
Michael: Most cases of data loss
I've seen have tended to be

686
00:37:49,400 --> 00:37:55,860
at least 2 things gone wrong, like
user error, or some way

687
00:37:55,860 --> 00:38:00,460
that the node's gone down. But quite
often it's user error: an accidental

688
00:38:00,460 --> 00:38:05,160
delete without a WHERE clause, or
dropping a table in 1 environment.

689
00:38:05,280 --> 00:38:07,540
Nikolay: Well, this is like higher
level data loss.

690
00:38:07,840 --> 00:38:11,320
Michael: Well, but that
can cause the low-level data

691
00:38:11,320 --> 00:38:15,600
loss if you then also don't have
tested backups and it turns

692
00:38:15,600 --> 00:38:19,740
out you didn't have a good one. So
it's the combination of the 2

693
00:38:19,740 --> 00:38:21,500
things, often.

694
00:38:21,860 --> 00:38:25,520
Nikolay: For me it's still, yeah,
well, yes, yeah, if backups

695
00:38:25,520 --> 00:38:28,380
are missing, it's bad, yeah, you
cannot recover, but also like

696
00:38:28,380 --> 00:38:32,180
data loss for me, classical, like
lower level, it's: the database

697
00:38:32,220 --> 00:38:36,520
said commit successful, and then
my data is gone.

698
00:38:37,540 --> 00:38:38,340
Michael: So yeah.

699
00:38:38,800 --> 00:38:42,160
So that's scary and dangerous as,
like, a CTO.

700
00:38:42,740 --> 00:38:45,300
Nikolay: Undermines trust in
Postgres again, right?

701
00:38:45,420 --> 00:38:45,820
Yeah, yeah.

702
00:38:45,820 --> 00:38:50,320
If procedures are leading to data
loss and also split brains.

703
00:38:51,220 --> 00:38:55,860
Michael: Is it actually happening
often or is it more the CTO

704
00:38:55,900 --> 00:38:59,480
thinking, I don't want to choose
this technology because it could

705
00:38:59,480 --> 00:38:59,980
happen?

706
00:39:00,480 --> 00:39:04,340
Nikolay: Depends on the project
and the level of control.

707
00:39:05,580 --> 00:39:09,740
I'm pretty confident that in many,
many web and mobile app,

708
00:39:10,200 --> 00:39:14,940
OLTP-style projects, unless they
are, like, financial

709
00:39:15,060 --> 00:39:18,280
or something, like social networks,
social media, like maybe even

710
00:39:18,280 --> 00:39:22,240
e-commerce and so on, data loss
happens sometimes unnoticed.

711
00:39:22,960 --> 00:39:23,460
Yeah.

712
00:39:24,320 --> 00:39:27,500
If you have asynchronous replicas
and a failover happened, and the

713
00:39:27,500 --> 00:39:30,420
process of failover, like Patroni
by default with asynchronous

714
00:39:30,420 --> 00:39:33,740
replicas allows up to 1 megabyte,
it's written in the config. Up

715
00:39:33,740 --> 00:39:35,720
to 1 megabyte of data loss, officially.

716
00:39:36,160 --> 00:39:36,920
Mebibyte.

717
00:39:37,300 --> 00:39:39,940
Right, so 1 megabyte is possible
to lose.
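
For reference, the Patroni setting described here is maximum_lag_on_failover, whose documented default is 1048576 bytes (roughly 1 MB): a replica lagging by more than that is not considered a failover candidate, so with asynchronous replication up to about that much WAL can be lost on failover. A minimal sketch of how to watch the corresponding lag yourself with standard SQL on the primary (the 1048576 threshold in the comment assumes Patroni's default):

    -- Run on the primary: per-replica lag in bytes of WAL.
    -- Replicas lagging by more than Patroni's maximum_lag_on_failover
    -- (default 1048576 bytes) are excluded from leader election.
    SELECT application_name,
           pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
    FROM pg_stat_replication
    ORDER BY replay_lag_bytes DESC;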

718
00:39:40,520 --> 00:39:44,240
And who will complain if it's
social media comments or something,

719
00:39:44,240 --> 00:39:45,260
we store, like, comments.

720
00:39:45,260 --> 00:39:48,160
We lost some comments, nobody noticed
maybe.

721
00:39:48,940 --> 00:39:53,000
But if it's a serious project,
it's better not to do it.

722
00:39:53,500 --> 00:39:54,560
Or split brain.

723
00:39:54,900 --> 00:39:58,160
Yeah, anyway, this is more not...

724
00:39:59,860 --> 00:40:02,180
Postgres doesn't drop the ball
here.

725
00:40:02,440 --> 00:40:05,620
It's a question mostly for everything
around Postgres, the infrastructure,

726
00:40:05,740 --> 00:40:09,620
and if it's managed Postgres, for
their infrastructure: how they

727
00:40:10,240 --> 00:40:11,820
guarantee there is no data loss.

728
00:40:11,820 --> 00:40:16,220
And last time we discussed the
problem with synchronous_commit

729
00:40:16,320 --> 00:40:21,420
and we discussed in detail how
right now Postgres doesn't do

730
00:40:21,420 --> 00:40:26,120
a good job of not revealing proper
LSNs on standbys, right?

731
00:40:26,120 --> 00:40:33,540
So even Patroni can have data
loss in the case of synchronous_commit

732
00:40:34,220 --> 00:40:35,700
with remote_write.

733
00:40:35,900 --> 00:40:36,960
We discussed it.
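
For context, a minimal sketch of the synchronous_commit levels being discussed. These are standard Postgres settings; which level is appropriate depends on your setup, and the 'ANY 1 (*)' standby list below is only an illustration:

    -- Durability levels for commits with synchronous replication:
    --   remote_write : standby passed WAL to its OS, but has not flushed it to disk,
    --                  so a crash on the standby can still lose acknowledged data
    --   on           : standby has flushed WAL to durable storage
    --   remote_apply : standby has also applied the WAL (visible to read queries)
    ALTER SYSTEM SET synchronous_commit = 'on';
    -- The wait only involves standbys if synchronous_standby_names is set, e.g.:
    ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (*)';
    SELECT pg_reload_conf();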

734
00:40:37,300 --> 00:40:41,460
Okay, anyway, this feels like something
for improvement definitely

735
00:40:41,520 --> 00:40:42,020
here.

736
00:40:43,200 --> 00:40:44,080
Good. Corruption.

737
00:40:45,240 --> 00:40:50,200
My general feeling is people don't
realize how many types of corruption

738
00:40:50,200 --> 00:40:50,900
might happen.

739
00:40:52,120 --> 00:40:55,900
And it remains unnoticed in so
many cases.

740
00:40:57,200 --> 00:41:03,480
When you start talking, people's
reaction sometimes is, wait,

741
00:41:03,480 --> 00:41:03,980
what?

742
00:41:05,740 --> 00:41:07,940
So yeah, corruption at various
levels.

743
00:41:10,520 --> 00:41:14,100
Michael: So actually, maybe this,
so in terms of the list where

744
00:41:14,100 --> 00:41:17,360
we started, kind of the point of
the topic, there was kind of

745
00:41:17,360 --> 00:41:18,820
dangers, right?

746
00:41:18,820 --> 00:41:22,420
Is this 1 of those ones that if
it silently happened for a while,

747
00:41:22,420 --> 00:41:25,740
it suddenly becomes a complete
loss of trust in the underlying

748
00:41:25,800 --> 00:41:26,740
system or?

749
00:41:26,760 --> 00:41:32,560
Nikolay: Yeah, as usual, I can
rattle on a little about how

750
00:41:32,560 --> 00:41:34,400
Postgres defaults are outdated.

751
00:41:34,540 --> 00:41:38,040
We know data checksums were only recently
enabled by default.

752
00:41:38,480 --> 00:41:39,780
Yeah, great change.

753
00:41:40,520 --> 00:41:43,040
Like a month or 2 ago, right?

754
00:41:43,140 --> 00:41:44,480
It will be only in 18.

755
00:41:45,040 --> 00:41:46,720
It should have been done 5 years ago.

756
00:41:46,720 --> 00:41:47,360
We saw

757
00:41:47,360 --> 00:41:47,860
Michael: evidence.

758
00:41:47,880 --> 00:41:48,380
Some,

759
00:41:48,760 --> 00:41:49,260
Nikolay: yeah.

760
00:41:49,400 --> 00:41:52,480
Yeah, many managed Postgres providers
did it, like RDS.

761
00:41:52,540 --> 00:41:52,900
That's great.

762
00:41:52,900 --> 00:41:53,700
Michael: Which is great.

763
00:41:53,800 --> 00:41:56,540
And that kind of is then the default
for a lot of people.

764
00:41:56,540 --> 00:41:59,540
Nikolay: Yeah, but it also doesn't
guarantee that you don't have

765
00:41:59,540 --> 00:42:00,040
corruption.

766
00:42:00,240 --> 00:42:04,400
You need to read all the pages
from time to time, right?

767
00:42:04,400 --> 00:42:08,160
And do they offer something, some
anti-corruption tooling?

768
00:42:08,540 --> 00:42:10,300
They don't, nobody.

769
00:42:11,320 --> 00:42:12,880
Okay, they enabled it, so what?

770
00:42:12,880 --> 00:42:15,260
This is just a small piece of the
whole puzzle.
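
A minimal sketch of how to see where you stand on checksums. These are standard commands; on a managed service, the initdb-time choice belongs to the provider:

    -- Check whether data checksums are enabled on a running cluster:
    SHOW data_checksums;          -- 'on' or 'off'
    -- Checksum failures detected so far, per database (Postgres 12+):
    SELECT datname, checksum_failures, checksum_last_failure
    FROM pg_stat_database;
    -- Checksums are chosen at initdb time (initdb --data-checksums) or can be
    -- enabled later, offline, with the pg_checksums utility.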

771
00:42:16,160 --> 00:42:18,420
Michael: Yeah, and amcheck improving
as well.

772
00:42:18,420 --> 00:42:20,120
I think, is it in 18?

773
00:42:20,200 --> 00:42:21,180
Nikolay: Yeah, that's great.

774
00:42:21,260 --> 00:42:22,900
Michael: In indexes as well.

775
00:42:22,940 --> 00:42:23,920
Nikolay: Yeah, big.

776
00:42:24,140 --> 00:42:27,760
Michael: In fact, I think we, did
we have an episode on amcheck?

777
00:42:27,800 --> 00:42:30,920
I feel like it came up at least
once or twice, maybe index maintenance.

778
00:42:30,920 --> 00:42:33,420
Nikolay: I cannot remember all
our episodes, so many.

779
00:42:33,420 --> 00:42:34,340
Michael: Yeah, me neither.

780
00:42:34,540 --> 00:42:37,120
Nikolay: But it's great to see
the progress in the area of amcheck,

781
00:42:37,120 --> 00:42:43,280
but again, it's a small piece of
the puzzle and it has settings,

782
00:42:43,320 --> 00:42:45,740
like options to choose from.

783
00:42:46,400 --> 00:42:49,840
So it's not trivial to choose among
options.
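
To illustrate the kind of options involved, a minimal amcheck sketch; the index name is a placeholder, lock levels and runtime differ a lot between these checks, and since Postgres 14 the pg_amcheck CLI wraps them for whole-database runs:

    CREATE EXTENSION IF NOT EXISTS amcheck;
    -- Cheap structural check of a B-tree index, light locking
    -- ('my_index' is a placeholder name):
    SELECT bt_index_check('my_index'::regclass);
    -- Also verify that every heap tuple has a matching index entry
    -- (heapallindexed = true; reads the whole table, much slower):
    SELECT bt_index_check('my_index'::regclass, true);
    -- Stricter parent/child checks; takes a stronger lock on the index:
    SELECT bt_index_parent_check('my_index'::regclass, true);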

784
00:42:49,900 --> 00:42:53,680
And also again, like a couple of
episodes ago, we discussed support

785
00:42:53,680 --> 00:42:57,480
and how difficult it is to sometimes
understand what's happening.

786
00:42:57,740 --> 00:42:58,140
Right.

787
00:42:58,140 --> 00:42:58,580
Yeah.

788
00:42:58,580 --> 00:43:04,540
And I right now have a case
where a very big platform doesn't

789
00:43:04,540 --> 00:43:06,540
help to investigate corruption.

790
00:43:07,740 --> 00:43:13,380
And nobody has the ability to
investigate it but them, and

791
00:43:13,380 --> 00:43:14,700
they are not helping.

792
00:43:15,900 --> 00:43:21,540
So it was 1 of the reasons I was provoked
to talk about that in that

793
00:43:21,540 --> 00:43:22,040
episode.

794
00:43:22,040 --> 00:43:22,940
So it's bad.

795
00:43:23,000 --> 00:43:24,280
It looks really bad.

796
00:43:24,280 --> 00:43:27,340
Like, you guys are responsible
for that corruption case, and

797
00:43:27,340 --> 00:43:29,000
you don't do a great job.

798
00:43:29,600 --> 00:43:32,840
And I think it's a problem of industry,
and we discussed it already,

799
00:43:32,840 --> 00:43:34,020
so let's not repeat.

800
00:43:34,340 --> 00:43:40,940
But in general, I think
if you are a CTO or, like, a leader

801
00:43:41,920 --> 00:43:47,940
who decides priorities, my big advice is to take this list and

802
00:43:48,780 --> 00:43:52,620
check this, like, evaluate the situation for yourself.

803
00:43:53,480 --> 00:43:56,420
Better let us do it, of course, but you can do it yourself.

804
00:43:56,960 --> 00:44:03,140
And then plan some proactive measures because corruption testing

805
00:44:03,220 --> 00:44:06,060
can be done even on RDS proactively.

806
00:44:06,460 --> 00:44:09,140
If it happens, you need support, of course, because sometimes

807
00:44:09,140 --> 00:44:11,960
it's, like, low level, you don't have access, but at least

808
00:44:11,960 --> 00:44:15,120
you will feel control over it, right?

809
00:44:16,360 --> 00:44:18,360
So, control over corruption.

810
00:44:18,460 --> 00:44:20,980
So, anti-corruption tooling is needed.

811
00:44:20,980 --> 00:44:22,620
This is what I feel.

812
00:44:24,720 --> 00:44:25,280
That's it.

813
00:44:25,280 --> 00:44:26,440
That's all my list.

814
00:44:26,840 --> 00:44:31,960
I'm sure it's lacking something, like security, more security-

815
00:44:31,980 --> 00:44:36,420
related stuff, for example. As usual, I tend to like it. What do

816
00:44:36,420 --> 00:44:39,980
you think, like, was it good?

817
00:44:40,080 --> 00:44:43,680
Michael: Yeah, you see a lot more of these things than I do, obviously,

818
00:44:44,540 --> 00:44:47,740
but yeah, I think it's a really good list. And with checklists,

819
00:44:47,800 --> 00:44:52,580
right, it's not... of course you could go on forever with

820
00:44:52,580 --> 00:44:57,500
things to be scared of, but this feels like, if you ticked

821
00:44:57,500 --> 00:45:00,800
all of these off, you'd be in such a good position versus most.

822
00:45:01,400 --> 00:45:05,040
And obviously things can still go wrong, but these are some of

823
00:45:05,040 --> 00:45:07,580
the most, at least, even if they're not the most common, some of

824
00:45:07,580 --> 00:45:10,120
the things that could cause the biggest issues, the things that

825
00:45:10,120 --> 00:45:14,560
are most likely to get on the CEO's desk or in the inbox.

826
00:45:15,040 --> 00:45:17,760
So yeah, this feels like, if you're on top of all of these

827
00:45:17,760 --> 00:45:21,340
things, you're going to go a long, long way before you hit issues.

828
00:45:22,060 --> 00:45:23,640
Nikolay: I have a question for you.

829
00:45:24,020 --> 00:45:28,700
Guess, among these 10 items, which item I never had in my production

830
00:45:28,780 --> 00:45:29,280
life?

831
00:45:30,900 --> 00:45:32,640
Michael: Oh, maybe...

832
00:45:33,820 --> 00:45:34,900
Wait, give me a second.

833
00:45:34,960 --> 00:45:35,860
Nikolay: It's tricky, right?

834
00:45:35,860 --> 00:45:37,100
Michael: Transaction ID wraparound?

835
00:45:37,360 --> 00:45:39,680
Nikolay: Exactly, how, yeah, I never had it.

836
00:45:39,680 --> 00:45:40,280
I only-

837
00:45:40,280 --> 00:45:41,080
Michael: It's rare.

838
00:45:41,400 --> 00:45:42,800
Nikolay: Well, it's rare, yeah.

839
00:45:42,800 --> 00:45:44,800
Let's cross it off, no problem.

840
00:45:45,060 --> 00:45:52,680
Yeah, I only found a way to emulate it, right, which we did multiple

841
00:45:52,680 --> 00:45:55,320
times, but never had it in reality in production.

842
00:45:57,160 --> 00:45:59,620
So yeah, everything else I had.

843
00:46:00,080 --> 00:46:00,580
Yeah.

844
00:46:00,720 --> 00:46:01,500
Not once.

845
00:46:01,640 --> 00:46:04,200
Michael: I nearly guessed too quickly that it was going to be

846
00:46:04,200 --> 00:46:07,000
Split Brain, but then I was like, no wait, read the whole list.

847
00:46:07,580 --> 00:46:10,520
But I'm guessing you had Split Brains like a couple of times,

848
00:46:10,520 --> 00:46:11,420
maybe maximum?

849
00:46:11,760 --> 00:46:14,480
Nikolay: Yeah, Replication Manager, Split Brain as a service,

850
00:46:14,480 --> 00:46:14,980
yes.

851
00:46:16,420 --> 00:46:19,520
Michael: Okay, yeah, pre-Patroni days.

852
00:46:20,280 --> 00:46:21,300
Yeah, makes sense.

853
00:46:21,680 --> 00:46:22,760
All right, nice 1.

854
00:46:22,760 --> 00:46:23,860
Thanks so much, Nikolay.

855
00:46:24,060 --> 00:46:24,780
Nikolay: Thank you.