1
00:00:00,060 --> 00:00:02,160
Michael: Hello and welcome to
Postgres.FM, a weekly show

2
00:00:02,160 --> 00:00:03,140
about all things PostgreSQL.

3
00:00:03,400 --> 00:00:05,860
I am Michael, founder of pgMustard
and this is Nikolay, founder

4
00:00:05,860 --> 00:00:06,920
of Postgres.AI.

5
00:00:07,120 --> 00:00:08,500
Hey Nikolay, how's it going?

6
00:00:09,180 --> 00:00:11,440
Nikolay: Everything is alright,
how are you?

7
00:00:11,880 --> 00:00:13,040
Michael: Yeah, good thank you.

8
00:00:13,040 --> 00:00:15,640
It's been a nice sunny week here
in the UK.

9
00:00:16,220 --> 00:00:18,540
Nikolay: Oh, so this is who stole
our sun, okay.

10
00:00:19,300 --> 00:00:22,440
Michael: So this week we got a
listener request, in fact we get

11
00:00:22,440 --> 00:00:24,360
quite a lot of listener requests,
thank you everybody.

12
00:00:24,720 --> 00:00:29,740
This 1 was from Shayon Mukherjee,
apologies for pronunciation,

13
00:00:30,920 --> 00:00:37,460
and they asked us about
synchronous_commit and whether or when

14
00:00:37,460 --> 00:00:41,500
it might make sense to ever turn
it off or use different settings

15
00:00:41,580 --> 00:00:44,560
and I thought it's an interesting
topic, full stop, so I wanted

16
00:00:44,560 --> 00:00:46,500
to get your thoughts as well on
this.

17
00:00:47,020 --> 00:00:48,740
Nikolay: I have many thoughts on
this.

18
00:00:49,120 --> 00:00:50,880
We need to start somewhere.

19
00:00:52,360 --> 00:00:52,860
So

20
00:00:54,020 --> 00:00:58,540
Michael: I wondered about starting,
probably not in the

21
00:00:58,540 --> 00:01:01,620
most obvious place but on a single
node setup.

22
00:01:01,620 --> 00:01:05,740
I know this becomes more interesting,
multi-node, but on single

23
00:01:05,740 --> 00:01:11,480
node, pure Postgres, there are
still some interesting discussion

24
00:01:11,480 --> 00:01:12,560
points here, right?

25
00:01:13,680 --> 00:01:15,980
Nikolay: Yeah, and even before
that, there is single...

26
00:01:17,360 --> 00:01:19,900
So the meaning of synchronous_commit
is overloaded.

27
00:01:21,140 --> 00:01:27,300
I know like technically it defines
the behavior of commit, right?

28
00:01:27,700 --> 00:01:31,580
And there is local behavior and
if we have synchronous standbys,

29
00:01:32,420 --> 00:01:33,740
there is remote behavior.

30
00:01:34,860 --> 00:01:38,740
But originally it was only about
local.

31
00:01:39,520 --> 00:01:40,020
Yeah.

32
00:01:41,120 --> 00:01:46,160
And then the decision was... it was
only on and off, 2 options.

33
00:01:46,340 --> 00:01:50,600
This is how I remember it for a
decade or more, more than a decade

34
00:01:50,600 --> 00:01:53,040
of being Postgres user.

35
00:01:54,960 --> 00:01:57,600
And then there is overloaded meaning.

36
00:01:58,180 --> 00:02:03,060
Remote write, remote apply.

37
00:02:03,260 --> 00:02:03,980
2 options.

38
00:02:05,140 --> 00:02:12,040
So I think my first thought is
it's a huge mistake to mix things

39
00:02:12,040 --> 00:02:12,520
here.

40
00:02:12,520 --> 00:02:16,820
It causes pain for users like me.

41
00:02:17,360 --> 00:02:23,100
All the time I like, I read it,
I read it, I read it, like, I

42
00:02:23,100 --> 00:02:24,400
cannot remember this.

43
00:02:24,640 --> 00:02:27,260
To remember this, you need to deal
only with this.

44
00:02:28,940 --> 00:02:32,080
Michael: Yeah, or look up, well
you can't remember it, you just

45
00:02:32,080 --> 00:02:32,720
look it up.

46
00:02:32,720 --> 00:02:33,180
I just look

47
00:02:33,180 --> 00:02:33,620
Nikolay: it up.

48
00:02:33,620 --> 00:02:36,420
Every time you look it up and try
to understand the logic.

49
00:02:36,420 --> 00:02:41,120
It's like, we have many places
in the GUC system, the Postgres config system,

50
00:02:41,120 --> 00:02:41,640
like this.

51
00:02:41,640 --> 00:02:45,360
Like for example, you know, we
still do a lot of consulting.

52
00:02:45,440 --> 00:02:51,960
We help our customers, many of
them are startup companies who

53
00:02:51,960 --> 00:02:54,740
grow rapidly, very smart people.

54
00:02:55,920 --> 00:02:58,520
They understand things very quickly,
but you need to explain

55
00:02:58,520 --> 00:02:59,280
what's happening.

56
00:02:59,440 --> 00:03:02,620
And 1 of the things is, we
deal with bloat all the time.

57
00:03:02,620 --> 00:03:04,400
Bloat is 1 of the major concerns.

58
00:03:04,400 --> 00:03:07,540
And we explain, okay, we need autovacuum
tuning.

59
00:03:07,540 --> 00:03:11,980
And there is autovacuum_work_mem,
which is minus 1.

60
00:03:12,740 --> 00:03:15,320
It means you need to look at maintenance_work_mem.

61
00:03:15,660 --> 00:03:17,860
Why was this done?

62
00:03:17,860 --> 00:03:19,340
This is exactly the same thing.

63
00:03:19,340 --> 00:03:21,620
Like why was this done to my brain?

64
00:03:21,620 --> 00:03:26,740
Like somebody made some decision
to cause huge pain for ages.

65
00:03:28,080 --> 00:03:32,640
It's a UX issue, like a user experience
issue, because instead

66
00:03:32,640 --> 00:03:38,060
of clarifying and separating things,
we develop some implicit

67
00:03:38,200 --> 00:03:40,940
dependencies, and you need to be an
expert.

68
00:03:41,460 --> 00:03:46,680
And even if you are an expert, it's
like, it's really hard to memorize

69
00:03:46,680 --> 00:03:50,740
all these things and the problem
is like at some point we will

70
00:03:50,740 --> 00:03:55,120
use more and more AI and so
on, but to explain these things

71
00:03:55,120 --> 00:04:01,200
to AI it's also difficult because
there are implicit dependencies

72
00:04:01,320 --> 00:04:04,400
So synchronous_commit can be local
but it also can be not local.

73
00:04:04,400 --> 00:04:07,540
And local is on and off and there
is also the word local, right?

74
00:04:08,100 --> 00:04:08,600
Yeah.

75
00:04:08,720 --> 00:04:11,020
So the meaning of on was rewritten.

76
00:04:12,340 --> 00:04:13,140
Thank you guys.

77
00:04:13,140 --> 00:04:13,880
Thank you.

78
00:04:13,880 --> 00:04:15,340
Thank you who did this.

79
00:04:15,340 --> 00:04:16,080
Thank you.

80
00:04:16,160 --> 00:04:19,640
Like if you try
to be a consultant and explain

81
00:04:19,640 --> 00:04:27,260
things to other very smart guys,
you will see the pain. It's really

82
00:04:27,260 --> 00:04:29,100
hard and why is this hard?

83
00:04:29,100 --> 00:04:31,860
This is my first thought why is
this so hard?

84
00:04:34,640 --> 00:04:38,000
Michael: As you say, it's because
it's combining 2 things, right?

85
00:04:38,000 --> 00:04:41,320
And the behavior... I can see why
they combined it, though. Like,

86
00:04:41,320 --> 00:04:41,980
the behavior...

87
00:04:41,980 --> 00:04:45,000
Nikolay: I also can see, no, no
questions here.

88
00:04:45,560 --> 00:04:48,380
Michael: And to their credit the
documentation is really nicely

89
00:04:48,380 --> 00:04:51,180
written, so it can be pointed to.

90
00:04:51,180 --> 00:04:54,960
Nikolay: It's nicely written but
it doesn't explain all the things

91
00:04:54,960 --> 00:04:59,480
so I hope we will touch additional
topics not covered by documentation

92
00:04:59,640 --> 00:04:59,960
today.

93
00:04:59,960 --> 00:05:00,460
Michael: Nice.

94
00:05:02,060 --> 00:05:07,340
So let's talk about the single
node case. On/off, on by default,

95
00:05:07,340 --> 00:05:10,520
right? So why would we want
to turn it off?

96
00:05:10,520 --> 00:05:11,700
What's happening here?

97
00:05:11,820 --> 00:05:14,720
Nikolay: Yeah, if we forget about
multi-node setups, we have

98
00:05:14,720 --> 00:05:15,980
just 1 node, right?

99
00:05:16,620 --> 00:05:23,240
And if we have write heavy workloads
and we think write latency

100
00:05:23,240 --> 00:05:26,740
is not good, and eventually write
throughput is not good, because

101
00:05:28,580 --> 00:05:31,340
if latency is not good, throughput
won't be good as well.

102
00:05:32,120 --> 00:05:35,040
There is some dependency there,
which is not straightforward.

103
00:05:36,780 --> 00:05:41,340
And it means that, well, we know
at commit time, Postgres needs

104
00:05:41,460 --> 00:05:45,520
to make sure information is written
to WAL and it's present,

105
00:05:46,100 --> 00:05:49,260
like synchronized, like it's on
disk.

106
00:05:49,260 --> 00:05:49,600
Right.

107
00:05:49,600 --> 00:05:55,300
So before that, Postgres
cannot respond to the user

108
00:05:55,560 --> 00:05:57,140
that commit was successful.

109
00:05:57,940 --> 00:06:02,680
And when we say commit, it can
be explicit commit, or it can

110
00:06:02,680 --> 00:06:05,640
be implicit commit if we have a
single statement transaction,

111
00:06:06,160 --> 00:06:10,240
which is a super popular approach
in web and mobile apps.

112
00:06:10,240 --> 00:06:13,820
So when we just have single statement
transactions all the time,

113
00:06:13,820 --> 00:06:19,280
separate inserts, separate updates
and so on. And

114
00:06:19,280 --> 00:06:22,320
all of them have implicit commit
as well.

115
00:06:22,960 --> 00:06:27,940
So commit can be successful only
when WAL is already on disk.

116
00:06:28,780 --> 00:06:32,420
And it means we need to wait for
disk.

117
00:06:33,580 --> 00:06:43,140
And if disk is not a RAMFS, which
is in 99.99% of cases, right?

118
00:06:43,140 --> 00:06:45,780
If it's a regular disk, it can be
quite slow.

119
00:06:46,780 --> 00:06:52,200
And in this case, we deal with,
like if our transactions are

120
00:06:52,200 --> 00:06:58,520
super fast, like extremely fast,
which is actually rare, usually

121
00:06:58,520 --> 00:06:59,700
they are not so fast.

122
00:07:00,280 --> 00:07:04,500
Because of various things,
including, for

123
00:07:04,500 --> 00:07:06,000
example, index write amplification.

124
00:07:07,520 --> 00:07:12,240
I'm in the mood to criticize Postgres
today, you know. Like, yeah,

125
00:07:12,440 --> 00:07:13,440
you know, you know me.

126
00:07:13,440 --> 00:07:15,680
Michael: Well, yeah, so far

127
00:07:15,680 --> 00:07:18,240
you've described
the kind of synchronous_commit

128
00:07:18,240 --> 00:07:23,540
equals on case. So we have
to wait, yes, which is the default, and

129
00:07:23,940 --> 00:07:27,920
we're waiting for the flush to
happen, so the writes to disk, before

130
00:07:27,920 --> 00:07:31,600
confirming back to the client that
that commit has been successful,

131
00:07:33,380 --> 00:07:37,900
but turning it off, we don't wait
for the extra step of flushing

132
00:07:37,900 --> 00:07:43,040
to disk before confirming to the
client, which sounds risky.

133
00:07:43,440 --> 00:07:50,200
Well, the risk is if there happens
to be a crash on the node,

134
00:07:50,660 --> 00:07:55,920
in the time between the commit
being confirmed and the

135
00:07:55,920 --> 00:08:00,240
write-ahead log being flushed
to disk, we can

136
00:08:00,240 --> 00:08:01,700
lose any of that data.

137
00:08:01,800 --> 00:08:05,940
So the risk here is some data loss
on crash.

138
00:08:06,140 --> 00:08:10,540
Nikolay: Yeah, the short tail of
WAL is lost.

139
00:08:11,240 --> 00:08:16,340
Postgres will still reach consistency,
all good, but we lose

140
00:08:17,040 --> 00:08:18,220
some part of data.

141
00:08:18,840 --> 00:08:25,160
It's similar to any multi-node
setup with an asynchronous node when

142
00:08:25,160 --> 00:08:31,360
failover happens and if standbys
are asynchronous, nobody guarantees

143
00:08:31,580 --> 00:08:35,200
that at failover everything will
be present there, right?

144
00:08:35,220 --> 00:08:36,240
Because it's asynchronous.

145
00:08:36,380 --> 00:08:40,560
There can be lags, but there's
consistency, all good, and we

146
00:08:40,560 --> 00:08:45,200
can lose, for example, by default
in Patroni settings, as I remember,

147
00:08:45,200 --> 00:08:48,840
there's a setting, I always forget
the name, but a setting like

148
00:08:48,840 --> 00:08:52,820
maximum lag allowed to fail
over; as I remember, by default

149
00:08:52,820 --> 00:08:56,660
it's 1 megabyte, so up to 1 megabyte
is okay to lose.

150
00:08:57,020 --> 00:09:02,120
Here it's similar, there is no
threshold, like unlimited, but

151
00:09:02,640 --> 00:09:06,100
technically it cannot be huge.

152
00:09:06,680 --> 00:09:10,200
Michael: I looked into it, and the default setting is

153
00:09:10,200 --> 00:09:15,080
a write-ahead log writer delay of 200 milliseconds,

154
00:09:15,660 --> 00:09:19,780
And so I thought maybe only 200 milliseconds worth of writes

155
00:09:19,780 --> 00:09:22,440
could be lost, but in the documentation it said actually...

156
00:09:23,500 --> 00:09:24,640
Yes, good point.

157
00:09:24,900 --> 00:09:31,220
Actually, it's for implementation detail reasons, it says it's

158
00:09:31,220 --> 00:09:32,300
3 times that.

159
00:09:32,300 --> 00:09:37,680
So by default, it's 600 milliseconds that could be lost.

160
00:09:39,140 --> 00:09:40,740
But you could reduce that.

161
00:09:40,760 --> 00:09:44,800
This setting only makes sense to change if you're setting synchronous_commit

162
00:09:44,800 --> 00:09:46,540
to off for any of your transactions.

163
00:09:47,500 --> 00:09:49,000
But you can reduce it if you want.

164
00:09:49,000 --> 00:09:53,540
So you could, if you do decide to go for some use of synchronous_commit

165
00:09:53,620 --> 00:09:57,720
equals off, then this is a setting you could reduce to

166
00:09:57,720 --> 00:09:58,780
minimize data loss.

167
00:09:58,780 --> 00:10:00,920
But at that point, you're still losing some data.

168
00:10:00,920 --> 00:10:04,020
You might still be losing some data, so I'm not sure if

169
00:10:04,020 --> 00:10:05,240
it matters that much.

170
00:10:05,320 --> 00:10:06,180
Nikolay: Right, right.

171
00:10:06,180 --> 00:10:11,260
So the maximum is 3 times that wal_writer_delay, right.

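As a hedged illustration of the settings discussed here (the values are examples from the conversation, not recommendations), a single-node asynchronous-commit setup might look like:

```sql
-- Illustrative settings only. With synchronous_commit = off, a crash can
-- lose recently "committed" transactions -- up to roughly 3 * wal_writer_delay
-- of WAL, per the asynchronous commit docs discussed above.
SET synchronous_commit = off;

-- wal_writer_delay is a server setting (default 200ms) in postgresql.conf;
-- lowering it narrows the data-loss window at the cost of more frequent flushes:
--   wal_writer_delay = 200ms
```
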
172
00:10:11,540 --> 00:10:15,040
And the question is, like, should we consider this?

173
00:10:15,100 --> 00:10:19,880
My answer is if writes are super fast, maybe yes.

174
00:10:23,140 --> 00:10:24,780
There is a trade-off here, obviously.

175
00:10:26,000 --> 00:10:30,340
Michael: Yeah, I was thinking about it in our last episode about

176
00:10:30,340 --> 00:10:31,460
time series data.

177
00:10:32,380 --> 00:10:37,740
But you made a really good point: if you are writing

178
00:10:37,740 --> 00:10:41,760
kind of lots and lots of little writes often in time series workloads

179
00:10:41,760 --> 00:10:44,440
people suggest actually batching inserts

180
00:10:45,620 --> 00:10:45,900
Nikolay: so

181
00:10:45,900 --> 00:10:48,900
Michael: by that point you're yeah so batching makes a load of

182
00:10:48,900 --> 00:10:50,280
sense for optimization.

183
00:10:50,280 --> 00:10:51,260
Nikolay: It's a good point.

184
00:10:51,260 --> 00:10:54,060
So choosing between synchronous_commit off and batching, you

185
00:10:54,060 --> 00:10:55,060
just choose batching.

186
00:10:55,680 --> 00:10:59,500
Michael: Yeah, but then at that point, your transactions

187
00:10:59,500 --> 00:11:02,220
are not that fast because you're batching maybe a thousand or

188
00:11:02,220 --> 00:11:06,360
more of them together, and then the benefits of synchronous_commit

189
00:11:06,740 --> 00:11:12,700
equals off are much smaller. Yeah, exactly. So I don't

190
00:11:12,700 --> 00:11:15,540
think you can get the benefit twice if that makes sense you

191
00:11:15,540 --> 00:11:15,780
Nikolay: don't get

192
00:11:15,780 --> 00:11:17,060
Michael: the benefit of doing both.

193
00:11:17,080 --> 00:11:21,660
Nikolay: Single row inserts versus 1 insert of 1,000 rows.

194
00:11:21,780 --> 00:11:25,780
1,000 inserts versus a single 1,000-row insert.

195
00:11:26,200 --> 00:11:29,440
Of course, you'll have just 1 commit.

196
00:11:29,540 --> 00:11:30,360
It's great.

197
00:11:31,220 --> 00:11:32,420
And it will be faster.

198
00:11:32,980 --> 00:11:37,860
But of course, or updates for example, you should avoid

199
00:11:38,000 --> 00:11:44,520
huge batches, because they also have downsides, like longer-lasting

200
00:11:44,580 --> 00:11:51,340
locks, or in case such a batch crashes, you lose more.

201
00:11:51,340 --> 00:11:52,940
You need to retry more, right?

202
00:11:53,260 --> 00:11:56,460
Instead of losing just 1 insert, you're losing all 1,000 inserts.

203
00:11:57,440 --> 00:12:02,160
So as usual, batching, my rule of thumb is just to choose batch

204
00:12:02,160 --> 00:12:07,120
size so it's roughly like 1 or 2 seconds maximum to avoid long

205
00:12:07,120 --> 00:12:11,260
lasting locks and risks to repeat too much.

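A minimal sketch of the batching idea described above (the table and values are hypothetical, for illustration only):

```sql
-- Hypothetical sensor_readings table.
-- 1,000 single-statement inserts = 1,000 implicit commits,
-- each one waiting for its own WAL flush:
INSERT INTO sensor_readings (sensor_id, reading) VALUES (1, 20.5);
-- ...repeated 1,000 times...

-- One multi-row insert = a single commit and a single WAL flush wait;
-- size the batch to roughly 1-2 seconds of work, per the rule of thumb above:
INSERT INTO sensor_readings (sensor_id, reading)
VALUES (1, 20.5), (2, 19.8), (3, 21.1);  -- ...and so on
```
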
206
00:12:11,680 --> 00:12:13,180
In this case, it's great.

207
00:12:13,180 --> 00:12:17,920
Like in this case, just 1 commit per second or maybe 10 commits

208
00:12:17,920 --> 00:12:20,200
per second, it's a very low number.

209
00:12:20,220 --> 00:12:26,920
So in this case, the overhead of the need to wait for WAL

210
00:12:26,920 --> 00:12:31,860
to be synced to disk is super low, which is like roughly you

211
00:12:31,860 --> 00:12:34,240
can think about it like not more than 1 millisecond.

212
00:12:35,580 --> 00:12:38,160
In this case, like just forget about it.

213
00:12:38,160 --> 00:12:41,660
synchronous_commit off is not a good idea, that's it.

214
00:12:42,280 --> 00:12:42,780
Yeah.

215
00:12:43,180 --> 00:12:47,360
But usually things are much more complicated because we just

216
00:12:47,360 --> 00:12:49,940
consider 1 type of workload.

217
00:12:50,000 --> 00:12:52,480
Any project has many types of workload, right?

218
00:12:54,000 --> 00:12:56,980
Michael: Yeah, and that's a good point that Shayon actually made

219
00:12:56,980 --> 00:13:00,900
in a follow-up post that I think sometimes gets missed in this

220
00:13:00,900 --> 00:13:05,200
topic, which is that it can be set on a per transaction basis

221
00:13:05,200 --> 00:13:10,140
or per session basis or per user basis, so if you're doing a one-off

222
00:13:10,160 --> 00:13:15,980
if you're doing like a one-off migration or bulk load of data

223
00:13:16,060 --> 00:13:22,120
and for some reason you can't batch, then maybe it's

224
00:13:22,120 --> 00:13:26,400
worth considering turning it off just for that user or for that

225
00:13:26,400 --> 00:13:28,020
session or per transaction.

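The per-transaction, per-session, and per-user scoping mentioned here can be sketched as (the role name is hypothetical):

```sql
-- Per transaction: affects only the current transaction.
BEGIN;
SET LOCAL synchronous_commit = off;
-- ...bulk work...
COMMIT;

-- Per session:
SET synchronous_commit = off;

-- Per user (hypothetical bulk_loader role):
ALTER USER bulk_loader SET synchronous_commit = off;
```
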
226
00:13:28,260 --> 00:13:31,740
Nikolay: What kind of situation could it be, where you can

227
00:13:31,740 --> 00:13:33,580
bulk load but not batch?

228
00:13:33,580 --> 00:13:36,700
Michael: I was struggling to think of it, maybe

229
00:13:36,700 --> 00:13:41,080
some streaming, like streaming case, but it felt tenuous.

230
00:13:41,380 --> 00:13:46,160
Nikolay: If technically it comes from many, many sources, instead

231
00:13:46,160 --> 00:13:52,120
of saying, okay, we need Kafka before it comes to Postgres, and

232
00:13:52,200 --> 00:13:53,300
then we will batch.

233
00:13:53,300 --> 00:13:54,880
It's like, it's a heavy solution.

234
00:13:54,960 --> 00:14:00,900
Maybe indeed, this is exactly the case when it's worth considering

235
00:14:01,560 --> 00:14:03,280
off, right?

236
00:14:03,660 --> 00:14:04,160
To

237
00:14:04,540 --> 00:14:05,700
Michael: use off.

238
00:14:05,900 --> 00:14:09,760
The other thing I was thinking about is whether this comes up

239
00:14:09,760 --> 00:14:16,500
as a thought for people because of tools like pgbench that by

240
00:14:16,500 --> 00:14:21,220
default, you get a lot of small transactions, a lot of small

241
00:14:21,220 --> 00:14:22,360
very fast transactions.

242
00:14:23,040 --> 00:14:27,620
So I wondered if it becomes interesting to people because

243
00:14:27,620 --> 00:14:33,460
they just think let's quickly do a test with pgbench and oh wow

244
00:14:33,460 --> 00:14:35,580
synchronous_commit, what's that?

245
00:14:35,600 --> 00:14:39,020
Nikolay: Have you tried pgbench with synchronous_commit turned off?

246
00:14:39,320 --> 00:14:43,380
I tried many times, because it's an obvious idea, and I don't remember

247
00:14:43,380 --> 00:14:44,940
that I saw a huge advantage.

248
00:14:45,360 --> 00:14:49,340
Because pgbench write transactions, by default, are not so

249
00:14:49,640 --> 00:14:50,140
simple.

250
00:14:50,740 --> 00:14:54,280
As I remember, it's an update plus 2 inserts, maybe a delete

251
00:14:54,280 --> 00:14:54,940
as well.

252
00:14:55,240 --> 00:14:59,880
It's multi-statement transaction,
which is definitely not super

253
00:15:00,480 --> 00:15:00,780
lightweight.

254
00:15:00,780 --> 00:15:01,560
Yeah, fair enough.

255
00:15:01,560 --> 00:15:03,400
I didn't try it.

256
00:15:03,400 --> 00:15:03,900
Yeah.

257
00:15:06,720 --> 00:15:11,500
Well, I can imagine we can have
situations when the effect is

258
00:15:11,500 --> 00:15:15,060
quite noticeable, like dozens of
percent, dozens of percent.

259
00:15:15,060 --> 00:15:21,520
But in general, from my practice,
I stopped looking in this direction.

260
00:15:22,060 --> 00:15:23,370
Yeah, fair enough.

261
00:15:23,370 --> 00:15:24,040
Not something to think about often.

262
00:15:24,140 --> 00:15:27,940
And of course, data loss is not
a good idea.

263
00:15:28,680 --> 00:15:32,320
So, I don't know, like write heavy
workloads, maybe yes, but

264
00:15:32,320 --> 00:15:36,140
again, batching and all proper
tricks to reduce the number of actual

265
00:15:36,140 --> 00:15:38,040
commits happening and that's it.

266
00:15:38,600 --> 00:15:41,940
Michael: Yeah, and I think when
people talk about data

267
00:15:41,940 --> 00:15:45,700
loss, they quite often immediately
jump to things like banking

268
00:15:45,700 --> 00:15:48,740
apps, you know, things where
it's like absolutely critical

269
00:15:48,820 --> 00:15:52,120
not to, you know, you'd much rather
have some downtime than have

270
00:15:52,120 --> 00:15:53,220
some data loss.

271
00:15:53,980 --> 00:15:58,280
But I also think user experience
wise, like even for just regular

272
00:15:58,280 --> 00:16:02,240
CRUD apps, losing data can be really
confusing to people, even

273
00:16:02,240 --> 00:16:04,620
if it's only a not that important
thing.

274
00:16:04,840 --> 00:16:08,420
The cases I was thinking of are
much more like the time series

275
00:16:08,420 --> 00:16:12,180
ones, where if we've got 1 sensor
reporting every minute, if

276
00:16:12,180 --> 00:16:16,900
we lose 1 minute, if we lose a
couple of those sensor reports,

277
00:16:16,920 --> 00:16:17,980
it's not that important.

278
00:16:17,980 --> 00:16:20,960
We can still do a bunch of aggregates
on that data.

279
00:16:21,040 --> 00:16:24,240
So I guess for monitoring and logging
type things, maybe it's

280
00:16:24,240 --> 00:16:24,980
less important.

281
00:16:25,640 --> 00:16:28,000
But there can be really important
logs, too, right?

282
00:16:28,000 --> 00:16:32,540
So personally, I don't
see too many use cases where

283
00:16:32,540 --> 00:16:33,780
this makes a lot of sense.

284
00:16:33,780 --> 00:16:37,060
But there is a whole page in the
Postgres docs about it, it's called

285
00:16:37,060 --> 00:16:38,100
asynchronous commit.

286
00:16:38,220 --> 00:16:42,240
So it must have been somewhat of
interest to quite a lot of people.

287
00:16:44,220 --> 00:16:44,720
Nikolay: Right.

288
00:16:45,060 --> 00:16:46,220
Yeah, yeah, yeah.

289
00:16:47,100 --> 00:16:48,500
Michael: Should we switch to the
topic?

290
00:16:49,020 --> 00:16:51,300
It feels like this gets more interesting
when we start talking

291
00:16:51,300 --> 00:16:52,240
Nikolay: about synchronous.

292
00:16:52,440 --> 00:16:53,300
Let's do it.

293
00:16:53,680 --> 00:16:56,140
Let's move on and multiply.

294
00:16:56,980 --> 00:17:00,280
First of all, on is overloaded
when you have synchronous_standby_names

295
00:17:00,280 --> 00:17:01,520
non-empty.

296
00:17:02,840 --> 00:17:05,640
synchronous_standby_names non-empty,
right?

297
00:17:07,800 --> 00:17:13,940
We can have purely synchronous
replication or we can have so-called

298
00:17:13,980 --> 00:17:18,700
quorum commit and I just had a
chat with 1 of our team members

299
00:17:18,700 --> 00:17:20,520
and I agree.

300
00:17:21,180 --> 00:17:24,440
I heard criticism as well, like
quorum commit is a very misleading

301
00:17:24,960 --> 00:17:29,360
term here because usually it's used like this, like we have multiple

302
00:17:29,380 --> 00:17:34,340
standbys, they are synchronous originally by nature, but then

303
00:17:34,660 --> 00:17:37,620
we say synchronous_commit, for example, remote write.

304
00:17:38,500 --> 00:17:43,480
And then we configure Postgres to allow synchronous_standby_names

305
00:17:43,480 --> 00:17:45,040
to have any, right?

306
00:17:45,040 --> 00:17:46,020
We say any.

307
00:17:46,400 --> 00:17:50,420
And for example, we have, say, 5 replicas, 5 standbys.

308
00:17:50,840 --> 00:17:54,120
And we say any 1, right, any 1 of them.

309
00:17:54,520 --> 00:17:58,680
And it means that commit happens on the primary and on any 1

310
00:17:58,840 --> 00:17:59,340
additionally.

311
00:17:59,600 --> 00:18:06,360
So on 2 nodes before a user receives success of commit.

312
00:18:07,120 --> 00:18:09,100
It means 2 out of 6.

313
00:18:10,520 --> 00:18:12,420
5 standbys, primary, 6 nodes.

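The "any 1 of 5" setup described here can be sketched as follows (standby names are hypothetical):

```sql
-- Commit waits for the primary plus ANY 1 of the 5 listed standbys,
-- i.e. 2 of the 6 nodes in this example:
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (s1, s2, s3, s4, s5)';
SELECT pg_reload_conf();
```
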
314
00:18:12,420 --> 00:18:16,340
So 2 out of 6 is not quorum, because quorum, definition of quorum

315
00:18:16,700 --> 00:18:19,040
means more than 50%.

316
00:18:19,220 --> 00:18:21,180
Michael: So that would be 4 out of 6?

317
00:18:21,180 --> 00:18:22,180
Nikolay: Usually, usually.

318
00:18:22,400 --> 00:18:26,820
I think in this case the criticism

319
00:18:26,920 --> 00:18:29,540
is that it's conditional, right?

320
00:18:29,540 --> 00:18:34,420
We may say, okay, in our case the quorum definition is kind of rewritten

321
00:18:35,320 --> 00:18:37,280
and we need 2 out of 6.

322
00:18:37,280 --> 00:18:39,740
But it sounds strange a little bit.

323
00:18:40,080 --> 00:18:45,540
So maybe, how should we call this situation when we have not just

324
00:18:45,540 --> 00:18:50,280
1 synchronous replica, and every commit must happen on both, but

325
00:18:50,280 --> 00:18:55,760
we have multiple replicas and we say 1 more or 2 more so how

326
00:18:55,760 --> 00:19:00,660
do we call it? Semi-synchronous? I've heard this term as well, semi-synchronous.

327
00:19:02,140 --> 00:19:05,520
Michael: Yeah, yeah. Actually, I kind of like it.

328
00:19:05,740 --> 00:19:06,880
It does imply.

329
00:19:07,120 --> 00:19:08,520
Nikolay: Yeah, but official boxes.

330
00:19:08,520 --> 00:19:11,320
Michael: It's confusing. Semi-synchronous is confusing enough.

331
00:19:11,320 --> 00:19:13,980
That means I have to look it up, which is probably helpful.

332
00:19:14,140 --> 00:19:14,700
Nikolay: I don't

333
00:19:14,700 --> 00:19:16,200
Michael: assume I know what it means,

334
00:19:16,240 --> 00:19:16,740
Nikolay: right?

335
00:19:17,920 --> 00:19:21,020
Michael: But yeah, it doesn't fit into either category.

336
00:19:21,020 --> 00:19:23,040
Like it's not synchronous and it's not asynchronous.

337
00:19:23,400 --> 00:19:26,760
Like it's, yeah, I don't like it though.

338
00:19:26,760 --> 00:19:29,580
I don't feel comfortable with it because what's it saving us?

339
00:19:29,580 --> 00:19:35,420
Like what's the benefit of, like is the idea that if both crash

340
00:19:35,420 --> 00:19:36,260
at the same time...

341
00:19:36,260 --> 00:19:38,800
Nikolay: Yeah, let's, before we talk about it, let's finish with

342
00:19:38,800 --> 00:19:39,300
terminology.

343
00:19:39,640 --> 00:19:43,340
Maybe, maybe, I'm looking up right now the word, the meaning

344
00:19:43,340 --> 00:19:46,400
of the word quorum, maybe it's not that bad because in some cases

345
00:19:46,400 --> 00:19:49,200
we can say, okay, 25% is our quorum, right?

346
00:19:49,200 --> 00:19:50,560
We can define these rules.

347
00:19:50,940 --> 00:19:55,520
So, like, the expectation that it should be more than 50%, maybe it's

348
00:19:55,520 --> 00:19:57,220
like a false expectation.

349
00:19:58,940 --> 00:20:01,600
Yeah, anyway, but you cannot say percentage.

350
00:20:02,040 --> 00:20:05,940
You say like, like any 1 or 2, right?

351
00:20:07,100 --> 00:20:12,620
Configuring Postgres,
synchronous_standby_names, you can say

352
00:20:12,620 --> 00:20:19,460
the word ANY and then... You can
have some kind of, it's an interesting

353
00:20:19,460 --> 00:20:20,900
syntax, you saw it, right?

354
00:20:21,260 --> 00:20:24,980
Michael: Yeah, I also saw you
could do, is it like priority?

355
00:20:25,440 --> 00:20:26,340
Nikolay: Yeah, yeah, yeah.

356
00:20:26,840 --> 00:20:30,520
Yeah, so yeah, there are some interesting
things there.

357
00:20:30,780 --> 00:20:34,400
But anyway, the idea is they work together.

358
00:20:35,200 --> 00:20:38,500
synchronous_standby_names and synchronous_commit
in this case.

359
00:20:39,340 --> 00:20:43,260
And if synchronous_standby_names
is not empty, so there is this

360
00:20:43,260 --> 00:20:44,120
magic happening.

361
00:20:44,200 --> 00:20:48,980
If it's just a host name, it's purely
synchronous replication, right?

362
00:20:49,240 --> 00:20:53,400
But also purely synchronous, there
are flavors we will discuss

363
00:20:53,880 --> 00:20:56,400
based on this synchronous_commit
setting.

364
00:20:56,720 --> 00:21:00,920
If there is an expression
there, like ANY or FIRST,

365
00:21:00,920 --> 00:21:03,660
blah, blah, blah, then it's already
more complicated.

366
00:21:04,400 --> 00:21:05,420
But it's interesting.

367
00:21:06,660 --> 00:21:10,800
And official documentation
uses the term quorum

368
00:21:10,800 --> 00:21:11,740
commit, right?

369
00:21:13,480 --> 00:21:17,040
Michael: I saw it in a Crunchy
Data blog post.

370
00:21:17,040 --> 00:21:19,180
I didn't actually see it in the
documentation.

371
00:21:20,740 --> 00:21:21,840
Nikolay: Okay, doesn't matter.

372
00:21:22,580 --> 00:21:23,740
Let's talk about flavors.

373
00:21:23,740 --> 00:21:30,640
Let's consider we have, for
simplicity, just 1 host name mentioned

374
00:21:30,700 --> 00:21:34,400
in synchronous_standby_names, no
expressions, just 1.

375
00:21:34,740 --> 00:21:41,180
And if we don't change synchronous_commit
setting, default is

376
00:21:41,180 --> 00:21:43,560
on, here the meaning changes.

377
00:21:44,620 --> 00:21:45,120
Right?

378
00:21:45,700 --> 00:21:47,760
Which like this makes me...

379
00:21:49,520 --> 00:21:51,000
I started with this, right?

380
00:21:51,000 --> 00:21:52,220
Like overloaded.

381
00:21:52,660 --> 00:21:57,680
The meaning changes, and what does
it mean? It means that we need

382
00:21:58,380 --> 00:22:04,260
to wait until what? Like,
let's see.

383
00:22:04,260 --> 00:22:06,820
Michael: so yeah I think it's worth
thinking about them in terms,

384
00:22:06,820 --> 00:22:09,100
I like to think of them like progressively.

385
00:22:09,820 --> 00:22:14,580
I think the order goes, remote
apply is the strongest.

386
00:22:15,660 --> 00:22:21,380
So remote apply will wait until
it's not only been written to

387
00:22:21,380 --> 00:22:23,280
the write-ahead log and

388
00:22:23,520 --> 00:22:25,580
Nikolay: flushed to disk, but also

389
00:22:25,760 --> 00:22:32,520
Michael: on the... but available
to queries, so read queries

390
00:22:32,520 --> 00:22:35,280
on the standby will be able
to...

391
00:22:35,280 --> 00:22:39,480
Nikolay: And here I think, okay,
apply here but why replay there?

392
00:22:39,480 --> 00:22:42,940
In pg_stat_replication it's called
replay LSN, right?

393
00:22:44,440 --> 00:22:45,340
Apply or replay?

394
00:22:45,340 --> 00:22:47,980
Apply or replay, we consider the
same thing, right?
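The per-standby position for each of these phases can be checked on the primary, where the view indeed says replay rather than apply:

```sql
-- One row per connected standby; write/flush/replay correspond to the
-- remote_write / on / remote_apply wait points respectively.
SELECT application_name, sent_lsn, write_lsn, flush_lsn, replay_lsn,
       sync_state  -- async, potential, sync, or quorum
FROM pg_stat_replication;
```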

395
00:22:48,840 --> 00:22:49,340
Yeah.

396
00:22:49,340 --> 00:22:49,840
Right.

397
00:22:50,380 --> 00:22:50,880
So

398
00:22:51,660 --> 00:22:52,620
Michael: that's the strongest.

399
00:22:52,660 --> 00:22:57,080
I think that's the highest level
we can set, but it has additional

400
00:22:57,920 --> 00:22:59,160
latency, obviously.

401
00:23:00,160 --> 00:23:05,100
I think on is the next strongest.

402
00:23:06,140 --> 00:23:11,880
I think we wait for flush but not for, what did you want to call

403
00:23:11,880 --> 00:23:12,040
it?

404
00:23:12,040 --> 00:23:12,540
Replay.

405
00:23:12,980 --> 00:23:14,080
Nikolay: Replay or apply.

406
00:23:16,440 --> 00:23:21,640
Michael: And then there's 1 level below on, which is the write

407
00:23:21,700 --> 00:23:26,700
Nikolay: there is no remote receive, there is remote write, right?

408
00:23:26,820 --> 00:23:29,240
Michael: yeah, but I think receive, they're saying,

409
00:23:29,240 --> 00:23:30,540
it's the same as write

410
00:23:31,620 --> 00:23:36,340
Nikolay: mmm, WAL data received, transmitted over the network,

411
00:23:36,340 --> 00:23:42,480
received, but not yet flushed to file, but received, right?

412
00:23:42,740 --> 00:23:43,240
Yeah.

413
00:23:44,660 --> 00:23:46,360
Write and receive are the same.

414
00:23:46,640 --> 00:23:49,920
There is confusion because in different parts of Postgres here

415
00:23:49,920 --> 00:23:53,160
we have different words, different verbs used, right?

416
00:23:53,720 --> 00:23:54,640
But I agree, yeah.

417
00:23:54,640 --> 00:23:59,000
So there are write, flush, and apply phases.

418
00:24:00,060 --> 00:24:03,040
And there are 3 steps here.
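A rough summary of the levels, weakest to strongest — a sketch of the behavior being described here, not official wording:

```sql
-- synchronous_commit values when synchronous_standby_names is non-empty:
--   off          -- don't even wait for the local WAL flush (a crash can lose
--                   recently "committed" transactions, but not corrupt data)
--   local        -- wait only for the WAL flush on the primary
--   remote_write -- the standby has received (written) the WAL, not yet flushed it
--   on (default) -- the standby has flushed the WAL to disk; the missing "remote_flush"
--   remote_apply -- the standby has replayed the WAL, so reads there see the commit
SET synchronous_commit = 'remote_write';  -- settable per session or transaction
```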

419
00:24:03,340 --> 00:24:09,180
And there is no remote flush, which I think, okay, overloading

420
00:24:09,360 --> 00:24:13,340
happens, but at least it would be good to have remote flush and

421
00:24:13,340 --> 00:24:18,040
say, okay, on is acting like flush, but there is no remote flush.

422
00:24:18,060 --> 00:24:24,220
There is only remote write and remote apply and instead of flush

423
00:24:24,220 --> 00:24:33,280
we have on, saving on the number of words supported, right?

424
00:24:33,440 --> 00:24:36,080
Michael: Number of settings, yeah. But they did add a local, which

425
00:24:36,080 --> 00:24:39,340
I think according to the documentation, they're saying is only

426
00:24:39,340 --> 00:24:40,020
for completeness.

427
00:24:40,200 --> 00:24:41,856
So it's interesting that they did add 1.

428
00:24:41,856 --> 00:24:43,020
Nikolay: It's not only for completeness.

429
00:24:43,040 --> 00:24:46,280
And we will go there in a few minutes.

430
00:24:47,780 --> 00:24:52,920
So on means remote flush, the medium setting, right?

431
00:24:53,320 --> 00:24:54,140
WAL is written.

432
00:24:54,140 --> 00:24:54,640
Yes.

433
00:24:55,240 --> 00:24:57,400
Like it's happening on the primary.

434
00:24:57,400 --> 00:25:02,300
On the primary, at commit time, it's flushed to disk.

435
00:25:02,300 --> 00:25:06,560
We know that in the case of crash, it's already, this whole data

436
00:25:06,560 --> 00:25:13,180
is already on disk, but what will happen with data pages during

437
00:25:13,180 --> 00:25:15,620
recovery, they will need to be adjusted, right?

438
00:25:15,620 --> 00:25:18,960
So, like, it's not applied yet.

439
00:25:19,660 --> 00:25:20,160
Right.

440
00:25:20,280 --> 00:25:21,240
Good, good, good.

441
00:25:21,240 --> 00:25:23,100
So, how do you feel about this?

442
00:25:23,100 --> 00:25:31,440
It's like synchronous replication, which means that if we lose

443
00:25:31,440 --> 00:25:38,940
the primary, the new primary will have everything.

444
00:25:39,480 --> 00:25:40,620
Which is great, right?

445
00:25:41,040 --> 00:25:44,760
It will have everything, and it will need to be applied,

446
00:25:45,020 --> 00:25:45,780
so recovery.

447
00:25:46,720 --> 00:25:48,180
But we don't lose anything.

448
00:25:48,340 --> 00:25:48,980
It's great.

449
00:25:49,900 --> 00:25:50,380
Agreed?

450
00:25:50,380 --> 00:25:54,520
Michael: Yeah, with a little bit of latency cost on each, right?

451
00:25:54,680 --> 00:25:55,420
Nikolay: Little bit.

452
00:25:55,920 --> 00:25:56,700
Michael: Or a lot.

453
00:25:56,760 --> 00:25:59,720
Nikolay: Yeah, this latency depends
on the network as well, a

454
00:25:59,720 --> 00:26:00,130
lot.

455
00:26:00,130 --> 00:26:00,540
Michael: Of

456
00:26:00,540 --> 00:26:01,040
Nikolay: course.

457
00:26:01,060 --> 00:26:05,960
And usually we prefer to move standbys
to different availability

458
00:26:06,100 --> 00:26:06,600
zones.

459
00:26:07,660 --> 00:26:12,180
And this increases latency, of
course, and this means that if

460
00:26:12,180 --> 00:26:16,840
we keep this as is, like synchronous_commit
on and synchronous_standby_names

461
00:26:16,840 --> 00:26:17,880
have...

462
00:26:19,660 --> 00:26:25,340
This setting has 1 host name, it
means that we will have a significant

463
00:26:25,440 --> 00:26:31,400
latency overhead. This is why it's
hard to use in heavily loaded

464
00:26:31,400 --> 00:26:33,540
projects, OLTP-like projects.

465
00:26:34,120 --> 00:26:36,100
But it's quite strict mode, right?

466
00:26:37,120 --> 00:26:40,640
Michael: Yeah, and you can have
an HA setup that's within the

467
00:26:40,640 --> 00:26:43,740
same availability zone, or at least
the same region, I'm guessing,

468
00:26:43,740 --> 00:26:45,420
would massively reduce that.

469
00:26:46,020 --> 00:26:47,440
Nikolay: Well, I wouldn't put.

470
00:26:47,580 --> 00:26:51,000
So for me, 1 availability zone
can disappear.

471
00:26:51,760 --> 00:26:56,040
If you think about HA, we need
to have a replica in a different

472
00:26:56,040 --> 00:26:57,020
availability zone.

473
00:26:57,900 --> 00:27:00,600
Which, of course, there is trade-off
here.

474
00:27:00,720 --> 00:27:01,680
We need to wait.

475
00:27:02,380 --> 00:27:06,260
Wait, like, what is here, what
is here, and then what's worse,

476
00:27:06,260 --> 00:27:07,500
what's best for us.

477
00:27:08,440 --> 00:27:12,720
And we move, for me, like HA means
replica should be in different

478
00:27:12,720 --> 00:27:13,220
AZ.

479
00:27:14,020 --> 00:27:18,980
It's in different AZ, means like
commit in this strict mode,

480
00:27:19,700 --> 00:27:22,580
remote flush or on, there's no
remote flush.

481
00:27:23,160 --> 00:27:28,120
It means we will need to wait for both
synchronization with disk

482
00:27:28,140 --> 00:27:29,840
on the primary and there.

483
00:27:30,220 --> 00:27:33,780
And network will add overhead here.

484
00:27:34,240 --> 00:27:38,500
And I expect we won't be able to
scale these writes very well.

485
00:27:39,640 --> 00:27:43,520
So if it's like thousands of writes
per second, it will be a noticeable

486
00:27:44,080 --> 00:27:44,580
issue.

487
00:27:45,580 --> 00:27:49,580
That's why in most cases I see
people go to...

488
00:27:49,840 --> 00:27:52,700
Well, by the way, remote apply,
interesting.

489
00:27:52,960 --> 00:27:56,420
It's even worse, like we need to
apply this.

490
00:27:57,740 --> 00:28:02,120
It's even worse, and it kills performance
even more, right?

491
00:28:03,060 --> 00:28:04,620
What do you think about this mode?

492
00:28:04,700 --> 00:28:05,640
It's like extreme.

493
00:28:07,660 --> 00:28:08,880
Michael: I actually lost you.

494
00:28:08,880 --> 00:28:10,840
I think our network connection
is not great.

495
00:28:10,840 --> 00:28:14,340
Nikolay: Okay, when I discussed
network issues, we had network

496
00:28:14,340 --> 00:28:14,840
issues.

497
00:28:15,180 --> 00:28:19,820
So I moved us to consideration
of remote apply.

498
00:28:20,220 --> 00:28:22,540
Michael: Well, yeah, so actually
question here then.

499
00:28:22,800 --> 00:28:27,180
This is no longer, I don't think,
about risk of data loss.

500
00:28:27,180 --> 00:28:30,080
Because if we flushed, then we're
good.

501
00:28:30,480 --> 00:28:33,460
So this is about, so is this for
read replicas?

502
00:28:33,660 --> 00:28:35,940
This only makes sense if we're
sending read traffic.

503
00:28:36,020 --> 00:28:36,720
Okay, great.

504
00:28:37,280 --> 00:28:41,260
Nikolay: So this is good in terms
of how data is visible there.

505
00:28:41,280 --> 00:28:46,280
We know by default on, it's acting
like remote flush, which again

506
00:28:46,280 --> 00:28:47,100
doesn't exist.

507
00:28:48,480 --> 00:28:53,080
Changes are not applied yet, so any
reading session which can be

508
00:28:53,080 --> 00:28:57,380
possible, the only possible option
we can have on the SELECTs,

509
00:28:57,380 --> 00:28:59,980
like reads on standby.

510
00:29:00,640 --> 00:29:04,100
In this case it won't see this
data yet.

511
00:29:06,580 --> 00:29:07,720
And the-

512
00:29:07,720 --> 00:29:10,760
Michael: Unless we're using remote
apply, in which case it would

513
00:29:10,760 --> 00:29:11,640
always be there.

514
00:29:11,640 --> 00:29:14,020
Nikolay: It will be there right
after commit.

515
00:29:14,060 --> 00:29:14,840
It's visible.

516
00:29:15,480 --> 00:29:16,020
It's great.

517
00:29:16,020 --> 00:29:21,440
So the delay between when it's written
and visible becomes 0.

518
00:29:21,500 --> 00:29:23,700
Once it's committed, it's immediately
visible.

519
00:29:23,800 --> 00:29:24,300
Great.

520
00:29:24,520 --> 00:29:25,020
Yeah.

521
00:29:25,320 --> 00:29:28,920
And when do we need this?

522
00:29:28,980 --> 00:29:32,960
Question is, when do we need, I
don't know.

523
00:29:33,100 --> 00:29:34,840
I don't work with such products.

524
00:29:35,580 --> 00:29:35,940
So I'm

525
00:29:35,940 --> 00:29:41,320
Michael: Right, because by the point
you need read replicas, the

526
00:29:41,320 --> 00:29:44,900
point of having read replicas is
because you have so much read

527
00:29:44,900 --> 00:29:48,960
traffic that your primary can't
serve it all, right?

528
00:29:48,960 --> 00:29:52,960
So we're talking about extremely
high, at least extremely high

529
00:29:52,960 --> 00:29:53,320
reads.

530
00:29:53,320 --> 00:29:58,520
Maybe cases where we have a ton
more reads than writes, and write

531
00:29:58,520 --> 00:30:01,580
latency is not that, maybe social
media again.

532
00:30:01,580 --> 00:30:04,840
Nikolay: Yeah, In this case, this
replica can be in the same

533
00:30:04,840 --> 00:30:07,800
availability zone because it's
not for HA purposes, right?

534
00:30:07,800 --> 00:30:10,440
It's just for read distribution needs,
maybe.

535
00:30:12,040 --> 00:30:13,940
I honestly don't know.

536
00:30:13,940 --> 00:30:19,580
We usually, we already have mechanisms
everywhere implemented

537
00:30:19,940 --> 00:30:23,240
to deal with lags of replication.

538
00:30:24,020 --> 00:30:28,320
To stick to the primary right after
writes and in the same session

539
00:30:28,320 --> 00:30:34,200
of the user for some time, or these
kinds of things, right? So

540
00:30:34,200 --> 00:30:37,580
Michael: yeah, you mentioned sticky
reads before, yeah

541
00:30:37,580 --> 00:30:43,740
Nikolay: in this case to slow down
commits for the sake of this

542
00:30:43,740 --> 00:30:48,000
like reduction of this delay of
this lag, I don't know. I would

543
00:30:48,000 --> 00:30:52,620
prefer to deal with small lags,
but don't slow down commits.
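One such mechanism for dealing with lag can be sketched like this — an assumption about how it might be implemented, with an illustrative LSN value:

```sql
-- On the primary, right after the user's write, remember the position:
SELECT pg_current_wal_insert_lsn();  -- say it returns '0/3000148'
-- On the standby, before serving that user's read, check it has caught up:
SELECT pg_last_wal_replay_lsn() >= '0/3000148'::pg_lsn AS caught_up;
```

If the standby hasn't replayed far enough, the application routes that read to the primary instead of slowing down every commit.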

544
00:30:53,100 --> 00:30:53,600
Yeah.

545
00:30:54,060 --> 00:30:54,240
So

546
00:30:54,240 --> 00:30:57,540
Michael: that's, I guess that's
remote apply and the default

547
00:30:57,540 --> 00:30:58,220
is on.

548
00:30:58,780 --> 00:31:02,060
Do you ever, Do you see use cases
for remote write?

549
00:31:02,800 --> 00:31:04,500
Nikolay: Remote write is the most
popular.

550
00:31:05,500 --> 00:31:06,240
Michael: Oh really?

551
00:31:06,420 --> 00:31:06,900
Interesting.

552
00:31:06,900 --> 00:31:10,460
Nikolay: Of course, most setups
where synchronous_standby_names is

553
00:31:10,460 --> 00:31:14,280
non-empty, I see they use remote
apply most of it.

554
00:31:15,520 --> 00:31:17,420
Remote write, remote write.

555
00:31:17,420 --> 00:31:18,740
Michael: Right, that makes more
sense.

556
00:31:18,820 --> 00:31:22,780
Nikolay: So the list is, like, complete,
right?

557
00:31:23,000 --> 00:31:26,480
So, and here we have many issues,
because this is a super popular

558
00:31:26,480 --> 00:31:26,960
option.

559
00:31:26,960 --> 00:31:31,020
And, oh, by the way, I wanted to
mention that yesterday on Hacker

560
00:31:31,020 --> 00:31:35,780
News, it was the article from Aphyr,
right?

561
00:31:37,280 --> 00:31:37,780
Jepsen.

562
00:31:37,800 --> 00:31:38,540
Yeah, Jepsen.

563
00:31:39,480 --> 00:31:47,360
It was discussing RDS multi-AZ
clusters, which interesting, like

564
00:31:47,740 --> 00:31:50,800
it's closer to our topic today,
but to understand the article,

565
00:31:50,800 --> 00:31:51,660
you need time.

566
00:31:54,860 --> 00:32:00,800
So we, great, we have remote write
as being very popular.

567
00:32:00,800 --> 00:32:01,740
What does it mean?

568
00:32:02,080 --> 00:32:07,400
The standby node received all data,
but hasn't flushed it yet

569
00:32:07,420 --> 00:32:08,220
to the disk.

570
00:32:08,940 --> 00:32:16,920
It means that if all nodes suddenly
shut down, only the primary will

571
00:32:16,920 --> 00:32:18,580
have this data.

572
00:32:19,300 --> 00:32:23,860
After commit, we know primary,
if commit happens, primary flushed

573
00:32:23,860 --> 00:32:25,140
it, right?

574
00:32:25,940 --> 00:32:32,940
We don't have settings which say,
which let us control the behavior

575
00:32:33,240 --> 00:32:39,880
of commit on standby nodes, but
on the primary, it's not flushed.

576
00:32:39,880 --> 00:32:45,960
Like if we have remote write, on,
or remote apply, it means in

577
00:32:45,960 --> 00:32:52,840
any of these cases, disk on the
primary has all data from this

578
00:32:52,840 --> 00:32:53,900
transaction, right?

579
00:32:54,720 --> 00:32:59,320
So it's like it's already, it's
not off at all already.

580
00:32:59,600 --> 00:33:00,520
It's not off.

581
00:33:01,080 --> 00:33:04,900
And imagine, we know the primary
has this

582
00:33:04,900 --> 00:33:10,580
data, but a standby, since it's
remote write, is not guaranteed to;

583
00:33:11,320 --> 00:33:15,040
it might miss this data, and
yeah, this transaction might

584
00:33:15,040 --> 00:33:19,280
be lost. And this is okay.

585
00:33:19,280 --> 00:33:19,740
Okay.

586
00:33:19,740 --> 00:33:26,100
But in case we lose the primary
completely, we lose data.

587
00:33:27,360 --> 00:33:27,660
Right?

588
00:33:27,660 --> 00:33:28,940
We lose data here.

589
00:33:29,280 --> 00:33:29,780
Yeah.

590
00:33:30,660 --> 00:33:34,100
But we lose data only if all nodes
are shut down.

591
00:33:34,540 --> 00:33:39,180
Usually when we lose the primary,
standby nodes are not shut down.

592
00:33:41,000 --> 00:33:42,420
Unless something like...

593
00:33:43,080 --> 00:33:45,020
Yeah, yeah, yeah.

594
00:33:45,020 --> 00:33:48,640
So let's say the loss of primary
is our like corner case, not

595
00:33:48,640 --> 00:33:50,040
corner, edge case.

596
00:33:51,040 --> 00:33:55,760
But loss of primary and standby
nodes are not lost but shut down,

597
00:33:55,760 --> 00:33:59,360
it's like kind of very very corner
case, so several edges like,

598
00:34:00,060 --> 00:34:01,940
so several problems simultaneously.

599
00:34:02,560 --> 00:34:04,540
So usually they're still running.

600
00:34:05,280 --> 00:34:09,680
That's why remote write is so popular
because it says, okay,

601
00:34:09,680 --> 00:34:10,760
we lost the primary.

602
00:34:10,760 --> 00:34:13,580
We need to perform auto failover
or manual failover, usually

603
00:34:13,580 --> 00:34:14,400
auto failover.

604
00:34:15,140 --> 00:34:20,520
And standby, 1 or 2, or how many
you configure, of those standby

605
00:34:20,740 --> 00:34:27,280
nodes, per the setting, synchronous_commit
equals remote_write, right?

606
00:34:27,520 --> 00:34:28,820
They have it in memory.

607
00:34:30,040 --> 00:34:32,380
And they're still running, so they
will flush it.

608
00:34:33,240 --> 00:34:34,460
So the risk is good here.

609
00:34:34,460 --> 00:34:36,400
I mean, risk is understood.

610
00:34:37,640 --> 00:34:40,820
Like loss of primary and all, everyone
is shut down suddenly.

611
00:34:41,140 --> 00:34:43,760
But it's so low, so we can accept
it.

612
00:34:44,680 --> 00:34:48,260
And I'd say it still feels like
synchronous commit to me because

613
00:34:48,600 --> 00:34:52,840
I doubt that when we lose the primary,
a standby node which received but

614
00:34:52,840 --> 00:34:56,860
not yet flushed this WAL data
will suddenly be off.

615
00:34:57,340 --> 00:34:58,520
It will be running, right?

616
00:34:58,520 --> 00:35:00,560
And we don't lose this data.

617
00:35:01,340 --> 00:35:02,260
This is great.

618
00:35:03,400 --> 00:35:04,780
This is why it's so popular.

619
00:35:05,800 --> 00:35:07,700
But it's great.

620
00:35:08,940 --> 00:35:10,240
Except it's not.

621
00:35:11,630 --> 00:35:18,600
Because if auto failover needs
to happen, it's very rarely, like

622
00:35:18,600 --> 00:35:24,740
sometimes people allocate like
Crunchy Bridge or RDS, not multi-AZ

623
00:35:25,080 --> 00:35:28,840
cluster, but multi-AZ standby.

624
00:35:29,680 --> 00:35:30,960
Now there are 2 options.

625
00:35:31,620 --> 00:35:35,800
And multi-AZ standby means there
is synchronous standby only

626
00:35:35,800 --> 00:35:37,000
serving HA purposes.

627
00:35:37,040 --> 00:35:39,700
You cannot send queries to it.

628
00:35:40,240 --> 00:35:43,520
Crunchy Bridge has the same, like,
not the same, not the same.

629
00:35:43,520 --> 00:35:46,680
They use Postgres replication to
solve this.

630
00:35:46,680 --> 00:35:49,940
Exactly, they configure synchronous_standby_names.

631
00:35:50,860 --> 00:35:58,380
And they set synchronous_commit
to a very weird value, we will

632
00:35:58,380 --> 00:35:59,280
discuss it.

633
00:35:59,380 --> 00:36:03,040
But the idea is the HA replica doesn't
receive traffic, it serves

634
00:36:03,040 --> 00:36:04,440
only for HA purposes.

635
00:36:05,280 --> 00:36:05,780
Great.

636
00:36:06,780 --> 00:36:11,060
But in many other cases, people
use like, they say this quorum

637
00:36:11,060 --> 00:36:17,280
commit, 1 or 2 additional nodes
must say data is received.

638
00:36:17,980 --> 00:36:18,960
Remote write, right?
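The popular combination being described might look like this (standby names are invented):

```sql
-- Quorum commit: any 1 of the standbys must confirm it has received the WAL.
ALTER SYSTEM SET synchronous_standby_names = 'ANY 1 (standby_a, standby_b)';
ALTER SYSTEM SET synchronous_commit = 'remote_write';
SELECT pg_reload_conf();
```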

639
00:36:19,000 --> 00:36:23,200
And in this case, if we lose the
primary, auto-failover will

640
00:36:23,200 --> 00:36:24,340
happen to where?

641
00:36:24,340 --> 00:36:26,700
To 1 of these nodes which received,
right?

642
00:36:27,340 --> 00:36:33,280
But the problem is, for example,
if it's Patroni, Patroni cannot

643
00:36:33,280 --> 00:36:38,200
choose this node, because there
is no such capability to find

644
00:36:38,720 --> 00:36:39,220
which...

645
00:36:39,720 --> 00:36:42,840
Like we have, again, we have 5
replicas, for example, 5 standbys.

646
00:36:43,660 --> 00:36:48,680
We know 1 of them received the latest
changes.

647
00:36:48,740 --> 00:36:49,640
How to find this?

648
00:36:49,640 --> 00:36:50,780
Michael: At least, yeah.

649
00:36:51,040 --> 00:36:53,600
Nikolay: There is no such capability,
doesn't exist.

650
00:36:54,780 --> 00:37:01,300
Because this pg_last_wal_receive_lsn(),

651
00:37:02,780 --> 00:37:08,900
But it has the wrong name because
it shows pg... it shows flush

652
00:37:08,900 --> 00:37:09,400
LSN.
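This can be checked on a standby; despite the name, the documented behavior is that the function returns the last location received and synced to disk by the WAL receiver:

```sql
-- Run on a standby: the "receive" LSN here is really the flushed position,
-- which is the naming problem being discussed.
SELECT pg_last_wal_receive_lsn(), pg_last_wal_replay_lsn();
```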

653
00:37:10,200 --> 00:37:11,120
It's a bug.

654
00:37:11,120 --> 00:37:13,260
Oh, a bug or feature, I don't know.

655
00:37:13,680 --> 00:37:15,180
So this is a problem.

656
00:37:15,180 --> 00:37:20,280
And this is like, if Patroni uses
that, I think it uses that, it

657
00:37:20,280 --> 00:37:24,780
chooses the node whose flushed
LSN has the

658
00:37:24,780 --> 00:37:30,400
maximum value, and this might be
a different node. Yeah.

659
00:37:32,420 --> 00:37:36,420
Michael: So, but with that in
mind, I guess it's then an acceptable

660
00:37:36,600 --> 00:37:38,600
risk, or that's why it has some
data loss.

661
00:37:38,600 --> 00:37:40,940
But why not use on in that case?

662
00:37:42,720 --> 00:37:44,280
Nikolay: Because it will slow down
commits.

663
00:37:45,540 --> 00:37:47,140
Michael: OK, that's, yes, the trade-off.
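One mitigation worth noting: synchronous_commit is settable per transaction, so the expensive wait can be paid only where it matters. A sketch (the table name is hypothetical):

```sql
BEGIN;
SET LOCAL synchronous_commit = 'local';        -- skip waiting for the standby
INSERT INTO page_views (url) VALUES ('/home'); -- low-value write, hypothetical table
COMMIT;  -- returns after the local WAL flush only
```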

664
00:37:47,220 --> 00:37:51,920
Nikolay: We distribute our nodes
among various, multiple AZs,

665
00:37:52,280 --> 00:37:53,860
availability zones, right?

666
00:37:55,240 --> 00:37:58,760
And there is network overhead.

667
00:38:00,260 --> 00:38:03,420
Michael: Yeah, well, Interestingly,
I checked before coming on

668
00:38:03,420 --> 00:38:08,540
with our own Google Cloud SQL setup,
and I wondered what our

669
00:38:08,540 --> 00:38:11,100
setting was, and it was on.

670
00:38:11,740 --> 00:38:13,380
So either they haven't...

671
00:38:14,600 --> 00:38:16,220
I'm pretty sure...

672
00:38:16,220 --> 00:38:19,280
But yeah, it's interesting that
they could have chosen to set

673
00:38:19,280 --> 00:38:20,040
it to remote write.

674
00:38:20,580 --> 00:38:22,620
Nikolay: And synchronous_standby_names
is non-empty?

675
00:38:23,740 --> 00:38:24,640
Michael: I didn't check.

676
00:38:24,640 --> 00:38:25,660
It's a good question.

677
00:38:26,000 --> 00:38:29,440
Nikolay: If it's empty, you have
pure asynchronous replication.

678
00:38:29,720 --> 00:38:30,824
This is it.

679
00:38:30,824 --> 00:38:31,200
Like, this is it.

680
00:38:31,200 --> 00:38:36,880
Like, honestly, it concerns me
a lot how overloaded the values

681
00:38:36,980 --> 00:38:38,080
and names are here.

682
00:38:38,940 --> 00:38:41,740
On means different things depending
on different settings.

683
00:38:42,120 --> 00:38:42,620
Why?

684
00:38:44,380 --> 00:38:49,300
Maybe we are trying to save on
the number of configuration settings,

685
00:38:49,940 --> 00:38:53,500
GUC names, not to expand it too
much.

686
00:38:53,500 --> 00:38:59,340
Okay, 300, but let's not expand
it to 500, right?

687
00:38:59,340 --> 00:39:04,180
But this causes pain in the brains
of people who are using this.

688
00:39:06,360 --> 00:39:08,420
So if it's on, okay.

689
00:39:09,520 --> 00:39:09,960
Is it?

690
00:39:09,960 --> 00:39:10,960
Michael: No, you're right.

691
00:39:10,960 --> 00:39:11,780
Nikolay: It's asynchronous.

692
00:39:12,940 --> 00:39:14,520
Michael: No synchronous_standby_names

693
00:39:14,760 --> 00:39:15,060
Nikolay: Yeah, yeah.

694
00:39:15,060 --> 00:39:17,660
So it waits for flush only on the
primary.

695
00:39:18,480 --> 00:39:21,920
Michael: Yeah, so that's basically
just local.

696
00:39:22,060 --> 00:39:23,460
Nikolay: Yeah, yeah, yeah.

697
00:39:23,900 --> 00:39:24,400
Good.

698
00:39:25,400 --> 00:39:30,460
Yeah, so there's this problem,
it's unsolved.

699
00:39:30,460 --> 00:39:31,900
It should be solved in Postgres.

700
00:39:32,920 --> 00:39:37,260
It should be solved in Postgres
and I think it's not super difficult

701
00:39:37,260 --> 00:39:41,400
to solve. We need a function
which will show write LSN or receive LSN.

702
00:39:45,140 --> 00:39:49,120
By the way, receive and write may
be also a little bit different.

703
00:39:49,260 --> 00:39:54,600
You receive, but okay, it doesn't
matter, because the WAL receiver

704
00:39:54,600 --> 00:39:56,140
is a receiver, right?

705
00:39:56,960 --> 00:40:02,060
Anyway, so a couple more things,
deeper, I wanted to mention.

706
00:40:03,340 --> 00:40:09,480
It's deeper things, but 1 of the
things is even if we have a

707
00:40:09,480 --> 00:40:15,960
remote write, there is an issue in
the code that causes flushing

708
00:40:16,620 --> 00:40:17,120
periodically.

709
00:40:18,480 --> 00:40:22,860
So this WAL, like we have remote
write, but sometimes we wait

710
00:40:22,860 --> 00:40:23,900
until it's flushed.

711
00:40:25,580 --> 00:40:29,680
And this is also like this, I guess
this issue is known.

712
00:40:30,040 --> 00:40:33,740
It reveals itself only under
very heavy loads.

713
00:40:34,740 --> 00:40:37,860
And yeah, it also could be improved
maybe, right?

714
00:40:37,960 --> 00:40:40,680
Because if... so remote write, right?

715
00:40:40,680 --> 00:40:42,780
It's not purely remote write.

716
00:40:43,040 --> 00:40:44,440
It's write, write, write.

717
00:40:44,440 --> 00:40:50,280
But at some point it's flush, and
all transactions that are currently

718
00:40:50,360 --> 00:40:54,180
being committed, they
wait for the additional flush.

719
00:40:54,600 --> 00:40:57,560
And then it's again, pure, again,
write, write, write, and then

720
00:40:57,560 --> 00:40:58,400
again, flush.

721
00:40:58,840 --> 00:41:05,020
And it can be seen in the code
that there are things there, F

722
00:41:05,020 --> 00:41:06,140
data sync, right?

723
00:41:06,980 --> 00:41:08,040
Michael: Roughly, yeah.

724
00:41:08,600 --> 00:41:13,980
I mean, is this also the wal_writer_delay
or a different kind

725
00:41:13,980 --> 00:41:14,560
Nikolay: of delay?

726
00:41:15,240 --> 00:41:16,820
No, no, no. It's not documented.

727
00:41:17,900 --> 00:41:18,920
Michael: Okay, interesting.

728
00:41:20,340 --> 00:41:25,940
Nikolay: So, I mean, if you use
remote write, there is like

729
00:41:26,040 --> 00:41:28,580
behavior which is not purely remote,
right?

730
00:41:28,580 --> 00:41:32,600
So it still involves disk sometimes,
like periodically.

731
00:41:33,340 --> 00:41:37,660
I guess some buffer is full and
needs to be written.

732
00:41:38,720 --> 00:41:41,420
But this causes delays of commits.

733
00:41:43,380 --> 00:41:43,780
Michael: So.

734
00:41:43,780 --> 00:41:44,360
Makes sense.

735
00:41:44,600 --> 00:41:48,080
Nikolay: Yeah. And another thing
is probably off topic, but Crunchy Bridge,

736
00:41:48,560 --> 00:41:50,340
let's discuss what we observed.

737
00:41:50,680 --> 00:41:54,220
We just created a fresh cluster
and saw a few things.

738
00:41:55,020 --> 00:41:59,720
1 of them is synchronous_standby_names
is filled with, I created

739
00:41:59,720 --> 00:42:00,600
HA cluster.

740
00:42:01,420 --> 00:42:03,640
Yeah, so HA cluster was all good.

741
00:42:03,820 --> 00:42:08,960
And synchronous_standby_names has
this host, which is HA replica.

742
00:42:09,660 --> 00:42:11,100
Like it's not visible to us.

743
00:42:11,100 --> 00:42:13,560
We cannot send queries to it.

744
00:42:15,120 --> 00:42:18,340
And synchronous_commit is set to
local.
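The two settings observed can be checked on any cluster like this; the comments reflect what was reportedly seen on this fresh Crunchy Bridge cluster:

```sql
SHOW synchronous_standby_names;  -- non-empty: the HA replica is listed
SHOW synchronous_commit;         -- 'local': commits don't wait for that replica
```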

745
00:42:20,740 --> 00:42:24,180
Michael: Yeah, so the first thing
you mentioned, it being non-empty

746
00:42:24,220 --> 00:42:26,640
means they're using Postgres replication,
right?

747
00:42:26,820 --> 00:42:29,020
What's the significance of that
being non-empty?

748
00:42:30,060 --> 00:42:32,420
Nikolay: If it's not empty, it's
supposed to be...

749
00:42:32,420 --> 00:42:36,820
If synchronous_commit was the default,
on, it would mean we have

750
00:42:36,820 --> 00:42:37,940
a synchronous replication.

751
00:42:39,720 --> 00:42:43,340
Flush, as you said, like on means remote flush in this...

752
00:42:45,340 --> 00:42:47,640
Non-existing remote flush, it means, right?

753
00:42:49,020 --> 00:42:53,320
But synchronous_commit is set to local, which means we don't

754
00:42:54,400 --> 00:42:58,220
see the state of what's happening on standby.

755
00:42:58,400 --> 00:43:03,120
So like commit only depends, the commit behavior depends only on

756
00:43:03,120 --> 00:43:04,080
the primary.

757
00:43:05,080 --> 00:43:06,260
Michael: Like my setup, right?

758
00:43:06,260 --> 00:43:10,520
Like on Google Cloud SQL for me, even though it's on, it's kind

759
00:43:10,520 --> 00:43:12,540
of like local because they're not using.

760
00:43:12,560 --> 00:43:14,440
Nikolay: Right, but they still have the replica.

761
00:43:15,320 --> 00:43:16,160
Michael: Yeah, Yeah, yeah.

762
00:43:16,340 --> 00:43:17,160
Nikolay: Strange, right?

763
00:43:17,160 --> 00:43:20,940
Michael: But local, local's confusing because I don't, yeah,

764
00:43:20,940 --> 00:43:24,880
I would have, as you said, if, I would have expected it to be

765
00:43:24,880 --> 00:43:26,680
on, or as you say, the more popular one.

766
00:43:26,680 --> 00:43:30,060
Nikolay: If it would be on, if it was on, this replica would

767
00:43:30,060 --> 00:43:30,760
be synchronous.

768
00:43:31,400 --> 00:43:32,660
If it's local,

769
00:43:34,400 --> 00:43:35,520
Michael: I guess it's unsynchronous?

770
00:43:36,020 --> 00:43:36,520
Yeah.

771
00:43:37,080 --> 00:43:38,820
Or asynchronous, yeah.

772
00:43:39,320 --> 00:43:40,460
Nikolay: Asynchronous, right.

773
00:43:40,760 --> 00:43:44,000
Although it's mentioned in synchronous_standby_names.

774
00:43:45,060 --> 00:43:46,060
Michael: Yeah, good point.

775
00:43:48,520 --> 00:43:52,480
Would you prefer it to be remote_write, I guess, rather

776
00:43:52,480 --> 00:43:52,900
than on?

777
00:43:52,900 --> 00:43:53,560
Nikolay: No, no, no.

778
00:43:53,560 --> 00:43:57,080
I think they are not, like, I'm not sure it's a mistake.

779
00:43:59,080 --> 00:44:02,400
Michael: Well, it's not the default, right so it's definitely

780
00:44:02,400 --> 00:44:02,900
deliberate.

781
00:44:05,660 --> 00:44:09,620
Nikolay: There are some thoughts about this, I think, and I'm trying

782
00:44:09,620 --> 00:44:15,520
to guess what, because obviously making it synchronous would cause

783
00:44:16,260 --> 00:44:21,980
slow commits, right? Yeah, performance would be not great. Okay, we

784
00:44:21,980 --> 00:44:23,260
go to local, but why

785
00:44:23,260 --> 00:44:24,280
Michael: do people do that?

786
00:44:24,280 --> 00:44:27,900
Nikolay: yeah, we keep it local, so this HA replica becomes

787
00:44:27,920 --> 00:44:28,420
asynchronous.

788
00:44:30,540 --> 00:44:31,040
Huh.

789
00:44:31,860 --> 00:44:32,180
Do you

790
00:44:32,180 --> 00:44:33,740
Michael: think it's to look good in benchmarking?

791
00:44:34,900 --> 00:44:37,780
Nikolay: Why cannot we configure asynchronous replica in a normal

792
00:44:37,780 --> 00:44:38,280
way?

793
00:44:38,360 --> 00:44:41,260
Why should we put it in synchronous_standby_names, right?

794
00:44:41,320 --> 00:44:45,280
Because it could be regular asynchronous replica, like present

795
00:44:45,280 --> 00:44:51,800
in slot, but not available for read-only transactions, right?

796
00:44:52,420 --> 00:44:53,400
But it's there.
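Configuring the replica "in a normal way", as Nikolay suggests, would be a plain streaming setup with synchronous_standby_names left empty (a sketch; the slot name and host are hypothetical):

```
-- on the primary: create a slot; leave synchronous_standby_names empty
SELECT pg_create_physical_replication_slot('ha_replica');

-- on the standby (postgresql.conf), pointing at the primary:
--   primary_conninfo  = 'host=primary.example.com user=replicator'
--   primary_slot_name = 'ha_replica'
-- with synchronous_standby_names empty, replication is purely asynchronous
```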

797
00:44:54,240 --> 00:44:58,200
And the only idea, it's not my idea, we discussed this situation

798
00:44:58,660 --> 00:45:06,660
in the team, and an idea came up in one great mind: probably they

799
00:45:07,520 --> 00:45:13,780
want WAL to be written by backends, not by WAL writer, because

800
00:45:14,600 --> 00:45:18,160
there's like this effect when synchronous_standby_names is

801
00:45:18,160 --> 00:45:20,400
not empty, backends are writing.

802
00:45:21,020 --> 00:45:22,900
And this can increase throughput.

803
00:45:25,080 --> 00:45:26,820
It's a guess, hypothesis.

804
00:45:28,080 --> 00:45:30,680
If Crunchy is listening to us,
and we know they're listening

805
00:45:30,680 --> 00:45:34,300
because they changed the random
page cost to 1.1, right?

806
00:45:34,300 --> 00:45:34,920
Michael: Oh yeah.

807
00:45:34,920 --> 00:45:36,560
Nikolay: Actually, it's great.

808
00:45:36,620 --> 00:45:40,320
I also wanted to thank them because
since then I stopped saying

809
00:45:40,320 --> 00:45:45,940
1 and I say 1.1 because Crunchy
benchmarked it and decided to

810
00:45:45,940 --> 00:45:48,080
use 1.1 for random page cost.

811
00:45:48,080 --> 00:45:48,420
Great.

812
00:45:48,420 --> 00:45:49,420
Thank you so much.

813
00:45:49,640 --> 00:45:54,220
But if they, if you guys are listening,
please let us know what's

814
00:45:54,220 --> 00:45:55,240
happening with settings.

815
00:45:55,240 --> 00:45:56,140
It's really interesting.

816
00:45:56,140 --> 00:45:58,860
Maybe it should be documented because
it's interesting.

817
00:45:59,270 --> 00:46:04,300
And so overall impression about
synchronous_commit, all those

818
00:46:04,380 --> 00:46:11,020
options and synchronous replication,
I honestly see more asynchronous

819
00:46:12,540 --> 00:46:13,540
setups.

820
00:46:15,300 --> 00:46:20,640
But I know serious guys and I see
clusters with synchronous like

821
00:46:20,640 --> 00:46:22,260
quorum commit configured.

822
00:46:23,040 --> 00:46:26,440
My impression is still that there are things
to be fixed and improved.

823
00:46:27,560 --> 00:46:29,880
This is what I observe.

824
00:46:30,020 --> 00:46:31,320
And 1 more of...

825
00:46:32,100 --> 00:46:33,340
Let's name it.

826
00:46:33,340 --> 00:46:39,520
When we created this cluster with
HA, with HA checkbox, HA replica

827
00:46:39,520 --> 00:46:40,480
was also created.

828
00:46:40,480 --> 00:46:42,540
We see it in synchronous_standby_names.

829
00:46:42,880 --> 00:46:47,360
And then we decided using our super
user, which Crunchy has,

830
00:46:47,360 --> 00:46:48,300
and it's great.

831
00:46:48,600 --> 00:46:52,220
I think it's the only managed service,
1 of maybe 2 actually,

832
00:46:52,400 --> 00:46:55,060
which provides super user access,
which is great.

833
00:46:55,200 --> 00:46:59,560
So we decided to change shared
preload libraries and restart.

834
00:47:00,520 --> 00:47:06,440
And there was an idea, like a question,
is this change propagated

835
00:47:06,660 --> 00:47:08,300
to a standby node or not?

836
00:47:08,560 --> 00:47:12,800
And what happened, Postgres restart,
not cluster restart, Postgres

837
00:47:12,800 --> 00:47:14,360
restart, caused a failover.

838
00:47:15,040 --> 00:47:16,220
This is not all right.

839
00:47:17,480 --> 00:47:18,540
This is not all right.

840
00:47:18,540 --> 00:47:24,180
Freshly created cluster and we
did it 3 times, 2 out of 3 times,

841
00:47:24,400 --> 00:47:25,380
failover happened.

842
00:47:26,140 --> 00:47:27,940
And another time it didn't happen.

843
00:47:28,080 --> 00:47:33,540
So I guess it's like some race
condition is present there, right?

844
00:47:34,180 --> 00:47:37,360
And we saw that change was not
propagated to the standby.

845
00:47:37,360 --> 00:47:38,940
So it's actually data loss.

846
00:47:38,940 --> 00:47:42,940
If I say, if I have super user,
it's my right, right?

847
00:47:42,980 --> 00:47:48,940
And I have alter system, it writes
to, we can see it in

848
00:47:48,940 --> 00:47:53,000
pg_settings, and it's not yet applied,
like pending_restart is true,

849
00:47:53,000 --> 00:47:53,500
right?
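The state being described can be checked with a query like this (pg_settings and its pending_restart column are real; the column flags ALTER SYSTEM changes that are written but waiting for a restart):

```
-- settings changed (e.g. via ALTER SYSTEM) but not applied until restart
SELECT name, setting, pending_restart
FROM pg_settings
WHERE pending_restart;
```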

850
00:47:54,120 --> 00:47:59,480
And then we have a failover and we
don't see it anymore.

851
00:48:01,220 --> 00:48:01,920
What is it?

852
00:48:01,920 --> 00:48:03,000
It's data loss.

853
00:48:04,620 --> 00:48:07,000
Michael: Or it's unsupported behavior.

854
00:48:08,040 --> 00:48:09,280
Nikolay: No, no, no, it's data loss.

855
00:48:09,280 --> 00:48:10,160
I wrote something.

856
00:48:11,980 --> 00:48:15,080
Using alter system, I wrote something and I lost it.

857
00:48:15,560 --> 00:48:19,400
Michael: So you can't have it both ways. Like, I see where

858
00:48:19,400 --> 00:48:23,100
you're coming from, 100%. But on 1 hand you want super user access

859
00:48:23,240 --> 00:48:26,420
and on the other hand you're gonna call it data loss if you change

860
00:48:26,420 --> 00:48:28,040
anything that doesn't get propagated

861
00:48:28,820 --> 00:48:32,420
Nikolay: Let me contradict myself a bit, like, I'm too harsh.

862
00:48:32,420 --> 00:48:34,120
I know like alter system.

863
00:48:34,120 --> 00:48:39,460
It's actually local, we can change settings on each standby separately,

864
00:48:39,520 --> 00:48:44,860
independently, so it's not guaranteed, it's not propagated by

865
00:48:44,860 --> 00:48:45,360
replication.

866
00:48:45,860 --> 00:48:49,440
I just expected maybe some automation exists which would propagate

867
00:48:49,440 --> 00:48:49,940
it.

868
00:48:50,600 --> 00:48:54,300
If this replica is an HA replica, maybe we should match settings.

869
00:48:55,440 --> 00:49:00,400
Because this setting required restart, but what happens to settings

870
00:49:00,560 --> 00:49:02,080
that don't require restart?

871
00:49:02,820 --> 00:49:05,900
If I change them, and then what?

872
00:49:05,900 --> 00:49:11,140
Like if I change them using alter system, do I lose them?
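As a sketch of the scenario (ALTER SYSTEM and pg_reload_conf are real; work_mem is just a hypothetical example of a no-restart setting): the change lands in postgresql.auto.conf on the primary only, so a failover silently discards it.

```
-- on the primary: change a setting that needs no restart
ALTER SYSTEM SET work_mem = '64MB';   -- written to postgresql.auto.conf locally
SELECT pg_reload_conf();              -- applied immediately on this node only
-- nothing replicates postgresql.auto.conf to standbys;
-- after a failover, the new primary never had this change
```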

873
00:49:11,140 --> 00:49:12,380
This is not good, right?

874
00:49:13,220 --> 00:49:13,720
Michael: Yeah.

875
00:49:14,500 --> 00:49:18,420
But equally, expectations of automations around this stuff is

876
00:49:18,420 --> 00:49:22,280
probably 1 that could be a nice excuse for other providers not

877
00:49:22,280 --> 00:49:23,620
to add super user access.

878
00:49:23,620 --> 00:49:24,440
So it's yeah.

879
00:49:25,440 --> 00:49:25,680
No, no, no.

880
00:49:25,680 --> 00:49:30,250
Nikolay: Well, I think it's no, no, no, no, no, no, no, no, no,

881
00:49:30,540 --> 00:49:31,040
no.

882
00:49:31,420 --> 00:49:33,840
Michael: I'm just saying be careful what you wish for.

883
00:49:33,940 --> 00:49:37,940
Nikolay: Well, it could happen, like I change something using

884
00:49:37,940 --> 00:49:40,240
UI and...

885
00:49:40,240 --> 00:49:42,880
Michael: No, because if you use the UI, they change it somewhere

886
00:49:42,880 --> 00:49:44,480
else and it does get propagated.

887
00:49:44,680 --> 00:49:45,180
Nikolay: Maybe.

888
00:49:45,300 --> 00:49:49,140
But I'm rooting for Crunchy actually here.

889
00:49:49,740 --> 00:49:52,740
I think it can be solved, this problem.

890
00:49:52,900 --> 00:49:55,320
I don't want them to close super user access.

891
00:49:55,320 --> 00:49:57,540
It's a great feature, super great feature.

892
00:49:57,740 --> 00:49:59,100
I think everyone should be.

893
00:49:59,100 --> 00:50:06,660
Based on our previous week's topic, my position is: RDS and

894
00:50:06,660 --> 00:50:09,880
others, they are taking too much from us.

895
00:50:10,920 --> 00:50:12,880
It stops feeling open source.

896
00:50:13,260 --> 00:50:15,060
Michael: We don't need to make this point again.

897
00:50:15,060 --> 00:50:15,860
Nikolay: Yeah, okay.

898
00:50:15,860 --> 00:50:18,000
Well, I'm going to repeat it many times, right?

899
00:50:18,000 --> 00:50:19,640
So it's not all right.

900
00:50:19,640 --> 00:50:23,320
So Crunchy is doing a great job here, they are

901
00:50:23,320 --> 00:50:29,120
providing super user access and I think it can be solved, right?

902
00:50:29,340 --> 00:50:34,140
And 1 more thing, if you're listening, when I restart Postgres,

903
00:50:34,560 --> 00:50:36,340
I see restart cluster, right?

904
00:50:38,900 --> 00:50:41,320
Michael: I feel like I need to chop this bit up and send it to

905
00:50:41,320 --> 00:50:42,840
them as a support thing

906
00:50:42,980 --> 00:50:44,680
Nikolay: Good, yeah, it was good.

907
00:50:44,680 --> 00:50:45,140
Michael: nice 1

908
00:50:45,140 --> 00:50:45,740
Nikolay: thank you

909
00:50:45,740 --> 00:50:48,640
Michael: Yeah, thanks so much Nikolay, and thanks for the

910
00:50:48,640 --> 00:50:51,180
request and yeah catch you next week.

911
00:50:51,420 --> 00:50:51,860
Nikolay: See you.

912
00:50:51,860 --> 00:50:52,080
Bye.