1
00:00:00,060 --> 00:00:02,800
Michael: Hello and welcome to Postgres
FM, a weekly show about

2
00:00:02,800 --> 00:00:04,080
all things PostgreSQL.

3
00:00:04,080 --> 00:00:07,240
I am Michael, founder of pgMustard,
and this is my co-host Nikolay,

4
00:00:07,240 --> 00:00:08,660
founder of Postgres.AI.

5
00:00:08,860 --> 00:00:11,100
Hey Nikolay, what are we talking
about today?

6
00:00:11,520 --> 00:00:15,040
Nikolay: Hi Michael, about upgrades
again, but it will be some

7
00:00:15,040 --> 00:00:15,900
small talk.

8
00:00:16,560 --> 00:00:18,980
Just minor upgrades, easy, right?

9
00:00:19,380 --> 00:00:21,660
Michael: Yeah, minor episode about
minor upgrades.

10
00:00:22,080 --> 00:00:24,880
Nikolay: Yeah, replace binaries,
restart, bye-bye.

11
00:00:24,940 --> 00:00:25,740
That's it.

12
00:00:25,840 --> 00:00:26,780
Episode done.

13
00:00:27,100 --> 00:00:29,120
This is what the documentation
says, right?

14
00:00:29,120 --> 00:00:31,200
Replace binaries, restart.

15
00:00:32,220 --> 00:00:33,020
Easy peasy.

16
00:00:34,400 --> 00:00:37,200
Michael: Yeah, so, well, this was
my choice, and I wanted to

17
00:00:37,200 --> 00:00:38,600
pick it for a few reasons.

18
00:00:38,600 --> 00:00:43,860
1 is that we just had new minor
releases announced a few days

19
00:00:43,860 --> 00:00:47,120
ago as of recording, about a week
ago as of the time this will

20
00:00:47,120 --> 00:00:47,800
go out.

21
00:00:47,800 --> 00:00:48,960
Nikolay: Planned one, right?

22
00:00:48,960 --> 00:00:49,940
Every 2 months.

23
00:00:50,740 --> 00:00:51,720
Yes, yes, yes.

24
00:00:52,120 --> 00:00:53,540
Not like urgent.

25
00:00:54,120 --> 00:00:54,940
Michael: Yes, exactly.

26
00:00:55,680 --> 00:00:58,480
So I thought it was a good time
to bring it up.

27
00:00:58,480 --> 00:01:00,600
I thought there were a couple of
interesting things in it.

28
00:01:00,600 --> 00:01:05,380
But also it was a good reminder
to me that we hadn't talked about

29
00:01:05,380 --> 00:01:05,980
in a while.

30
00:01:05,980 --> 00:01:08,880
I did look back and the last time
we spoke about upgrades, it

31
00:01:08,880 --> 00:01:10,020
was also my idea.

32
00:01:10,440 --> 00:01:13,320
And it was the time where I was
suggesting boring topic after

33
00:01:13,320 --> 00:01:14,100
boring topic.

34
00:01:14,100 --> 00:01:16,920
So thank you for not, Thank you
for stopping having a go at me

35
00:01:16,920 --> 00:01:18,380
when I bring up boring topics.

36
00:01:18,940 --> 00:01:22,800
Partly because I see a lot of people,
especially when they use

37
00:01:22,800 --> 00:01:27,280
managed service providers, on really
quite old versions.

38
00:01:28,180 --> 00:01:30,980
I don't mind so much if it's an
old major version.

39
00:01:31,100 --> 00:01:32,540
I can see there's reasons.

40
00:01:33,300 --> 00:01:36,160
I encourage them to upgrade, but
I do understand that there's

41
00:01:36,160 --> 00:01:36,660
reasons.

42
00:01:36,680 --> 00:01:40,580
But when they're on a supported
major version and lagging really

43
00:01:40,580 --> 00:01:44,800
far behind on minor versions, it's
difficult to justify.

44
00:01:45,260 --> 00:01:50,020
Nikolay: The guys on version 14.3,
for example, which was a bad

45
00:01:50,020 --> 00:01:50,520
one.

46
00:01:50,600 --> 00:01:51,400
Do you remember?

47
00:01:51,820 --> 00:01:52,410
Michael: Yeah, yeah.

48
00:01:52,410 --> 00:01:53,660
Nikolay: It was corruption.

49
00:01:54,020 --> 00:01:55,620
Reindex concurrently corruption.

50
00:01:56,120 --> 00:01:59,320
Michael: Last time we mentioned
a couple of things about the

51
00:01:59,320 --> 00:02:01,880
schedule, but I didn't have the
details and I've looked it up

52
00:02:01,880 --> 00:02:02,560
this time.

53
00:02:02,560 --> 00:02:06,400
And I think that's a good reminder
that there's normally a schedule

54
00:02:06,400 --> 00:02:08,240
for these minor releases.

55
00:02:08,240 --> 00:02:11,600
I keep calling them minor versions,
but the docs don't call them

56
00:02:11,600 --> 00:02:12,100
that.

57
00:02:12,280 --> 00:02:16,780
They're called minor releases in
the documentation, which I found

58
00:02:16,780 --> 00:02:17,280
interesting.

59
00:02:17,420 --> 00:02:19,300
Major versions and minor releases.

60
00:02:19,540 --> 00:02:23,180
Nikolay: And I think it's just
some inconsistency, small one.

61
00:02:23,640 --> 00:02:24,140
Michael: Maybe.

62
00:02:24,440 --> 00:02:25,260
It felt deliberate.

63
00:02:25,260 --> 00:02:29,180
Like it felt very like, anyway,
as you said, this recent one last

64
00:02:29,180 --> 00:02:34,840
week was a planned release and
they, in general, it says, unless

65
00:02:34,840 --> 00:02:39,940
otherwise stated on the second
Thursday of February May August

66
00:02:39,960 --> 00:02:40,680
and November

67
00:02:41,360 --> 00:02:45,540
Nikolay: yeah I was wrong every
3 months not 2 there is a wiki

68
00:02:45,540 --> 00:02:49,760
page where it's the table with
schedule when when my journey

69
00:02:49,760 --> 00:02:51,800
this has become end of life.

70
00:02:52,200 --> 00:02:57,960
And also this policy for planned
releases, but sometimes unplanned

71
00:02:58,080 --> 00:02:59,760
releases happen as well.

72
00:02:59,800 --> 00:03:02,240
I also saw somewhere statistics.

73
00:03:02,640 --> 00:03:06,420
Ah, it was on the WhyUpgrade, WhyUpgradeDepth.com,
we should mention

74
00:03:06,420 --> 00:03:10,980
it a few times in this episode
because it's a super convenient

75
00:03:11,020 --> 00:03:15,520
tool to see differences in better
form.

76
00:03:15,600 --> 00:03:19,600
You can find everything in the
release notes or just selecting

77
00:03:21,780 --> 00:03:27,100
some things in the system catalogs,
but it's much easier just

78
00:03:27,100 --> 00:03:33,440
to go to whyupgrade.depth.com and
then just see the differences

79
00:03:33,480 --> 00:03:37,440
choosing between a couple of minor
versions or also major versions,

80
00:03:37,440 --> 00:03:38,580
it's also supported.

81
00:03:39,600 --> 00:03:44,520
So yeah, long list and security
related stuff is highlighted.

82
00:03:46,240 --> 00:03:53,200
So there I saw that how many changes
happened in each major version

83
00:03:53,400 --> 00:03:56,180
and how many minor releases happened.

84
00:03:56,400 --> 00:04:02,120
And can you imagine, in 7.4, more
than 30 minor releases happened.

85
00:04:04,060 --> 00:04:04,980
It was 7.4.30.

86
00:04:08,680 --> 00:04:09,440
Oh wow.

87
00:04:09,440 --> 00:04:13,040
Yeah, it was popular, popular major
version, I guess.

88
00:04:13,260 --> 00:04:18,640
Michael: So what would expected
be about 20, 5 years with 4 updates

89
00:04:18,640 --> 00:04:20,940
per year, maybe 19 or 20?

90
00:04:21,020 --> 00:04:25,700
Nikolay: 94, 96, I think they were
like 25, maybe 24 releases,

91
00:04:25,920 --> 00:04:29,120
but now it's shorter, like 20 maybe.

92
00:04:29,240 --> 00:04:32,300
I don't remember details, like
right now.

93
00:04:32,520 --> 00:04:32,980
Yeah, yeah.

94
00:04:32,980 --> 00:04:35,380
So there are many minor versions.

95
00:04:35,380 --> 00:04:40,020
And sometimes we saw it happened,
like minor version planned

96
00:04:40,020 --> 00:04:43,820
released and then in a week or
2, new version released because

97
00:04:43,820 --> 00:04:45,960
some problem just identified.

98
00:04:47,520 --> 00:04:51,340
So minor version can bring not
only fixes but problems, because

99
00:04:51,340 --> 00:04:53,860
it's a regular thing in software
development.

100
00:04:55,160 --> 00:04:59,940
Despite all the measures to control
the quality and regression

101
00:05:00,020 --> 00:05:06,440
tests and performance testing,
everything, It still happens.

102
00:05:06,820 --> 00:05:12,760
So downgrades or fix it forward,
like, let's wait until the next

103
00:05:13,100 --> 00:05:13,940
minor release.

104
00:05:14,280 --> 00:05:15,460
It happens still.

105
00:05:15,900 --> 00:05:18,600
Michael: I was going to say, though,
I think it's been happening

106
00:05:18,600 --> 00:05:22,900
a lot less recently so I think
yeah only the only 1 I remember

107
00:05:22,900 --> 00:05:26,200
this is why I brought up the schedule
is because you mentioned

108
00:05:26,200 --> 00:05:32,580
14.3 and it was 14.4 was the last
unscheduled I actually don't

109
00:05:32,580 --> 00:05:33,580
know if they've got a name for
it.

110
00:05:33,580 --> 00:05:35,280
I guess it is just a minor release.

111
00:05:35,280 --> 00:05:37,740
But it was not on the schedule
we mentioned.

112
00:05:37,800 --> 00:05:39,220
It came out in June.

113
00:05:39,520 --> 00:05:42,740
So a month after, about a month
after the...

114
00:05:42,740 --> 00:05:43,580
Nikolay: I think there are more.

115
00:05:43,580 --> 00:05:44,940
I don't remember, actually.

116
00:05:45,280 --> 00:05:46,240
Michael: More recent ones?

117
00:05:46,500 --> 00:05:46,840
Nikolay: Maybe.

118
00:05:46,840 --> 00:05:48,280
I don't remember.
I don't remember.

119
00:05:48,520 --> 00:05:50,460
But let's discuss the process.

120
00:05:51,260 --> 00:05:55,580
And from there, probably we will
return to difficulties, right?

121
00:05:55,840 --> 00:05:59,240
So, as I said, the documentation
says the process is super simple.

122
00:06:00,060 --> 00:06:01,500
Replace binaries and restart.

123
00:06:01,960 --> 00:06:05,220
But in cloud, probably you just
should provision new replica

124
00:06:05,280 --> 00:06:06,980
and perform switchover, right?

125
00:06:08,040 --> 00:06:09,320
Michael: Well, what do you mean
in cloud?

126
00:06:09,320 --> 00:06:11,520
Do you mean using a managed provider
or do

127
00:06:11,520 --> 00:06:11,720
Nikolay: you mean...

128
00:06:11,720 --> 00:06:14,860
I mean when it's easy to take another
VM.

129
00:06:15,860 --> 00:06:17,220
Like in general, right?

130
00:06:17,220 --> 00:06:19,340
It can be your own cloud or something.

131
00:06:19,340 --> 00:06:22,820
If it's easier for you to just
to bring a new VM than to deal

132
00:06:22,820 --> 00:06:27,440
with dependencies and conflicts
of binaries, I don't know, like

133
00:06:27,440 --> 00:06:33,700
some packages, you know you can
provision a new node, and when

134
00:06:33,700 --> 00:06:39,720
you install from apt or rpm, apt
install, apt-get install, or

135
00:06:39,840 --> 00:06:44,720
yum install, yum install, how to
pronounce, then you get always

136
00:06:45,240 --> 00:06:46,940
the latest minor version.

137
00:06:47,260 --> 00:06:50,400
You cannot specify a minor version,
we will discuss it later.

138
00:06:51,200 --> 00:06:52,120
Interesting.
Right, right.

139
00:06:52,200 --> 00:06:55,020
It's a big problem actually, but
we will touch it later.

140
00:06:55,440 --> 00:07:00,780
So in this case, in a new node,
you will have already updated

141
00:07:00,820 --> 00:07:01,320
Postgres.

142
00:07:01,860 --> 00:07:05,740
It's a replica, a lag is close
to 0, a synchronous replica for

143
00:07:05,740 --> 00:07:09,020
example, then you just perform
controlled switchover.

144
00:07:09,280 --> 00:07:13,900
Controlled failover also known
as switchover, right?

145
00:07:14,380 --> 00:07:17,460
In this case, all good, right?

146
00:07:18,060 --> 00:07:21,600
Michael: Well, I do think that
process would work most releases,

147
00:07:21,760 --> 00:07:24,880
but I do think there are other
parts that you shouldn't forget,

148
00:07:24,880 --> 00:07:26,940
which are like, read the release
notes.

149
00:07:26,940 --> 00:07:30,625
There might be other additional
steps you need to take in order

150
00:07:30,625 --> 00:07:31,120
to...
No, No,

151
00:07:31,120 --> 00:07:34,060
Nikolay: no, no, no, no, no.

152
00:07:34,060 --> 00:07:35,920
I'm not forgetting anything here.

153
00:07:36,480 --> 00:07:39,720
I'm just talking about the technical
process of upgrade.

154
00:07:40,440 --> 00:07:43,520
Of course, not only you need to
read all the notes, you need

155
00:07:43,520 --> 00:07:46,840
to properly test and almost nobody
does it properly because it's

156
00:07:46,840 --> 00:07:47,680
a lot of work.

157
00:07:47,680 --> 00:07:50,760
I mean, I'm sure clouds do this.

158
00:07:50,760 --> 00:07:52,760
That's why they lag many months
usually.

159
00:07:52,900 --> 00:07:54,180
They do a lot of testing.

160
00:07:54,640 --> 00:08:00,320
That's why usually they're behind
official minor versions, sometimes

161
00:08:00,320 --> 00:08:01,820
skipping some of them actually.

162
00:08:02,220 --> 00:08:07,880
But I'm just talking about 2 big
approaches to upgrade.

163
00:08:08,500 --> 00:08:11,260
Replace binary and restart as documentation
says.

164
00:08:11,780 --> 00:08:15,020
By the way, does the documentation
already mention how to restart

165
00:08:15,020 --> 00:08:15,520
faster?

166
00:08:16,280 --> 00:08:17,800
Michael: It doesn't mention checkpoints.

167
00:08:18,400 --> 00:08:18,900
Nikolay: Exactly.

168
00:08:19,140 --> 00:08:19,860
It should.

169
00:08:19,860 --> 00:08:21,240
It should mention it.

170
00:08:21,600 --> 00:08:25,320
Michael: You did a great whole
episode on how to do it faster.

171
00:08:25,320 --> 00:08:28,100
So I'll link that up in the show
notes for anybody who wasn't

172
00:08:28,100 --> 00:08:29,280
a listener back then.

173
00:08:29,820 --> 00:08:30,320
Nikolay: Yeah.

174
00:08:30,860 --> 00:08:31,500
Super easy.

175
00:08:31,500 --> 00:08:35,780
If you want to restart faster,
you need to remember about shutdown

176
00:08:35,860 --> 00:08:36,360
checkpoint.

177
00:08:36,500 --> 00:08:40,900
Shutdown checkpoint will take long
if you have max WAL size

178
00:08:40,900 --> 00:08:42,660
and checkpoint amount tuned.

179
00:08:42,660 --> 00:08:43,840
You should have it tuned.

180
00:08:43,840 --> 00:08:45,600
We had another episode about it.

181
00:08:45,600 --> 00:08:48,440
And the problem is that during
shutdown checkpoint, Postgres

182
00:08:48,440 --> 00:08:51,500
doesn't accept new queries to execute
at all.

183
00:08:51,500 --> 00:08:52,540
And that's a big problem.

184
00:08:52,540 --> 00:08:55,840
Actually, it could be improved
in Postgres itself because it

185
00:08:55,840 --> 00:08:57,020
could do 2 checkpoints.

186
00:08:57,540 --> 00:08:58,240
Why not?

187
00:08:58,620 --> 00:09:01,480
Like pre-shutdown checkpoint and
actual shutdown checkpoint,

188
00:09:01,480 --> 00:09:02,700
which is super fast.

189
00:09:02,720 --> 00:09:06,900
So our current recipe for all Postgres
versions to restart faster,

190
00:09:06,900 --> 00:09:11,040
you perform an explicit shutdown checkpoint,
which is not blocking

191
00:09:11,040 --> 00:09:11,540
anyone.

192
00:09:11,880 --> 00:09:13,100
Queries are executed.

193
00:09:13,500 --> 00:09:17,760
And then immediately after it's
done, you restart or you shut

194
00:09:17,760 --> 00:09:18,260
down.

195
00:09:18,940 --> 00:09:21,420
Restart consists of shutdown and
start, right?

196
00:09:21,820 --> 00:09:25,640
In this case, shutdown checkpoint
has almost nothing to do, and

197
00:09:25,640 --> 00:09:26,400
it's fast.

198
00:09:26,640 --> 00:09:28,260
This saves a lot of time.

199
00:09:28,260 --> 00:09:28,620
Michael: Yeah.

200
00:09:28,620 --> 00:09:30,840
Last time you mentioned there was
some discussion in your team

201
00:09:30,840 --> 00:09:34,440
as to whether even explicitly doing
2 checkpoints might make

202
00:09:34,440 --> 00:09:37,840
sense because if the first 1, if
you've done, I don't know if

203
00:09:37,840 --> 00:09:40,480
that discussion went anywhere if
you did some further testing

204
00:09:40,480 --> 00:09:40,980
there.

205
00:09:41,280 --> 00:09:44,340
Nikolay: So in total it becomes
already 3 checkpoints.

206
00:09:44,340 --> 00:09:45,040
Michael: 3, yeah.

207
00:09:45,040 --> 00:09:46,760
Nikolay: 2 explicit and 1 sort
of.

208
00:09:46,760 --> 00:09:52,080
Yeah, anyway, you can speed this
restart a lot in a heavily loaded

209
00:09:52,080 --> 00:09:55,140
system if you know this simple
trick.

210
00:09:55,580 --> 00:10:00,120
And this simple trick, we code
it everywhere when we automate

211
00:10:01,020 --> 00:10:05,180
some things, major upgrades, minor
upgrades, various things.

212
00:10:05,420 --> 00:10:08,380
But the good thing, back to my
point that we have 2 approaches,

213
00:10:08,400 --> 00:10:09,060
2 recipes.

214
00:10:09,160 --> 00:10:13,080
1 recipe is this, replace binary
and restart.

215
00:10:13,080 --> 00:10:17,420
And another recipe is what maybe
in other database worlds, not

216
00:10:17,420 --> 00:10:20,580
Postgres, but maybe Oracle, for
example, or SQL Server, what

217
00:10:20,580 --> 00:10:22,500
they call a rolling upgrade, right?

218
00:10:22,500 --> 00:10:26,380
When you upgrade 1 replica, another
replica, and so on, and then

219
00:10:26,380 --> 00:10:29,940
you perform switch over maybe multiple
times, I don't know.

220
00:10:30,040 --> 00:10:33,160
The good thing is that current
Postgres versions and current

221
00:10:33,160 --> 00:10:36,240
Patroni, I think since 2012, right?

222
00:10:37,060 --> 00:10:39,180
Restart is not needed when you
promote.

223
00:10:40,080 --> 00:10:41,460
Because before that, to reconfigure...

224
00:10:43,180 --> 00:10:48,480
Remember recovery.conf was in a
separate file and not in PostgreSQL.conf.

225
00:10:49,840 --> 00:10:55,920
And to reconfigure primary coninfo
or restore command, you needed

226
00:10:55,920 --> 00:10:58,320
to perform restart of your replicas.

227
00:10:58,440 --> 00:11:04,840
So if primary changes, it means
all replicas needed to be restarted.

228
00:11:05,020 --> 00:11:06,260
But right now, no more.

229
00:11:06,260 --> 00:11:07,080
Everything is good.

230
00:11:07,080 --> 00:11:09,020
Promotion can be done.

231
00:11:09,480 --> 00:11:15,640
And promotion and reconfiguration
of your primary on all replicas

232
00:11:15,640 --> 00:11:18,260
can be done without restart and
Patroni does it.

233
00:11:18,600 --> 00:11:23,240
So it means that it's faster and
since restart is not needed,

234
00:11:23,900 --> 00:11:26,820
even no tricks with checkpoints,
right?

235
00:11:28,140 --> 00:11:30,860
Michael: Yeah, I think this is
how some of the cloud providers

236
00:11:30,880 --> 00:11:32,380
are doing it behind the scenes.

237
00:11:32,640 --> 00:11:33,380
Yeah, yeah.

238
00:11:33,740 --> 00:11:35,240
The managed services, yeah.

239
00:11:36,040 --> 00:11:37,040
Nikolay: But good point.

240
00:11:37,240 --> 00:11:39,280
Yeah, let's discuss some topics
here.

241
00:11:39,280 --> 00:11:43,080
So you mentioned you need to check,
for sure you need to check

242
00:11:43,080 --> 00:11:45,980
release notes, because it might
say, release notes might say,

243
00:11:46,500 --> 00:11:49,540
you must re-index some types of
indexes, for example.

244
00:11:50,280 --> 00:11:51,220
I don't know.

245
00:11:51,760 --> 00:11:55,020
Michael: Well, even the let's look
at the latest 16.3, I looked

246
00:11:55,020 --> 00:11:58,660
at the release notes and shout
out to Lucas Vittel, who did an

247
00:11:58,660 --> 00:12:01,920
episode of his 5 minutes of Postgres
on this as well.

248
00:12:01,920 --> 00:12:03,720
So, we'll link that up from last
week.

249
00:12:03,720 --> 00:12:08,680
But he reported a security issue
in the appropriate way, very

250
00:12:08,680 --> 00:12:09,400
well done.

251
00:12:10,080 --> 00:12:12,220
And it got fixed in 16.3.

252
00:12:13,460 --> 00:12:18,480
And you can only fix it, like,
so, there's detailed instructions

253
00:12:18,520 --> 00:12:21,980
in the release notes on how to
go about fixing it.

254
00:12:21,980 --> 00:12:26,120
Just applying the minor release
in the usual way, in either way

255
00:12:26,120 --> 00:12:30,120
that you described actually, I
think, would not fix the...

256
00:12:30,340 --> 00:12:31,480
Actually would the replica...

257
00:12:31,860 --> 00:12:34,540
It depends how you did the replication
thing, I think.

258
00:12:35,080 --> 00:12:38,540
If you spun up a new replica, I
think that might actually be

259
00:12:38,540 --> 00:12:39,040
okay.

260
00:12:39,260 --> 00:12:44,440
But if it was a replica you already
had on a minor version, did

261
00:12:44,440 --> 00:12:49,220
the minor release upgrade there,
and then failed over to it,

262
00:12:49,220 --> 00:12:51,980
you wouldn't get the fix for that
security issue.

263
00:12:52,640 --> 00:12:53,140
Nikolay: Right.

264
00:12:53,400 --> 00:12:57,660
So people in many cases don't do
it and it's bad.

265
00:12:58,380 --> 00:13:00,780
Michael: Well, I think so, especially
on managed service.

266
00:13:00,780 --> 00:13:03,400
Like imagine your managed services
doing it.

267
00:13:03,400 --> 00:13:06,040
You've got, maybe you've scheduled
for some days.

268
00:13:06,580 --> 00:13:08,040
Nikolay: Somebody already did it.

269
00:13:08,600 --> 00:13:10,640
Michael: But maybe they are running
the SQL script.

270
00:13:10,640 --> 00:13:14,840
Like the fix in this case is there's
SQL scripts.

271
00:13:15,280 --> 00:13:19,240
Nikolay: After a situation with
major upgrades, with 1 of our

272
00:13:19,240 --> 00:13:25,780
customers who was on some managed
service which was running on

273
00:13:26,440 --> 00:13:33,220
Zalando operator, where major upgrade
is fully automated, and

274
00:13:33,220 --> 00:13:39,800
they used it, After we saw corruption
related to JLibc version,

275
00:13:40,580 --> 00:13:46,880
and we know Zalando, it was implemented
there, automation, but

276
00:13:46,880 --> 00:13:48,460
it's not enabled by default.

277
00:13:48,900 --> 00:13:53,480
It means that you need to just
specify a special parameter to

278
00:13:53,480 --> 00:13:59,760
enable automation to mitigate GLIBC
version change when you perform

279
00:13:59,760 --> 00:14:01,420
upgrade of your Spillo.

280
00:14:03,480 --> 00:14:06,260
Spillo, this is part of Zalando
operator.

281
00:14:07,120 --> 00:14:13,400
It's like an image ready to run
Postgres and Patroni in clouds,

282
00:14:13,400 --> 00:14:15,060
in AWS, first of all.

283
00:14:15,060 --> 00:14:18,680
So if you don't specify a parameter,
you get corruption of indexes.

284
00:14:19,540 --> 00:14:24,440
And the fact that it was not enabled
by this managed Postgres

285
00:14:24,440 --> 00:14:28,480
provider means that even they didn't
read release notes because

286
00:14:28,480 --> 00:14:29,480
that was specified.

287
00:14:30,240 --> 00:14:31,760
I knew about that automation.

288
00:14:31,780 --> 00:14:34,840
I was curious why it didn't work
in this case.

289
00:14:35,540 --> 00:14:39,520
And I immediately saw in the release
notes that there is such

290
00:14:39,520 --> 00:14:41,620
parameter, and it's not enabled
by default.

291
00:14:42,980 --> 00:14:48,760
So even managed Postgres providers
sometimes skip reading release

292
00:14:48,760 --> 00:14:49,080
notes.

293
00:14:49,080 --> 00:14:51,600
It's a big problem, but it's a
regular problem.

294
00:14:51,600 --> 00:14:53,920
We buy things and we don't read
manuals, right?

295
00:14:53,920 --> 00:14:57,100
We just try to use them right away.

296
00:14:57,340 --> 00:14:58,180
Same things here.

297
00:14:58,180 --> 00:14:59,440
It's like psychology.

298
00:15:00,800 --> 00:15:07,500
So I don't know, of course, let's
be like these kind of guys,

299
00:15:07,500 --> 00:15:10,120
like, always read manual, right?

300
00:15:10,120 --> 00:15:11,700
Always read release notes.

301
00:15:13,380 --> 00:15:14,120
Why upgrade?

302
00:15:14,200 --> 00:15:19,240
Depeche.com is a very convenient
tool to read differences.

303
00:15:20,140 --> 00:15:23,400
Michael: Well, and we have this
gift that the release notes are

304
00:15:23,400 --> 00:15:23,900
good.

305
00:15:24,520 --> 00:15:27,980
They include full details of, but
like when I say full details,

306
00:15:27,980 --> 00:15:31,520
it's kind of an abridged version
of every issue that was fixed.

307
00:15:31,520 --> 00:15:33,360
It's like 1 or 2 paragraphs.

308
00:15:33,620 --> 00:15:35,920
It's very, very simple, very easy
to understand.

309
00:15:35,920 --> 00:15:39,240
Even if you don't use the feature,
it's not that long.

310
00:15:39,240 --> 00:15:39,720
It doesn't.

311
00:15:39,720 --> 00:15:40,580
Yeah, I did.

312
00:15:40,580 --> 00:15:43,020
I read the 16.3 ones quite quickly.

313
00:15:43,180 --> 00:15:46,480
Maybe it took me 5, 10 minutes
just to scan, like to look through

314
00:15:46,480 --> 00:15:49,320
them before the episode, just because
I wanted to see what had

315
00:15:49,320 --> 00:15:49,700
changed.

316
00:15:49,700 --> 00:15:51,800
In fact, there's a couple in there
that I think you'll like,

317
00:15:51,800 --> 00:15:52,620
which is nice.

318
00:15:53,300 --> 00:15:55,820
And yeah, it doesn't take ages.

319
00:15:56,120 --> 00:15:57,440
They're easy to find.

320
00:15:57,720 --> 00:16:03,000
And I think we've just been a little
bit, well, I get frustrated

321
00:16:03,080 --> 00:16:09,940
when I use iOS, I use an Apple
phone, and every single app update

322
00:16:09,960 --> 00:16:14,240
I get, if I look at the release
notes, it just says bug fixes

323
00:16:14,240 --> 00:16:15,560
and performance improvements.

324
00:16:15,940 --> 00:16:17,400
That's all I ever get.

325
00:16:17,400 --> 00:16:18,660
And it's so annoying.

326
00:16:19,540 --> 00:16:21,860
And we get so much more than that
in Postgres.

327
00:16:22,720 --> 00:16:26,260
Anyway, it feels like such a gift
that they're giving us.

328
00:16:26,280 --> 00:16:29,100
Nikolay: You know what I lack in
release notes?

329
00:16:29,100 --> 00:16:30,300
Links to commits.

330
00:16:31,180 --> 00:16:37,280
I always have many minutes spent
trying to link proper release

331
00:16:37,280 --> 00:16:37,780
notes.

332
00:16:37,960 --> 00:16:41,980
It's mostly related to major releases,
though, but minor ones

333
00:16:41,980 --> 00:16:42,900
as well sometimes.

334
00:16:43,200 --> 00:16:47,780
I try to find the exact commit,
commit fast entry and git commit,

335
00:16:48,820 --> 00:16:51,000
just to understand what has changed.

336
00:16:51,260 --> 00:16:54,520
And it would be great to have links
right in the release notes.

337
00:16:54,520 --> 00:16:58,660
Some software has it in the release
notes.

338
00:16:58,660 --> 00:17:02,000
They link commits and pull requests
and so on.

339
00:17:02,080 --> 00:17:06,380
So the bottom line about release
notes is you should read them

340
00:17:06,380 --> 00:17:10,140
even if you're on managed service,
managed Postgres.

341
00:17:10,320 --> 00:17:15,460
You should read them because who
knows what your provider has

342
00:17:15,460 --> 00:17:15,960
missed.

343
00:17:16,320 --> 00:17:21,680
Maybe they were good a couple of
years ago, but then some good

344
00:17:21,680 --> 00:17:23,500
guys were fired.

345
00:17:24,060 --> 00:17:25,540
It happens today.

346
00:17:26,120 --> 00:17:28,680
And this year, probably they are
not so good already.

347
00:17:29,040 --> 00:17:29,940
So it happens.

348
00:17:30,360 --> 00:17:36,040
And unfortunately, here I also
like, if managed service providers

349
00:17:36,040 --> 00:17:40,840
could provide more transparency,
what they tested, how they tested,

350
00:17:40,840 --> 00:17:41,700
blah, blah, blah.

351
00:17:41,880 --> 00:17:45,740
Like, sometimes, like, maybe I
would read that instead, instead

352
00:17:45,860 --> 00:17:49,700
of official release notes, like
results of testing, for example,

353
00:17:49,700 --> 00:17:50,820
published or something.

354
00:17:51,180 --> 00:17:51,900
It would be great.

355
00:17:51,900 --> 00:17:58,920
But I know it takes sometimes a
few months for them to incorporate

356
00:17:59,120 --> 00:17:59,570
release.

357
00:17:59,570 --> 00:18:00,520
ALEX ENDOVSKY

358
00:18:00,820 --> 00:18:01,920
Michael: I would love that.

359
00:18:01,920 --> 00:18:05,140
Imagine getting an email from your
provider saying your scheduled

360
00:18:05,160 --> 00:18:09,140
maintenance is, like, your database
is due to be upgraded at

361
00:18:09,140 --> 00:18:10,860
your scheduled time of this.

362
00:18:11,120 --> 00:18:12,540
Here's what we're doing.

363
00:18:12,700 --> 00:18:14,080
We've already done the release.

364
00:18:14,240 --> 00:18:15,740
Nikolay: Here are the results of
tests.

365
00:18:16,220 --> 00:18:16,560
Details.

366
00:18:16,560 --> 00:18:17,820
Michael: Yeah, that would be amazing.

367
00:18:17,840 --> 00:18:20,740
And then all you have to do is
read the email, and you trust

368
00:18:20,740 --> 00:18:22,060
them to get on with it.

369
00:18:22,080 --> 00:18:25,520
Nikolay: I would read a few times,
but then I know the detail.

370
00:18:25,520 --> 00:18:29,340
Like, I would be impressed, for
example, with the level of detail.

371
00:18:29,380 --> 00:18:31,820
And next time, probably, I already
would rely on that.

372
00:18:31,820 --> 00:18:33,540
But this process would be great.

373
00:18:35,560 --> 00:18:40,620
But not only do you need to read
and perform actions, some versions

374
00:18:40,680 --> 00:18:45,280
require you to do some actions,
like rebuild indexes, to mitigate

375
00:18:45,480 --> 00:18:47,060
possible corruption you have.

376
00:18:47,880 --> 00:18:49,260
Or something like security-wise.

377
00:18:50,380 --> 00:18:54,480
Not only this, it happens in minor
versions as well.

378
00:18:54,480 --> 00:18:57,160
But also you must test it.

379
00:18:57,540 --> 00:19:01,420
2 more things, you must test it
and you should not forget about

380
00:19:01,420 --> 00:19:01,920
extensions.

381
00:19:03,340 --> 00:19:05,580
Michael: Great, so how do you recommend
testing?

382
00:19:06,380 --> 00:19:07,700
Nikolay: Well, regular testing.

383
00:19:08,000 --> 00:19:11,880
Testing means you should do the
same types of actions.

384
00:19:11,940 --> 00:19:16,500
If you do this official recipe
with restart, replacing binaries,

385
00:19:16,500 --> 00:19:20,640
restart, you should do it on some
lower environment, a few weeks

386
00:19:20,640 --> 00:19:24,700
before production upgrade, maybe
a week, at least 1 week.

387
00:19:25,440 --> 00:19:29,220
Michael: In a lot of the non-production
systems I've seen, they're

388
00:19:29,220 --> 00:19:30,760
not under heavy load.

389
00:19:30,900 --> 00:19:33,700
So, and they wouldn't be like long
running queries and things

390
00:19:33,700 --> 00:19:34,160
that could

391
00:19:34,160 --> 00:19:35,420
Nikolay: actually screw you up.

392
00:19:35,740 --> 00:19:40,100
Honestly, I don't think we do need
load testing for minor releases.

393
00:19:40,760 --> 00:19:43,380
It's super expensive to conduct,
usually.

394
00:19:44,760 --> 00:19:50,380
You need separate environments,
separate machines, and a lot

395
00:19:50,380 --> 00:19:51,040
of actions.

396
00:19:53,560 --> 00:19:55,740
Ideally, I would skip it, maybe.

397
00:19:57,380 --> 00:19:58,580
Unless you're a provider.

398
00:19:58,580 --> 00:20:01,400
In this case, you have a lot of
databases, it's better to do

399
00:20:01,400 --> 00:20:01,820
it.

400
00:20:01,820 --> 00:20:05,860
But if I'm an organization with
just a single cluster or a few

401
00:20:05,860 --> 00:20:11,120
clusters, dozens of clusters, then
I just would think about testing

402
00:20:11,120 --> 00:20:15,560
it only if I see potential changes
can affect performance.

403
00:20:17,660 --> 00:20:21,340
Maybe if you have good workflow
of testing, very good automation,

404
00:20:21,560 --> 00:20:22,740
it's good to have.

405
00:20:23,300 --> 00:20:24,500
But what to test?

406
00:20:24,660 --> 00:20:28,500
Simple synthetic benchmarks, it's
already tested in these load

407
00:20:28,500 --> 00:20:30,480
farms, performance farms, right?

408
00:20:30,480 --> 00:20:34,680
It's already happening on various
types of operational systems

409
00:20:34,680 --> 00:20:35,540
and so on.

410
00:20:35,580 --> 00:20:37,080
So maybe it's not interesting.

411
00:20:37,080 --> 00:20:41,080
You need to do it like a lot of
testing is a complex topic, but

412
00:20:41,600 --> 00:20:48,600
at least to try to check that packaging
works as expected.

413
00:20:48,600 --> 00:20:53,160
For example, preparing to this
episode, I just quickly asked

414
00:20:54,060 --> 00:21:00,560
our bot to extract experience from
discussed in mailing lists

415
00:21:01,780 --> 00:21:03,420
issues with minor upgrades.

416
00:21:03,980 --> 00:21:10,220
And they both found some case for
Postgres 9.6.1 upgrading to 9.6.2

417
00:21:11,020 --> 00:21:12,840
when Postgres could not start.

418
00:21:13,520 --> 00:21:18,840
It was related to some problems
with some like corrupted pg_hba.conf.

419
00:21:19,960 --> 00:21:23,820
Packaging put some placeholders
there.

420
00:21:24,440 --> 00:21:25,460
Michael: Wait, which versions?

421
00:21:26,660 --> 00:21:28,200
Nikolay: 9.6.1 to 9.6.2.

422
00:21:29,280 --> 00:21:31,200
That's 6 years ago.

423
00:21:31,800 --> 00:21:33,620
So not 6 even.

424
00:21:34,280 --> 00:21:37,500
Michael: Probably 7 or 8, but it's
still not that long ago.

425
00:21:37,500 --> 00:21:40,320
Nikolay: Yeah, it does matter,
but packaging is another layer

426
00:21:40,360 --> 00:21:41,900
that can introduce some problems.

427
00:21:42,640 --> 00:21:46,320
And you upgrade and something wrong
happens.

428
00:21:47,420 --> 00:21:51,100
Postgres itself was well tested,
but apt-package, for example,

429
00:21:51,100 --> 00:21:52,320
was not well tested.

430
00:21:52,920 --> 00:21:57,460
And if you don't test it on your
environment and don't encounter

431
00:21:57,660 --> 00:22:01,120
with problems of upgrading, it's
bad.

432
00:22:01,400 --> 00:22:02,680
So you should do it.

433
00:22:02,980 --> 00:22:07,480
And then you should run Postgres
for some time just to see that

434
00:22:07,500 --> 00:22:12,840
it's working and all your code
is working and so on.

435
00:22:12,900 --> 00:22:15,520
Michael: Well, and I think I probably
should have asked you about

436
00:22:15,520 --> 00:22:19,320
extensions first because it feels
like testing is also about

437
00:22:19,320 --> 00:22:25,120
testing your extensions and the
way extensions work, the core

438
00:22:25,120 --> 00:22:26,540
aren't testing them, right?

439
00:22:27,100 --> 00:22:31,440
And your combination of extensions
might be, even if you're mostly

440
00:22:31,440 --> 00:22:35,280
using common extensions, you might
not even have any private

441
00:22:35,280 --> 00:22:38,160
ones, or you might be using relatively
few, your combination

442
00:22:38,420 --> 00:22:40,120
might not have been tested by anybody
yet.

443
00:22:40,120 --> 00:22:42,680
The fact that you've got these
4 or 5 extensions.

444
00:22:43,340 --> 00:22:44,380
So I do think...

445
00:22:44,380 --> 00:22:46,980
Nikolay: Everyone is talking about
these problems with combination,

446
00:22:47,180 --> 00:22:50,140
but I personally didn't see problems
with combination.

447
00:22:50,660 --> 00:22:55,340
I know this is a quite popular
topic, but maybe I'm using too

448
00:22:55,340 --> 00:23:00,240
few extensions usually, or seeing
them being used less than 10

449
00:23:00,240 --> 00:23:01,760
or less than 20, for example.

450
00:23:01,760 --> 00:23:02,780
I don't know.

451
00:23:02,800 --> 00:23:09,520
What I know is, first of all, a
minor version of extension is

452
00:23:09,520 --> 00:23:13,440
not changing unless you do it explicitly
with alter extension,

453
00:23:13,440 --> 00:23:13,940
right?

454
00:23:14,480 --> 00:23:17,440
Other session update to version
or something like that, right?

455
00:23:17,440 --> 00:23:21,440
Or just update it to the latest
available version.

456
00:23:21,600 --> 00:23:25,640
And I usually, what I do usually,
I check in our checkup tool,

457
00:23:25,640 --> 00:23:28,880
checks PgStat available extensions.

458
00:23:29,700 --> 00:23:33,080
It has currently installed version
and available version.

459
00:23:33,080 --> 00:23:36,540
And if we see mismatch, it means
that update didn't happen.

460
00:23:36,580 --> 00:23:38,100
And it happens all the time.

461
00:23:38,100 --> 00:23:39,520
People don't upgrade them.

462
00:23:39,520 --> 00:23:41,040
They skip it all the time.

463
00:23:41,040 --> 00:23:43,120
And packages don't upgrade them.

464
00:23:43,520 --> 00:23:44,760
But maybe it's good.

465
00:23:44,760 --> 00:23:45,960
I don't know, actually.

466
00:23:47,060 --> 00:23:47,540
The problem

467
00:23:47,540 --> 00:23:51,540
Michael: is- Would you then schedule
them at the same time as

468
00:23:51,540 --> 00:23:53,080
minor updates for Postgres?

469
00:23:53,860 --> 00:23:54,640
Nikolay: Good question.

470
00:23:54,760 --> 00:23:55,780
Maybe I would...

471
00:23:56,640 --> 00:23:58,780
Yeah, we know this dilemma, right?

472
00:23:58,780 --> 00:23:59,280
Like

473
00:24:00,260 --> 00:24:01,720
Michael: If we

474
00:24:01,720 --> 00:24:04,400
Nikolay: change 1 thing at a time,
overhead is huge.

475
00:24:04,400 --> 00:24:07,580
It's like doing things in separate
transactions, transaction

476
00:24:07,640 --> 00:24:08,080
overhead.

477
00:24:08,080 --> 00:24:10,440
So you need to plan it, coordinate
it.

478
00:24:10,680 --> 00:24:13,340
If you have some bureaucracy, and
you should have some bureaucracy,

479
00:24:13,360 --> 00:24:18,760
like approvals, And then actual
planning and approvals, description

480
00:24:18,840 --> 00:24:22,400
of what to do, how to roll back,
and so on.

481
00:24:23,100 --> 00:24:25,760
And how to downgrade.

482
00:24:25,760 --> 00:24:28,640
We will touch this very soon, I
promise.

483
00:24:29,320 --> 00:24:34,120
But If you combine everything in
1 shot, something goes wrong

484
00:24:34,120 --> 00:24:37,000
and you don't know what it is.

485
00:24:37,540 --> 00:24:39,180
I don't know, maybe I would plan...

486
00:24:39,720 --> 00:24:41,660
Usually, we plan it separately.

487
00:24:43,080 --> 00:24:45,940
But maybe it's not perfect, Honestly.

488
00:24:46,500 --> 00:24:50,100
Extensions usually lack love, I
would say.

489
00:24:50,740 --> 00:24:52,760
DBAs don't go there usually.

490
00:24:52,800 --> 00:24:56,040
Not usually, but DBAs like, oh,
we upgraded this, upgraded that,

491
00:24:56,040 --> 00:24:56,940
good, done.

492
00:24:57,120 --> 00:25:00,800
Extensions, oh, it's like back-end
engineers need them, right?

493
00:25:01,360 --> 00:25:05,440
So unless it's like DBA kind of
extension like Page Inspector,

494
00:25:05,600 --> 00:25:10,320
pg_buffercache or something,
which usually like, do we

495
00:25:10,320 --> 00:25:11,320
need to upgrade them?

496
00:25:11,320 --> 00:25:13,700
Even pg_stat_statements is lagging
sometimes.

497
00:25:14,440 --> 00:25:18,840
And I tell you the story, I remember,
pg_stat_kcache was installed

498
00:25:18,840 --> 00:25:23,960
on a very heavily loaded system
and it got upgraded silently.

499
00:25:25,080 --> 00:25:29,940
It was RPM, it was CentOS I think,
and RPM was upgraded automatically

500
00:25:30,300 --> 00:25:37,700
with various stuff, even non-DBA,
but some SREs, upgraded operational

501
00:25:37,700 --> 00:25:39,020
system packages.

502
00:25:39,960 --> 00:25:45,040
And we had exclusions for all Postgres
packages, but not PgStack.k

503
00:25:45,540 --> 00:25:48,160
cache.
It was not in the block list.

504
00:25:48,580 --> 00:25:49,800
And it got upgraded.

505
00:25:50,740 --> 00:25:55,880
And then every server started crashing
with segfault and so on.

506
00:25:55,960 --> 00:25:59,400
It was a bug introduced in PgStatK
cache.

507
00:25:59,680 --> 00:26:00,680
Not well tested.

508
00:26:01,220 --> 00:26:05,820
Not noticed in lower environments
because nobody tested properly

509
00:26:06,100 --> 00:26:08,500
upgrades of operational system
components there.

510
00:26:08,500 --> 00:26:12,940
It's like testing should be done
properly, but it requires a

511
00:26:12,940 --> 00:26:14,680
lot of effort, coordination, and
so on.

512
00:26:14,680 --> 00:26:15,180
Right?

513
00:26:15,360 --> 00:26:20,140
So minor extensions might introduce
problems, even if you don't

514
00:26:20,500 --> 00:26:24,020
explicitly use them, but they are
loaded in SharePilot libraries,

515
00:26:24,060 --> 00:26:25,820
it still can be dangerous.

516
00:26:27,720 --> 00:26:30,600
It can crash your server if there
is some bug.

517
00:26:31,500 --> 00:26:35,400
I don't know, maybe it should be
in 1 shot, minor version plus

518
00:26:35,400 --> 00:26:42,440
all related extensions, but contrib
modules, they have same cycle,

519
00:26:42,440 --> 00:26:46,740
but third-party extensions like
PgStack and KCache, they have

520
00:26:47,040 --> 00:26:48,060
their own cycle.

521
00:26:48,680 --> 00:26:49,200
And
Michael: you need

522
00:26:49,200 --> 00:26:52,440
Nikolay: to follow all of them
and check release notes of all

523
00:26:52,440 --> 00:26:53,100
of them.

524
00:26:54,280 --> 00:26:57,180
Honestly, I like the extensibility.

525
00:26:57,660 --> 00:27:02,880
But I also like when everything
is monolithic and comes well-tested.

526
00:27:04,660 --> 00:27:06,500
Michael: We can't have it both
ways, though, can we?

527
00:27:06,500 --> 00:27:10,240
Especially if we see the progress
pgvector's been making so

528
00:27:10,240 --> 00:27:14,640
rapidly because they can just release
multiple new features per

529
00:27:14,640 --> 00:27:15,140
year.

530
00:27:15,480 --> 00:27:17,580
It's hard to argue against that.

531
00:27:17,620 --> 00:27:20,820
Nikolay: But cloud providers lag
a lot with upgrading.

532
00:27:20,980 --> 00:27:23,740
It's a super hot topic, but look
at AWS.

533
00:27:23,860 --> 00:27:29,760
They just upgraded the pgvector
to 0.7.0 only a few days ago.

534
00:27:30,240 --> 00:27:31,760
But it was released...

535
00:27:31,920 --> 00:27:32,780
I don't remember.

536
00:27:32,780 --> 00:27:35,940
I saw some huge lags, months again.

537
00:27:36,160 --> 00:27:37,540
Everyone needs it right now.

538
00:27:37,540 --> 00:27:39,280
Things are moving so fast.

539
00:27:39,280 --> 00:27:43,040
But I guess it's a lot of testing
and maybe adjustments and so

540
00:27:43,040 --> 00:27:43,540
on.

541
00:27:43,620 --> 00:27:47,460
And if you check Cloud SQL, They
lag even more, I think.

542
00:27:49,120 --> 00:27:51,300
Michael: I think you might be thinking
of the wrong provider,

543
00:27:51,500 --> 00:27:55,740
because I've always thought AWS
has been really hot on pgvector

544
00:27:55,740 --> 00:27:56,680
updates, especially.

545
00:27:57,440 --> 00:28:00,160
And also, I was going to give them
a shout out, because as a

546
00:28:00,160 --> 00:28:04,540
big company, I'm a bit surprised
they're so able to ship minor

547
00:28:04,540 --> 00:28:04,840
version.

548
00:28:04,840 --> 00:28:08,360
I think they ship the these latest
minor version releases within

549
00:28:08,360 --> 00:28:10,540
a day of the announcement.

550
00:28:12,180 --> 00:28:14,270
Nikolay: 0.7.0 with small like?

551
00:28:14,270 --> 00:28:17,640
Michael: Not pgVector, sorry, I
meant the Postgres QR.

552
00:28:17,640 --> 00:28:18,560
Nikolay: Oh, this manner.

553
00:28:18,920 --> 00:28:19,700
Michael: Like 16.3.

554
00:28:20,140 --> 00:28:20,640
Yeah.

555
00:28:20,900 --> 00:28:23,940
And well, there were only 2 providers
I saw that shipped it,

556
00:28:24,280 --> 00:28:29,180
and I haven't seen any since so
quickly, and that was AWS and

557
00:28:29,820 --> 00:28:30,660
Crunch Data.

558
00:28:31,020 --> 00:28:31,860
Nikolay: That's cool.

559
00:28:32,220 --> 00:28:35,280
Michael: Yeah, got the minor releases
out really quickly, which

560
00:28:35,280 --> 00:28:37,320
sadly, as you say, isn't that common.

561
00:28:37,660 --> 00:28:38,940
And I've been growing

562
00:28:40,360 --> 00:28:41,680
Nikolay: more and more aware of

563
00:28:41,680 --> 00:28:43,240
Michael: how much people lag.

564
00:28:43,340 --> 00:28:45,200
Yeah, Cloud SQL are lagging, unfortunately.

565
00:28:46,800 --> 00:28:49,900
Nikolay: It's last time I checked
and I checked a few times.

566
00:28:50,380 --> 00:28:51,880
They don't lag a lot.

567
00:28:52,540 --> 00:28:56,020
But what you see in documentation,
it's just a problem of documentation.

568
00:28:57,540 --> 00:28:58,220
Michael: Got it.

569
00:28:58,780 --> 00:29:02,280
Nikolay: So if you provision machine,
you see it's quite up to

570
00:29:02,280 --> 00:29:02,780
date.

571
00:29:03,180 --> 00:29:06,040
Michael: Yeah, I didn't provision
new ones on Cloud SQL, I did

572
00:29:06,040 --> 00:29:07,140
just check their docs.

573
00:29:07,660 --> 00:29:11,740
Nikolay: Yeah, there are some SLAs
in terms of version lag, and

574
00:29:11,740 --> 00:29:14,680
they usually define and try to
follow.

575
00:29:14,680 --> 00:29:18,280
Michael: I don't think Cloud SQL
even support version 16, though,

576
00:29:18,280 --> 00:29:19,820
in PostgreSQL, do they?

577
00:29:20,840 --> 00:29:21,760
Nikolay: Yeah, good point.

578
00:29:21,760 --> 00:29:22,860
Probably not yet.

579
00:29:22,960 --> 00:29:25,360
But major version is a different
story.

580
00:29:25,360 --> 00:29:26,520
It can lag like half a year.

581
00:29:26,520 --> 00:29:27,360
Michael: I know, I know.

582
00:29:27,360 --> 00:29:27,700
Nikolay: Easily.

583
00:29:27,700 --> 00:29:30,040
But it's already more than half
a year, right?

584
00:29:30,320 --> 00:29:31,160
Michael: Quick question.

585
00:29:31,920 --> 00:29:33,340
How do you feel about...

586
00:29:34,440 --> 00:29:37,840
I had never thought about it this
way, but I think if I was picking

587
00:29:37,840 --> 00:29:40,900
a new provider today, this is 1
of the things I would look at,

588
00:29:40,900 --> 00:29:43,820
how much do they lag on minor version
updates.

589
00:29:44,720 --> 00:29:48,700
Nikolay: Because it feels dangerous
If I need some bug to be

590
00:29:48,700 --> 00:29:55,740
fixed soon and they tell me it
will take even not months, weeks,

591
00:29:56,180 --> 00:29:57,080
it's frustrating.

592
00:29:59,340 --> 00:30:01,520
Michael: So did you want to talk
about downgrades?

593
00:30:03,600 --> 00:30:10,760
Nikolay: Downgrades is the topic
which you must have if you're

594
00:30:10,760 --> 00:30:14,560
a serious organization, but in
Postgres it's not enterprise-ready

595
00:30:14,820 --> 00:30:15,460
at all.

596
00:30:15,820 --> 00:30:20,740
I mean, if you follow this approach
and official packages, the

597
00:30:20,740 --> 00:30:24,160
official approach again is to replace
binary and restart.

598
00:30:25,140 --> 00:30:35,340
But both apt and rpm packages support
only the latest minor version.

599
00:30:35,740 --> 00:30:38,540
So how am I supposed to downgrade?

600
00:30:39,020 --> 00:30:41,020
The answer is, it's not supported.

601
00:30:42,380 --> 00:30:44,440
Michael: Or do you have to like
store them somewhere?

602
00:30:44,440 --> 00:30:46,060
Like do you have to store them
on your side?

603
00:30:46,060 --> 00:30:48,960
Nikolay: You can download packages
probably and deal with all

604
00:30:48,960 --> 00:30:49,460
dependencies.

605
00:30:50,280 --> 00:30:55,040
I always end up screwing myself
up completely and starting from

606
00:30:55,040 --> 00:30:55,540
scratch.

607
00:30:55,840 --> 00:31:00,640
And this is exactly where I like
having ability to provision

608
00:31:00,660 --> 00:31:01,360
new VM.

609
00:31:01,560 --> 00:31:04,400
If I can provision a new VM, we
have a different approach.

610
00:31:04,640 --> 00:31:08,480
I upgrade a replica, I test it,
probably switch over, probably

611
00:31:08,480 --> 00:31:11,300
keep 1 replica on the previous
version.

612
00:31:12,260 --> 00:31:13,760
Michael: Just in case you need
to go back.

613
00:31:13,940 --> 00:31:17,720
Nikolay: Yeah, and probably I can
do cloning and so on.

614
00:31:17,720 --> 00:31:20,880
In managed services, I guess it's
easy to downgrade, right?

615
00:31:20,900 --> 00:31:23,340
You can just choose the version
which you want.

616
00:31:23,560 --> 00:31:24,220
Or no.

617
00:31:24,220 --> 00:31:26,600
I'm not a big user of RDS and Cloud
SQL.

618
00:31:26,600 --> 00:31:27,540
I usually have...

619
00:31:28,680 --> 00:31:32,820
Like, minor upgrades, I never was
involved, honestly.

620
00:31:34,360 --> 00:31:34,800
Because...

621
00:31:34,800 --> 00:31:38,760
Michael: I think it's important
to be biological, but not through...

622
00:31:39,140 --> 00:31:40,580
Nikolay: No, no, no.

623
00:31:40,840 --> 00:31:42,740
It should be possible to downgrade,
no?

624
00:31:42,740 --> 00:31:46,400
If we don't support downgrades,
Maybe it's not a problem, maybe

625
00:31:46,400 --> 00:31:47,500
nobody needs it.

626
00:31:47,500 --> 00:31:50,100
Postgres never requires you to
downgrade.

627
00:31:50,200 --> 00:31:51,960
It's so well tested, right?

628
00:31:52,680 --> 00:31:53,040
I've

629
00:31:53,040 --> 00:31:55,440
Michael: never seen anyone downgrade
in my profession, though.

630
00:31:55,440 --> 00:31:56,540
It's a good point.

631
00:31:57,180 --> 00:31:58,440
Nikolay: Yeah, I don't know.

632
00:31:58,440 --> 00:32:00,060
Let's provision some notes and
check.

633
00:32:00,060 --> 00:32:02,820
I will have an answer in follow-up
comment probably.

634
00:32:03,460 --> 00:32:08,980
So, yeah, but in my vision, downgrades
are needed.

635
00:32:09,020 --> 00:32:12,040
Otherwise, you don't know what
to put in your plan.

636
00:32:12,040 --> 00:32:16,040
In plan, usually, management requires
what if things go south?

637
00:32:16,360 --> 00:32:18,480
And you should have the point.

638
00:32:18,480 --> 00:32:21,380
This is our rollback plan or downgrade
or something.

639
00:32:21,780 --> 00:32:22,920
Reverse plan, right?

640
00:32:23,200 --> 00:32:28,780
And if downgrades are not supported,
but again, if you have multiple

641
00:32:28,780 --> 00:32:32,500
machines, you can play with it
and include into the plan the

642
00:32:32,500 --> 00:32:35,720
idea that you can move back to
the previous version just performing

643
00:32:36,680 --> 00:32:38,800
switchover backwards, right?

644
00:32:39,720 --> 00:32:42,660
Michael: Yeah, there is a nice
kind of quote from the docs that

645
00:32:42,660 --> 00:32:44,280
I pulled out before the episode.

646
00:32:44,540 --> 00:32:48,740
It says, I think in the versioning
policy, the community considers

647
00:32:48,760 --> 00:32:51,720
performing minor upgrades to be
less risky than continuing to

648
00:32:51,720 --> 00:32:53,300
run an old minor version.

649
00:32:53,480 --> 00:32:56,620
We recommend that users always
run the current minor release

650
00:32:56,760 --> 00:32:58,660
associated with their major version.

651
00:32:59,440 --> 00:32:59,940
Nikolay: Right.

652
00:33:00,060 --> 00:33:03,280
Michael: So, you know, it might
even answer the question of whether

653
00:33:03,280 --> 00:33:04,400
they consider downgrading.

654
00:33:05,460 --> 00:33:06,960
It's a very interesting point.

655
00:33:07,480 --> 00:33:09,720
Nikolay: Yeah, well, this is a
good point.

656
00:33:09,720 --> 00:33:10,640
We didn't discuss it.

657
00:33:10,640 --> 00:33:12,940
But in general, you should upgrade
quite fast.

658
00:33:15,480 --> 00:33:18,840
The lag of versions should be very
small, otherwise you skip

659
00:33:18,840 --> 00:33:21,000
a lot of optimizations and bug
fixes.

660
00:33:21,000 --> 00:33:24,060
Sometimes good optimizations happen
in minor versions as

661
00:33:24,060 --> 00:33:24,560
Michael: well.

662
00:33:25,600 --> 00:33:26,100
Really?

663
00:33:26,580 --> 00:33:27,080
Nikolay: Yes.

664
00:33:27,900 --> 00:33:30,640
Michael: I don't remember performance
optimizations.

665
00:33:31,000 --> 00:33:34,880
Nikolay: Yeah, well, if something
was completely wrong, it's

666
00:33:34,940 --> 00:33:36,780
considered a kind of bug, right?

667
00:33:36,780 --> 00:33:39,260
In this case, it can be faster.

668
00:33:39,960 --> 00:33:44,200
Sometimes things work not as expected,
like it was planned to

669
00:33:44,200 --> 00:33:47,120
be fast, but it was not planned
fast because of some problem

670
00:33:47,120 --> 00:33:47,780
in code.

671
00:33:48,080 --> 00:33:50,700
This problem is fixed, it becomes
fast as expected.

672
00:33:51,180 --> 00:33:51,880
This happens.

673
00:33:52,400 --> 00:33:53,160
Michael: Oh, interesting.

674
00:33:53,300 --> 00:33:53,980
Fair enough.

675
00:33:54,960 --> 00:33:55,460
Nikolay: Cool.

676
00:33:55,760 --> 00:33:56,680
This is it.

677
00:33:56,680 --> 00:33:57,320
All right.

678
00:33:57,640 --> 00:33:59,320
What else about minor upgrades?

679
00:33:59,540 --> 00:34:03,060
Michael: I've only got one more thing,
which is kind of a fun one.

680
00:34:03,480 --> 00:34:07,480
I found reading the release notes
like quite fun, like it was

681
00:34:07,480 --> 00:34:10,580
quite it was one in particular that
made me smile, made me think

682
00:34:10,580 --> 00:34:12,740
of you actually, I thought you'd
quite like it.

683
00:34:12,860 --> 00:34:16,060
And that was a fix by David Rowley.

684
00:34:16,060 --> 00:34:19,420
Let me not sure how to pronounce
Rowley Rowley.

685
00:34:21,000 --> 00:34:21,980
Nikolay: You ask me?

686
00:34:22,740 --> 00:34:23,720
Michael: Yeah, yeah.

687
00:34:24,880 --> 00:34:30,780
One of the fixes he did in 16.3,
let me find it quickly, was, I

688
00:34:30,780 --> 00:34:35,880
think I remember it actually, it
was a partitioning bug around

689
00:34:36,140 --> 00:34:36,640
nulls.

690
00:34:37,060 --> 00:34:42,180
So remember how much we talked
about feeling like nulls catch

691
00:34:42,180 --> 00:34:43,020
us out still?

692
00:34:43,100 --> 00:34:46,460
Well, it catches Postgres developers
out, too.

693
00:34:46,920 --> 00:34:52,740
If you partitioned on a boolean
column, so like true, false or

694
00:34:52,740 --> 00:34:59,180
null, and you have three partitions,
one that goes in when it's true,

695
00:34:59,180 --> 00:35:02,060
one when it's false and one when it's
null.

696
00:35:02,200 --> 00:35:07,440
And then you've queried where X
is not false.

697
00:35:09,400 --> 00:35:13,120
It would the partition pruning
would the planner would prune

698
00:35:13,120 --> 00:35:16,160
out the null partition, even though
it should be in there.

699
00:35:16,280 --> 00:35:18,680
It's not false, but it was pruned
out so you'd get incorrect

700
00:35:18,680 --> 00:35:19,840
results.
Nikolay: Because it's unknown.

701
00:35:20,280 --> 00:35:25,460
Yeah, it's a three-value logic,
my favorite topic in SQL.

702
00:35:25,640 --> 00:35:26,140
Michael: Yeah.

703
00:35:26,600 --> 00:35:29,640
So I thought it would make you
feel less bad because it catches

704
00:35:29,640 --> 00:35:31,320
even Postgres development itself.

705
00:35:31,320 --> 00:35:36,180
Nikolay: Three-value logic should
be a central topic in all educational

706
00:35:36,180 --> 00:35:38,960
courses when people study SQL.

707
00:35:39,140 --> 00:35:43,980
My daughter, she studied SQL multiple
times at high school, currently

708
00:35:43,980 --> 00:35:49,380
at university, And right now she
has yet another SQL course and

709
00:35:49,380 --> 00:35:51,960
they don't cover 3-layer logic
at UCSD.

710
00:35:52,040 --> 00:35:52,940
This is ridiculous.

711
00:35:53,480 --> 00:35:54,500
Absolutely ridiculous.

712
00:35:54,560 --> 00:35:56,980
Because this is the source of so
many troubles.

713
00:35:57,260 --> 00:35:59,060
Yet another 1, my favorite, right?

714
00:35:59,060 --> 00:36:01,580
We discussed it also, we had an
episode about nulls.

715
00:36:02,080 --> 00:36:03,760
Michael: Yeah, I'll link that up
as well.

716
00:36:03,760 --> 00:36:06,520
Nikolay: If you touch null, you
should expect unknown.

717
00:36:07,200 --> 00:36:09,640
Unknown is not true at all, right?

718
00:36:10,260 --> 00:36:11,820
And not false as well.

719
00:36:13,080 --> 00:36:13,580
Right.

720
00:36:14,680 --> 00:36:17,180
Michael: Last thing, We have episode
100 coming up.

721
00:36:17,360 --> 00:36:21,360
I put a post on Reddit asking for
ideas on what we should do.

722
00:36:21,360 --> 00:36:21,860
Nikolay: Yeah.

723
00:36:22,420 --> 00:36:25,240
Michael: I will link it up in case
anybody wants to send us any

724
00:36:25,240 --> 00:36:25,680
more ideas.

725
00:36:25,680 --> 00:36:27,840
Anything you wanted to ask people,
Nikolai?

726
00:36:28,380 --> 00:36:32,120
Nikolay: Yeah, if someone is running
Postgres 100 plus terabytes

727
00:36:32,440 --> 00:36:37,400
under good load, at least dozens
of transactions per second.

728
00:36:38,640 --> 00:36:41,380
Reach out to us, let's discuss
the complexities.

729
00:36:41,760 --> 00:36:43,880
Maybe we will have you as a guest.

730
00:36:46,120 --> 00:36:48,840
Yeah, by the way, I just checked
the Cloud SQL in RDS.

731
00:36:48,840 --> 00:36:51,940
I didn't find, maybe it's there,
but I didn't find how to choose

732
00:36:51,940 --> 00:36:53,680
minor version in Cloud SQL.

733
00:36:53,680 --> 00:36:59,060
But in RDS, it's easy and they
already have all these minor versions

734
00:36:59,060 --> 00:37:00,300
released last Thursday.

735
00:37:00,300 --> 00:37:00,800
So

736
00:37:01,160 --> 00:37:03,300
Michael: yeah, that's the day.

737
00:37:03,960 --> 00:37:06,400
Nikolay: As you said, yeah, 16.3,
15.7.

738
00:37:06,900 --> 00:37:09,880
And but I can choose 16.1, for
example.

739
00:37:10,080 --> 00:37:12,140
Interesting that I cannot choose
14.3.

740
00:37:14,760 --> 00:37:15,260
Michael: Good.

741
00:37:15,660 --> 00:37:17,620
Oh, there's like a weird quirk.

742
00:37:17,620 --> 00:37:22,400
I think AWS fixed the issue and
called it 14.3.

743
00:37:23,320 --> 00:37:25,100
But that's like an old...

744
00:37:25,120 --> 00:37:26,980
I remember something weird around
that.

745
00:37:26,980 --> 00:37:28,680
But yeah, I'm not sure if that's
relevant.

746
00:37:28,680 --> 00:37:31,660
Nikolay: They also had have revisions,
I guess, because it's

747
00:37:31,780 --> 00:37:35,860
hyphen r1, r2 means like, I think,
revision 1, revision 2 for

748
00:37:35,860 --> 00:37:36,980
the same minor version.

749
00:37:37,580 --> 00:37:39,900
So internal versioning additionally.

750
00:37:41,120 --> 00:37:41,620
Interesting.

751
00:37:41,940 --> 00:37:43,320
Because it's not Postgres, right?

752
00:37:43,320 --> 00:37:44,340
It's modified Postgres.

753
00:37:44,340 --> 00:37:45,380
Michael: Yeah, yeah, true.

754
00:37:45,480 --> 00:37:47,260
Things that they've added or changed.

755
00:37:47,360 --> 00:37:47,780
Nikolay: Yeah.

756
00:37:47,780 --> 00:37:48,580
Right, right.

757
00:37:48,600 --> 00:37:49,280
Well, good.

758
00:37:49,280 --> 00:37:51,840
So in general, it means that downgrades
are possible.

759
00:37:51,900 --> 00:37:56,140
And if managed postgres are like
this, like RDS, you can have...

760
00:37:56,580 --> 00:37:59,340
You already upgraded the whole
cluster, but then you understand

761
00:37:59,640 --> 00:38:05,215
if anything goes wrong, you can
provision the replica on lower

762
00:38:05,215 --> 00:38:05,560
version.
On a

763
00:38:05,560 --> 00:38:06,320
Michael: different version.

764
00:38:06,600 --> 00:38:09,520
Nikolay: Previous version, for
example, and switch over to it.

765
00:38:10,760 --> 00:38:15,540
But if you cannot do it, you need
to keep all the old nodes,

766
00:38:15,920 --> 00:38:19,540
un-upgraded nodes, and then be
ready to go back to them.

767
00:38:19,540 --> 00:38:23,260
Also possible, but maybe requires
more resources to be spent.

768
00:38:23,320 --> 00:38:25,140
You need to run them for some time.

769
00:38:26,820 --> 00:38:29,020
So easy topic, right?

770
00:38:29,020 --> 00:38:29,520
Simple.

771
00:38:29,860 --> 00:38:33,140
Michael: Well, I'm glad we covered
it in a bit more detail.

772
00:38:33,160 --> 00:38:38,620
Nikolay: Maybe this packaging minor
version problem, I know people

773
00:38:38,620 --> 00:38:39,440
tried to solve it.

774
00:38:39,440 --> 00:38:41,820
Someone from Percona, I remember,
tried to solve it.

775
00:38:41,820 --> 00:38:48,540
So yeah, at least for containers,
Images should be possible to

776
00:38:48,540 --> 00:38:49,040
specify.

777
00:38:49,400 --> 00:38:52,900
Is it possible for so-called official
images to specify minor

778
00:38:52,900 --> 00:38:53,240
version?

779
00:38:53,240 --> 00:38:54,520
I'm going to check it.

780
00:38:54,520 --> 00:38:55,360
I'm curious.

781
00:38:55,480 --> 00:38:57,880
Can we specify 16.3 right now?

782
00:38:58,260 --> 00:39:01,400
Docker run Postgres colon 16.3.

783
00:39:02,640 --> 00:39:05,120
Docker run Postgres 16.2.

784
00:39:05,740 --> 00:39:07,320
Let's try 16.2 first.

785
00:39:08,380 --> 00:39:09,060
Let's see.

786
00:39:09,060 --> 00:39:10,580
Unable to find image.

787
00:39:10,800 --> 00:39:12,540
Yeah, it's pulling, so it's possible.

788
00:39:12,940 --> 00:39:13,580
This is good.

789
00:39:13,580 --> 00:39:18,240
We have different minor versions
there, so we can test at least

790
00:39:18,240 --> 00:39:20,140
something on containers.

791
00:39:20,600 --> 00:39:21,580
What about 16.3?

792
00:39:21,580 --> 00:39:22,320
Michael: And downgrade.

793
00:39:23,360 --> 00:39:24,180
Nikolay: Yeah, yeah, yeah.

794
00:39:24,280 --> 00:39:25,840
16.3 also there.

795
00:39:25,840 --> 00:39:30,140
16.4 is not available because it
doesn't exist yet.

796
00:39:30,940 --> 00:39:31,600
So good.

797
00:39:31,940 --> 00:39:33,140
Containers are good.

798
00:39:33,520 --> 00:39:34,520
I already forgot.

799
00:39:34,620 --> 00:39:36,180
I've used it multiple times.

800
00:39:36,180 --> 00:39:37,000
I just forgot.

801
00:39:37,480 --> 00:39:38,160
Michael: Oh, cool.

802
00:39:38,800 --> 00:39:39,300
Nikolay: Good.

803
00:39:39,680 --> 00:39:40,180
Michael: Okay.

804
00:39:40,520 --> 00:39:41,700
Thanks so much, Nikolai.

805
00:39:42,700 --> 00:39:43,500
Nikolay: Thank you, Michael.

806
00:39:43,500 --> 00:39:43,811
See you next time.

807
00:39:43,811 --> 00:39:44,849
Catch you
Michael: next week.

808
00:39:45,081 --> 00:39:45,581
Nikolay: Bye.