1
00:00:00,060 --> 00:00:03,720
Nikolay: Hello, hello, this is
PostgresFM, episode number 94.

2
00:00:04,760 --> 00:00:07,980
And we are going to repeat episode
number 4, but not really.

3
00:00:08,600 --> 00:00:12,100
So today we're going to talk about
buffers again.

4
00:00:12,100 --> 00:00:13,620
So let's call it buffers 2.

5
00:00:13,620 --> 00:00:15,560
But I forgot to say hi, Michael.

6
00:00:18,100 --> 00:00:20,720
I was thinking about buffers already
too much.

7
00:00:21,180 --> 00:00:24,300
This topic is super, super important,
I think.

8
00:00:24,720 --> 00:00:24,960
Michael: Yeah.

9
00:00:24,960 --> 00:00:28,240
So last time we called the episode
buffers by default, because

10
00:00:28,580 --> 00:00:32,580
you especially, and I'm a convert
to this, think that buffers,

11
00:00:32,900 --> 00:00:36,480
which is an EXPLAIN parameter should
be on by default, at least

12
00:00:36,480 --> 00:00:37,700
with EXPLAIN ANALYZE.

13
00:00:38,000 --> 00:00:41,260
And there have been a couple of
patches over the years, trying

14
00:00:41,260 --> 00:00:44,440
to put this in place or trying
to make steps towards this as

15
00:00:44,440 --> 00:00:45,040
a goal.

16
00:00:45,060 --> 00:00:47,640
And we were in support of that,
but we also wanted to explain

17
00:00:47,640 --> 00:00:53,120
why we thought people that are
doing query optimization work,

18
00:00:53,120 --> 00:00:57,100
so developers, should include the
BUFFERS parameter when they

19
00:00:57,100 --> 00:01:01,980
run EXPLAIN ANALYZE manually, or
with auto_explain, so that they

20
00:01:01,980 --> 00:01:05,640
can more effectively do query tuning.

21
00:01:05,740 --> 00:01:06,880
Is that a good summary?

22
00:01:07,120 --> 00:01:08,400
Nikolay: Yes, it's a good summary.

23
00:01:08,440 --> 00:01:13,220
Have you ever thought that in EXPLAIN
plans, BUFFERS are not

24
00:01:13,380 --> 00:01:14,280
turned on by default?

25
00:01:14,280 --> 00:01:17,480
Although there is consensus, I
think, and the fact that it was

26
00:01:17,480 --> 00:01:20,520
not changed so far hasn't been
changed.

27
00:01:20,900 --> 00:01:25,240
It's only purely technical reasons,
but I'm sure since you checked

28
00:01:25,240 --> 00:01:27,680
it, probably you'll explain details
here.

29
00:01:27,940 --> 00:01:32,860
But don't you find it's interesting
that EXPLAIN plans by default

30
00:01:32,860 --> 00:01:37,620
have BUFFERS off, but pg_stat_statements
has it on, right?

31
00:01:38,560 --> 00:01:39,900
I mean, just interesting.

32
00:01:41,120 --> 00:01:41,620
Why?

33
00:01:42,620 --> 00:01:47,400
Because if you want to start connected
data, I think we should

34
00:01:47,400 --> 00:01:47,720
do it.

35
00:01:47,720 --> 00:01:51,820
We should look at the whole picture,
macro level, business assessments,

36
00:01:51,820 --> 00:01:55,660
and we also should work with individual
plans provided by EXPLAIN

37
00:01:55,760 --> 00:01:58,220
command, EXPLAIN ANALYZE BUFFERS
command.

38
00:01:58,780 --> 00:02:03,340
And if you forget BUFFERS, just
do EXPLAIN ANALYZE, which many,

39
00:02:03,340 --> 00:02:06,680
many, many, many blog posts still
suggest, unfortunately.

40
00:02:07,600 --> 00:02:14,540
In this case, it's hard for you
to connect the dots between the

41
00:02:14,540 --> 00:02:16,780
individual plans and macro level.

42
00:02:17,360 --> 00:02:20,460
Michael: Yes, an interesting point
about pg_stat_statements.

43
00:02:22,580 --> 00:02:26,240
I reread the 2 threads, the 2 most
recent threads, about turning

44
00:02:26,240 --> 00:02:27,540
BUFFERS on by default.

45
00:02:27,720 --> 00:02:31,200
And the first one was started by
Vic Fearing.

46
00:02:31,360 --> 00:02:35,560
And it actually was He took a step
backwards and said I personally

47
00:02:35,560 --> 00:02:38,400
would like to turn BUFFERS on by
default.

48
00:02:38,860 --> 00:02:42,800
But because I realize that might
not be what everybody wants

49
00:02:42,800 --> 00:02:46,060
and that might be a wider discussion,
here's a patch proposing

50
00:02:46,120 --> 00:02:51,780
that we allow multiple new parameters
for Postgres.

51
00:02:52,580 --> 00:02:58,040
Each one was like it was allowing
you to turn each EXPLAIN parameter

52
00:02:58,180 --> 00:02:59,380
on or off by default.

53
00:02:59,380 --> 00:03:02,420
So, set the default yourself for
your instance.

54
00:03:02,640 --> 00:03:06,560
So you could set BUFFERS on by
default for your own instance

55
00:03:06,560 --> 00:03:08,640
and other people could leave it
off by default.

56
00:03:09,520 --> 00:03:15,220
And that was widely thought of
as people other people wanted

57
00:03:15,220 --> 00:03:18,580
BUFFERS on by default, but because
he also included settings

58
00:03:18,680 --> 00:03:23,080
to enable and analyze by default
and enable every other single,

59
00:03:23,080 --> 00:03:26,280
you know, verbose, right to head
logins, settings actually can't

60
00:03:26,280 --> 00:03:29,060
remember which ones, but basically
all of the EXPLAIN parameters

61
00:03:29,060 --> 00:03:32,560
that existed at the time of the
thread was started also came

62
00:03:32,560 --> 00:03:34,040
with one of these settings.

63
00:03:34,640 --> 00:03:39,340
People fixated on the fact that
an administrator turning on the

64
00:03:39,340 --> 00:03:43,520
ANALYZE parameter by default would
lead to you running EXPLAIN

65
00:03:44,180 --> 00:03:48,240
on its own and that executing the
query, which was deemed to

66
00:03:48,240 --> 00:03:51,760
be a foot gun that would be too
dangerous.

67
00:03:52,200 --> 00:03:56,780
Nikolay: So generalization did
a trick, bad trick here.

68
00:03:57,900 --> 00:04:00,920
And honestly, I think it's not
like I'm not interested in the

69
00:04:00,920 --> 00:04:03,160
ability to turn it for specific
servers.

70
00:04:03,160 --> 00:04:05,940
I want real default for everyone,
global default.

71
00:04:05,940 --> 00:04:12,440
Because I deal with many, many
servers and I cannot tell them

72
00:04:12,440 --> 00:04:12,940
all.

73
00:04:13,660 --> 00:04:14,920
This is what we do right now.

74
00:04:14,920 --> 00:04:16,500
We tell them all right now.

75
00:04:17,220 --> 00:04:20,460
Use BUFFERS by default when you
provide plans.

76
00:04:20,460 --> 00:04:21,900
It works only partially.

77
00:04:23,600 --> 00:04:24,560
It never works.

78
00:04:25,200 --> 00:04:28,400
Efficiency here is not 100% coverage,
right?

79
00:04:28,520 --> 00:04:32,200
So if we can do it by some additional
settings, okay.

80
00:04:32,320 --> 00:04:35,420
Maybe some cloud providers will
do it, I don't know.

81
00:04:35,640 --> 00:04:38,740
But it's this partial solution
I don't like at all.

82
00:04:38,940 --> 00:04:41,360
Michael: I do think we're in a
new world with cloud providers.

83
00:04:41,400 --> 00:04:45,460
I think now we have this, you know,
for example, pg_stat_statements

84
00:04:45,460 --> 00:04:46,580
is off by default.

85
00:04:47,220 --> 00:04:52,700
But in the cloud, most cloud providers
have it on by default.

86
00:04:52,800 --> 00:04:54,440
So some of them have users.

87
00:04:54,840 --> 00:04:57,480
Nikolay: Some of them even learned
about template databases,

88
00:04:57,660 --> 00:04:59,320
template 0, template 1, right?

89
00:04:59,340 --> 00:05:00,560
And put it there.

90
00:05:00,780 --> 00:05:04,660
So each new database created, this
is how it's implemented, right?

91
00:05:04,660 --> 00:05:07,780
Michael: So we're in this new world
where we don't necessarily

92
00:05:07,960 --> 00:05:11,600
have to have things on by default
in Postgres core for it to

93
00:05:11,600 --> 00:05:14,540
be on by default for most users
in the real world.

94
00:05:14,540 --> 00:05:18,080
So There's a slight weird subject
here, but to move on to the

95
00:05:18,080 --> 00:05:18,340
second...

96
00:05:18,340 --> 00:05:19,700
Nikolay: But it's still a partial
solution.

97
00:05:20,080 --> 00:05:23,500
I want Postgres 18, for example,
to have this by default.

98
00:05:23,860 --> 00:05:24,940
What should we do?

99
00:05:25,080 --> 00:05:30,360
But actually, we jumped straight
to this question why it's still

100
00:05:30,360 --> 00:05:34,260
not default, but we didn't explain
what it is, right?

101
00:05:35,320 --> 00:05:39,640
So maybe we should explain first
because it's also important

102
00:05:40,520 --> 00:05:43,220
why we care so much about this.

103
00:05:43,680 --> 00:05:48,460
Why, If a plan doesn't have buffers
data, so what?

104
00:05:48,940 --> 00:05:49,440
Right?

105
00:05:50,340 --> 00:05:52,640
Michael: Yeah, let's do a short
version because I think we did

106
00:05:52,640 --> 00:05:55,240
quite a good job of that in the
first episode we did on this.

107
00:05:55,240 --> 00:05:55,740
Okay.

108
00:05:55,940 --> 00:05:58,700
Which we can point people at, episode
4.

109
00:05:58,700 --> 00:06:00,460
But do you want to give a short
version?

110
00:06:00,560 --> 00:06:02,180
Nikolay: Yeah, let's do a short
version.

111
00:06:02,180 --> 00:06:06,320
So I see this data, as I mentioned,
this data is present in pg_stat_statements

112
00:06:06,320 --> 00:06:09,900
always, so if pg_stat_statements is installed, you

113
00:06:09,900 --> 00:06:13,060
have buffers, shared blocks read,
shared blocks hit.

114
00:06:13,380 --> 00:06:18,740
There are also local and temp pair,
2 additional pairs of numbers,

115
00:06:18,740 --> 00:06:21,540
but it's not like...

116
00:06:21,600 --> 00:06:22,100
Interesting.

117
00:06:22,240 --> 00:06:26,700
Here we will talk only about, let's
talk only about the buffer

118
00:06:26,700 --> 00:06:27,760
pool numbers.

119
00:06:27,980 --> 00:06:32,680
There are also 2 more columns,
dirtied and written.

120
00:06:33,240 --> 00:06:40,220
So if some query changes data,
it's dirtied, if it's written

121
00:06:40,760 --> 00:06:43,980
out of the buffer pool, it's written.

122
00:06:44,480 --> 00:06:50,700
And the Hidden red are interesting,
of course, because this can

123
00:06:50,780 --> 00:06:54,900
give us an idea about cache efficiency
in terms of buffer pool

124
00:06:54,900 --> 00:06:58,380
cache efficiency, because there
is also underlying page cache,

125
00:06:58,380 --> 00:07:00,200
additionally, always still.

126
00:07:00,940 --> 00:07:05,140
And in the EXPLAIN plan, if you run
EXPLAIN, it's only the plan.

127
00:07:05,140 --> 00:07:08,900
If you run EXPLAIN ANALYZE, it's
the plan with execution.

128
00:07:09,520 --> 00:07:13,880
But if you don't put explicitly,
you don't put the BUFFERS word,

129
00:07:14,180 --> 00:07:16,160
and if you put it in parentheses
additionally.

130
00:07:16,640 --> 00:07:19,980
So EXPLAIN (ANALYZE, BUFFERS) in parentheses.

131
00:07:20,600 --> 00:07:24,300
In this case you don't see this
buffer data so you only see timing

132
00:07:24,320 --> 00:07:29,180
and you understand, you cannot
for example understand cache efficiency

133
00:07:29,180 --> 00:07:32,380
in your particular execution of
this particular query.

134
00:07:33,420 --> 00:07:37,280
How much was taken from the buffer
pool, how much was taken outside,

135
00:07:37,280 --> 00:07:39,100
maybe from memory, maybe from disk.

136
00:07:39,220 --> 00:07:40,680
If it's read, it's outside.

137
00:07:40,680 --> 00:07:41,720
You have no idea.

138
00:07:41,720 --> 00:07:47,020
Of course, you can enable timing,
I/O timing, and not regular timing,

139
00:07:47,020 --> 00:07:51,020
which is always enabled in the
EXPLAIN plans, actual timing,

140
00:07:51,020 --> 00:07:51,520
right?

141
00:07:51,580 --> 00:07:52,600
But if you enable I/O

142
00:07:52,600 --> 00:07:57,740
timing, this can give you an idea
about how much time was spent

143
00:07:58,080 --> 00:07:58,780
on I/O.

144
00:07:58,780 --> 00:08:00,340
And as I understand, I/O

145
00:08:01,100 --> 00:08:04,420
in that sense means I/O

146
00:08:04,900 --> 00:08:06,720
outside of the buffer pool, right?

147
00:08:07,700 --> 00:08:09,440
It's so tricky, by the way.

148
00:08:09,440 --> 00:08:10,780
What is I/O?

149
00:08:11,000 --> 00:08:14,360
Does I/O include communication
to what?

150
00:08:14,440 --> 00:08:15,040
To disk?

151
00:08:15,620 --> 00:08:16,300
To memory?

152
00:08:16,360 --> 00:08:17,500
Memory, what memory?

153
00:08:17,500 --> 00:08:21,400
Page cache or buffer pool, both
or just page cache?

154
00:08:21,760 --> 00:08:28,400
In case of TrackIO timing, I guess
everything but communication

155
00:08:28,480 --> 00:08:32,460
to the buffer pool is counted or
not.

156
00:08:32,680 --> 00:08:34,800
Yes, but I would count everything.

157
00:08:35,140 --> 00:08:37,060
So it's tricky.

158
00:08:37,200 --> 00:08:41,000
Anyway, you can see timing again,
but cache efficiency wouldn't

159
00:08:41,000 --> 00:08:42,420
be possible to see.

160
00:08:42,740 --> 00:08:45,700
If you're curious, you do need
buffers.

161
00:08:45,720 --> 00:08:50,460
But in my opinion, cache efficiency
is not the number 1 benefit

162
00:08:50,740 --> 00:08:51,820
that buffers provide.

163
00:08:51,820 --> 00:08:54,260
The number 1 benefit is the amount
of work.

164
00:08:54,640 --> 00:08:59,520
If we know timing to understand
the real efficiency of our mechanism,

165
00:08:59,960 --> 00:09:02,480
we need to understand how much
work was done.

166
00:09:03,060 --> 00:09:06,580
And if we see only the number of
rows returned, it's of course

167
00:09:06,620 --> 00:09:09,280
kind of like this is the final
result, right?

168
00:09:09,280 --> 00:09:09,780
Delivered.

169
00:09:10,280 --> 00:09:10,558
But what's inside?

170
00:09:10,558 --> 00:09:10,650
Inside we need to understand how
much work was done.

171
00:09:10,650 --> 00:09:10,769
And if we see only number of rows
returned, it's of course kind

172
00:09:10,769 --> 00:09:10,842
of like, this is final result,
right, delivered.

173
00:09:10,842 --> 00:09:11,360
But what's inside?

174
00:09:12,400 --> 00:09:16,620
Inside we need to understand how
much actions were needed.

175
00:09:16,620 --> 00:09:19,960
And these actions in the database
world, it's I.O.

176
00:09:20,740 --> 00:09:22,560
Databases are all about I.O.

177
00:09:22,860 --> 00:09:26,100
Of course, we don't do a lot of
calculations.

178
00:09:26,320 --> 00:09:28,020
Usually, we deal with I.O.

179
00:09:28,340 --> 00:09:32,680
The goal is to find the data we
need as quickly as possible to

180
00:09:32,680 --> 00:09:37,120
efficiently change it, rewrite
it, store it, and so on.

181
00:09:37,120 --> 00:09:43,360
So only buffers can show the actual
amount of work that was performed

182
00:09:44,640 --> 00:09:48,420
during the achieving final result.

183
00:09:49,160 --> 00:09:53,740
And this is again, a hit, read,
dirtied, written, shell buffers.

184
00:09:54,560 --> 00:09:58,580
Of course, temporary files are
also interesting if you see temp

185
00:09:58,580 --> 00:10:00,360
buffers written and read.

186
00:10:00,360 --> 00:10:04,140
And this means that this query
needed to deal with...

187
00:10:05,340 --> 00:10:08,220
Work memory was not enough, we
needed to write to disk.

188
00:10:08,420 --> 00:10:12,080
It's also an amount of work and this
can explain some of your timing,

189
00:10:12,380 --> 00:10:14,940
overall latency increased significantly.

190
00:10:15,540 --> 00:10:16,860
So the amount of work.

191
00:10:17,300 --> 00:10:19,460
Let's do some maybe new analogy.

192
00:10:19,600 --> 00:10:22,700
For example, you need to replace
cabinets in the kitchen.

193
00:10:23,560 --> 00:10:24,400
I don't know.

194
00:10:24,440 --> 00:10:26,100
And we're not interested in details.

195
00:10:26,600 --> 00:10:27,980
You need the final result.

196
00:10:28,540 --> 00:10:33,160
And then you hire some contractor
and pay this contractor or

197
00:10:33,160 --> 00:10:35,520
like some team of contractors.

198
00:10:35,640 --> 00:10:40,120
You pay them some money and
they say, oh, you know, this took

199
00:10:40,120 --> 00:10:40,860
1 month.

200
00:10:42,660 --> 00:10:44,940
And you say, well, okay, I see
the final result.

201
00:10:45,020 --> 00:10:47,340
I understand 1 month I pay.

202
00:10:47,540 --> 00:10:48,560
And they said bye-bye.

203
00:10:50,460 --> 00:10:51,680
Maybe it's too much.

204
00:10:51,680 --> 00:10:53,500
How do we understand that it's
too much?

205
00:10:53,920 --> 00:10:57,660
We can compare similar cases.

206
00:10:57,900 --> 00:11:01,480
But it's better to dive in, to
understand what exactly are you

207
00:11:01,480 --> 00:11:02,220
going to do.

208
00:11:02,220 --> 00:11:03,680
Let's check step by step.

209
00:11:03,680 --> 00:11:07,800
You will go buy this thing, you
will put this, you will remove

210
00:11:07,800 --> 00:11:08,300
something.

211
00:11:08,560 --> 00:11:11,740
You start understanding the real
amount of work to be done.

212
00:11:11,920 --> 00:11:13,540
And then you think, really?

213
00:11:13,620 --> 00:11:16,280
You guys spent 1 month for this
kind of work?

214
00:11:16,560 --> 00:11:17,460
This is strange.

215
00:11:17,620 --> 00:11:21,180
But in some cases, they can really
explain, oh, you know, you

216
00:11:21,180 --> 00:11:25,600
have a difficult case here, you
need to change this and that,

217
00:11:25,600 --> 00:11:27,660
and this is really a lot of work.

218
00:11:27,980 --> 00:11:31,020
And then you start understanding,
okay, In this case, probably

219
00:11:31,020 --> 00:11:32,580
1 month is really reasonable.

220
00:11:33,900 --> 00:11:34,840
Similar in databases.

221
00:11:34,940 --> 00:11:36,420
Let's go back to databases.

222
00:11:36,560 --> 00:11:41,720
If you see that to return some
rows, database needed to perform

223
00:11:41,720 --> 00:11:46,940
sequential scan, it's reasonable
to have bad latency, very high

224
00:11:46,940 --> 00:11:51,240
value, because it needed to scan
a very big table.

225
00:11:51,560 --> 00:11:53,080
It's like a lot of work.

226
00:11:53,300 --> 00:11:56,120
And then if it can be optimized,
of course, yes, we can create

227
00:11:56,120 --> 00:11:56,980
index, right?

228
00:11:58,080 --> 00:12:01,360
Maybe we can buy tools for our
contractors, better tools than

229
00:12:01,400 --> 00:12:02,640
they use and so on.

230
00:12:02,640 --> 00:12:03,480
I don't know.

231
00:12:03,620 --> 00:12:06,480
But that inside, this is the key.

232
00:12:06,580 --> 00:12:09,060
You understand the work to be done.

233
00:12:09,580 --> 00:12:10,540
Does it make sense?

234
00:12:11,040 --> 00:12:11,940
Michael: Yeah, it does.

235
00:12:11,940 --> 00:12:14,000
I don't think the analogy is great.

236
00:12:17,160 --> 00:12:20,820
For example, there's 1 parallel
that could make sense.

237
00:12:20,820 --> 00:12:22,760
How many people were working for
the month?

238
00:12:22,760 --> 00:12:25,240
Like was it 4 people flat out for
a month?

239
00:12:25,240 --> 00:12:29,940
Yeah, so parallelism is like an
interesting analogy there.

240
00:12:30,300 --> 00:12:34,920
But I think there's like, you can
see all the steps that Explainer

241
00:12:34,920 --> 00:12:35,380
is doing.

242
00:12:35,380 --> 00:12:38,400
You can see what operations it's
doing even without buffers.

243
00:12:38,400 --> 00:12:38,900
Nikolay: No.

244
00:12:39,340 --> 00:12:42,940
Well, yes, we can see sequential
scan or index scan.

245
00:12:43,020 --> 00:12:48,260
But when you see, For example,
sequential scan, you see rows,

246
00:12:48,260 --> 00:12:52,100
you see index scan, okay, you see
rows are filtered out, but

247
00:12:53,860 --> 00:12:57,020
for example, let's move slightly
to deeper topics.

248
00:12:57,520 --> 00:13:01,020
If the database is bloated, so
to read a thousand rows, we needed

249
00:13:01,020 --> 00:13:03,810
to read a thousand, for example,
sequential scan.

250
00:13:03,810 --> 00:13:08,480
A Thousand rows, they can be packed
in a few pages or they can

251
00:13:08,480 --> 00:13:14,280
be scattered, sparsely distributed
to thousand pages, thousand

252
00:13:14,280 --> 00:13:14,780
blocks.

253
00:13:15,040 --> 00:13:19,540
And you can immediately see, okay,
to read these thousand rows,

254
00:13:19,540 --> 00:13:21,300
we needed to deal with thousand
pages.

255
00:13:21,380 --> 00:13:25,940
Thousand shared buffers hits ideally,
not reads, right?

256
00:13:26,120 --> 00:13:30,520
In this case, you think, okay,
something is not good here.

257
00:13:31,020 --> 00:13:32,780
There is no data locality here.

258
00:13:32,860 --> 00:13:34,260
What can I do about it?

259
00:13:34,400 --> 00:13:38,320
Maybe we deal with bloat or maybe
we just need to think about,

260
00:13:38,800 --> 00:13:41,780
I don't know, like cluster our
data or something.

261
00:13:42,180 --> 00:13:46,320
Not very popular choice, but I
mean, reorganize physically the

262
00:13:46,360 --> 00:13:47,620
order of storage.

263
00:13:48,240 --> 00:13:49,840
But this is good.

264
00:13:49,840 --> 00:13:53,620
I mean, you can understand and
without buffers you don't see

265
00:13:53,620 --> 00:13:54,440
it, right?

266
00:13:55,080 --> 00:13:57,440
Michael: Yeah, I mean, I'm completely
on board with this.

267
00:13:57,440 --> 00:13:59,340
I was just trying to push back
on the analogy.

268
00:13:59,340 --> 00:14:00,840
Like we've discussed this before.

269
00:14:00,920 --> 00:14:03,760
We both agree it should be, we'd
love it on by default.

270
00:14:04,340 --> 00:14:07,840
I make a tool where it really benefits
from when people use buffers.

271
00:14:08,140 --> 00:14:13,520
We give a tip in exactly that scenario
when we call it read efficiency,

272
00:14:13,520 --> 00:14:15,700
when we say the read efficiency
is not good.

273
00:14:15,800 --> 00:14:19,240
And it might be bloat, it might
be data locality, there might

274
00:14:19,240 --> 00:14:20,520
be a different reason.

275
00:14:21,220 --> 00:14:26,000
But yeah, you can't spot read efficiency
issues without buffers.

276
00:14:26,720 --> 00:14:31,140
Or other read related issues like
read speed.

277
00:14:31,240 --> 00:14:35,640
So, if you've got bad disks or
your mentioned like being

278
00:14:35,640 --> 00:14:39,660
throttled by a cloud provider or
something like that. You can't

279
00:14:39,660 --> 00:14:44,000
spot that, you maybe can
get a clue if you're really tuned

280
00:14:44,160 --> 00:14:48,560
into your data volumes and the
time being taken.

281
00:14:49,020 --> 00:14:52,600
But if you've got the number of
blocks or pages, and you can

282
00:14:52,600 --> 00:14:56,640
convert that in your head to, you
know, we're reading 100 megabytes

283
00:14:56,680 --> 00:14:59,360
and it's taking, let's say, 20
seconds.

284
00:15:00,140 --> 00:15:02,160
That's only doing 5 megabytes a
second.

285
00:15:02,160 --> 00:15:03,920
That's terrible throughput.

286
00:15:03,920 --> 00:15:06,400
You know, you've got an idea that...

287
00:15:07,360 --> 00:15:09,640
Nikolay: You touched several interesting
points here.

288
00:15:09,640 --> 00:15:13,300
First of all, sequential scan,
if it...

289
00:15:14,280 --> 00:15:17,160
For example, we have sequential
scan, some filters, and limit

290
00:15:17,160 --> 00:15:17,660
1.

291
00:15:17,780 --> 00:15:19,040
We need just 1 row.

292
00:15:19,180 --> 00:15:22,660
It can be very quick if this row
was found quickly or it can

293
00:15:22,660 --> 00:15:23,540
be very slow.

294
00:15:23,860 --> 00:15:28,940
It takes a lot of time if it needs
to scan through a lot of irrelevant

295
00:15:29,160 --> 00:15:29,660
rows.

296
00:15:30,400 --> 00:15:34,300
And without buffers we have no
idea in the plan, right?

297
00:15:34,300 --> 00:15:37,320
Because we don't see how many rows
were filtered out.

298
00:15:37,420 --> 00:15:42,540
Index will report how many irrelevant
rows are.

299
00:15:42,540 --> 00:15:45,060
Michael: You do see rows removed
by filter in a sequential scan

300
00:15:45,060 --> 00:15:45,760
as well.

301
00:15:45,860 --> 00:15:46,580
Nikolay: How come?

302
00:15:47,660 --> 00:15:48,480
Ah, filter out.

303
00:15:48,480 --> 00:15:49,580
Yeah, okay, sorry.

304
00:15:49,600 --> 00:15:50,500
You're right, you're right.

305
00:15:50,500 --> 00:15:51,840
Yeah, okay.

306
00:15:52,600 --> 00:15:57,800
But it's only about, I understand,
yeah, I agree with you.

307
00:15:57,800 --> 00:16:02,860
It's only about the cases when
storage is not efficient, for

308
00:16:02,860 --> 00:16:06,520
example, bloat or dead tuples,
this won't be reported.

309
00:16:06,700 --> 00:16:09,900
And if we see buffers, we understand,
okay, too many buffers

310
00:16:09,920 --> 00:16:11,420
here are present somehow.

311
00:16:11,740 --> 00:16:12,900
Let's understand why.

312
00:16:13,500 --> 00:16:18,100
And second point I wanted to extract
from your speech.

313
00:16:18,640 --> 00:16:23,680
It's super important not only to
use buffers, but always convert

314
00:16:23,680 --> 00:16:24,520
them to bytes.

315
00:16:26,440 --> 00:16:28,500
You said megabytes per second.

316
00:16:29,120 --> 00:16:31,420
Buffers per second, it's hard to
understand.

317
00:16:32,440 --> 00:16:32,940
Usually.

318
00:16:33,560 --> 00:16:36,820
Michael: It's just, it's also not
what disks, disks don't tell

319
00:16:36,820 --> 00:16:39,520
us how many pages per second they
read.

320
00:16:39,670 --> 00:16:43,000
Nikolay: Exactly, we cannot compare
to our disk characteristics.

321
00:16:43,340 --> 00:16:48,480
We know usually our random access,
sequential access, throughput

322
00:16:48,700 --> 00:16:52,680
numbers, maximum numbers, and for
NVMe, for example, gigabyte,

323
00:16:52,680 --> 00:16:55,760
2 gigabytes per second, then we
understand, okay, we have powerful

324
00:16:55,760 --> 00:17:00,180
disks and like 100 megabytes per
second is already very noticeable.

325
00:17:00,720 --> 00:17:04,400
But if, of course, memory is much
like several orders of magnitude

326
00:17:04,640 --> 00:17:11,580
much faster, but if we say how
many buffers per second, I don't

327
00:17:11,580 --> 00:17:11,980
know.

328
00:17:11,980 --> 00:17:16,520
So we need to always multiply it
by the block size, which is

329
00:17:16,640 --> 00:17:19,520
in absolutely the majority of cases
is 8 KB.

330
00:17:21,400 --> 00:17:26,880
And I wanted to emphasize this
because I think not everyone watched

331
00:17:27,040 --> 00:17:30,140
our episode number 4 about buffers.

332
00:17:30,780 --> 00:17:33,480
So I wanted to emphasize how important
it is.

333
00:17:33,820 --> 00:17:38,140
If you deal with these numbers
and you started using them in

334
00:17:38,140 --> 00:17:42,400
plans and explaining something,
you should understand that your

335
00:17:42,400 --> 00:17:46,560
colleagues might not understand
you unless you convert it to

336
00:17:46,560 --> 00:17:51,720
bytes, to kilobytes, megabytes,
or kibibytes, mibibytes, gibibytes.

337
00:17:52,200 --> 00:17:55,840
So you start putting these numbers,
it's magic.

338
00:17:55,840 --> 00:17:57,780
I saw it so many times, it's magic.

339
00:17:57,780 --> 00:18:03,940
People say, oh, we are using 1
gigabyte here for 10 rows, Okay,

340
00:18:03,940 --> 00:18:05,140
what's happening here?

341
00:18:05,320 --> 00:18:09,600
When we saw like boring buffer
numbers, they didn't have such

342
00:18:09,600 --> 00:18:10,760
reaction at all.

343
00:18:11,380 --> 00:18:15,700
Once they see gigabytes, wow, like
this is not...

344
00:18:17,380 --> 00:18:22,120
I had cases when instead of diving
deep into some specifics of

345
00:18:22,120 --> 00:18:26,120
the plan, it was actually a mistake
if you dive deep straight

346
00:18:26,120 --> 00:18:28,280
to the details of the explained
plan.

347
00:18:29,340 --> 00:18:34,920
Number 1 optimization tip is, okay
guys, how many rows do you

348
00:18:34,920 --> 00:18:42,140
want reasonably to return or touch,
I mean change, and let's

349
00:18:42,140 --> 00:18:44,680
see the whole work done by database
here.

350
00:18:45,520 --> 00:18:50,900
This gives you a very good sense
of how inefficient it is.

351
00:18:51,340 --> 00:18:55,320
And in many cases, it might be
some forgotten limit or something,

352
00:18:55,320 --> 00:18:58,360
and you understand, oh, okay, this
is not what we want actually

353
00:18:58,360 --> 00:18:59,000
at all.

354
00:18:59,540 --> 00:19:02,480
And this is a very high-level optimization
tip.

355
00:19:02,540 --> 00:19:05,520
I see you're skeptical about it,
but it works very well.

356
00:19:06,100 --> 00:19:07,400
Michael: I don't mind it.

357
00:19:07,440 --> 00:19:10,720
I think I've described it, I've
heard this described by others

358
00:19:10,720 --> 00:19:14,540
and described it myself as the
best way of doing work faster

359
00:19:14,540 --> 00:19:16,060
is not doing the work at all.

360
00:19:16,060 --> 00:19:19,760
So if you can avoid doing it entirely,
if you can do less work

361
00:19:19,760 --> 00:19:22,900
in the first place, or no work,
if you don't even need to run

362
00:19:22,900 --> 00:19:24,440
the query, great, get rid of it.

363
00:19:24,440 --> 00:19:27,180
But you're talking about if you
do decide you do need to run

364
00:19:27,180 --> 00:19:31,480
the query, can you just get, like,
Are you doing the smallest

365
00:19:31,480 --> 00:19:33,580
amount of work possible?

366
00:19:34,440 --> 00:19:38,160
Nikolay: So what I think would
be good, you know, there's a summary

367
00:19:38,160 --> 00:19:39,820
in the end of the EXPLAIN plan.

368
00:19:41,240 --> 00:19:45,200
And in this summary, we should
have data in bytes.

369
00:19:46,060 --> 00:19:47,780
It would be great to have it.

370
00:19:48,340 --> 00:19:51,060
Like how much was read-hit?

371
00:19:52,360 --> 00:19:53,560
Michael: You know I've done this.

372
00:19:53,560 --> 00:19:55,140
You know it's actually… I've

373
00:19:55,140 --> 00:19:55,760
Nikolay: done this.

374
00:19:55,760 --> 00:19:57,540
We both did this with our code.

375
00:19:58,520 --> 00:20:00,480
Michael: But I think I did it in
a controversial way.

376
00:20:00,480 --> 00:20:04,240
I did it in a way, I think this
gets down to the kind of trade-off

377
00:20:04,240 --> 00:20:09,140
between being accurate and

378
00:20:09,140 --> 00:20:09,860
being helpful.

379
00:20:10,940 --> 00:20:14,680
If you're already talking about
splitting it into the 4 types,

380
00:20:14,920 --> 00:20:20,380
like, I guess there's actually
8, there's actually 10, you know,

381
00:20:20,380 --> 00:20:25,420
you've got shared hits, shared
writes, shared dirtied, local

382
00:20:25,640 --> 00:20:27,540
read, local hit.

383
00:20:27,560 --> 00:20:31,400
So there's 10 statistics already
if you want to be accurate about

384
00:20:31,400 --> 00:20:35,220
the overall, even the overall reporting.

385
00:20:36,020 --> 00:20:40,740
And what we ended up doing is,
it's horrible and it doesn't make

386
00:20:40,740 --> 00:20:44,060
any sense, but it's nice from like
a, well, we just sum them

387
00:20:44,060 --> 00:20:47,000
all together and give an overall
number.

388
00:20:47,340 --> 00:20:51,020
Nikolay: If you switch from, you
need to switch language and

389
00:20:51,020 --> 00:20:54,660
maybe Postgres itself also at some
point should switch.

390
00:20:54,860 --> 00:20:59,320
Instead of saying shared blocks
read in SP-GiST terminology

391
00:20:59,540 --> 00:21:06,440
or hit, what if we would say shared
block reads, shared block

392
00:21:06,440 --> 00:21:10,320
hits, talking about operations,
not the amount of data.

393
00:21:11,320 --> 00:21:16,580
Because we both know for hits,
for blocks hit, there might be

394
00:21:16,580 --> 00:21:19,400
multiple hits of the same block
many many times.

395
00:21:19,540 --> 00:21:21,720
Nested loop for example, very simple
case.

396
00:21:21,900 --> 00:21:26,020
For reads it's also possible if,
I don't know, it's possible,

397
00:21:26,360 --> 00:21:27,040
I think.

398
00:21:27,720 --> 00:21:30,140
Let's not dive in, let's not spend
time here.

399
00:21:30,140 --> 00:21:34,640
When you summarize it, so instead
of blocks hit, I would say

400
00:21:34,740 --> 00:21:39,440
how many block hits happened, and
how many block reads happened.

401
00:21:39,440 --> 00:21:42,780
And then you summarize and say
how many IOs happened.

402
00:21:42,900 --> 00:21:46,580
I would maybe not summarize them
all together, I would maybe

403
00:21:46,620 --> 00:21:51,940
still distinguish reading and writing
types of like direction

404
00:21:52,200 --> 00:21:53,540
of operation you know.

405
00:21:53,760 --> 00:21:57,260
Michael: I understand but the
reason I love this approach

406
00:21:57,520 --> 00:21:59,380
is I see it separately of

407
00:21:59,380 --> 00:22:01,920
Nikolay: I, I see it separately of

408
00:22:04,940 --> 00:22:07,540
Michael: The reason I like this
is most of the time we're looking

409
00:22:07,540 --> 00:22:11,180
at read queries in like that that's
just the general.

410
00:22:12,260 --> 00:22:14,720
Nikolay: Don't you have write-intensive
workloads?

411
00:22:15,440 --> 00:22:16,560
Michael: Yeah, of course there
are.

412
00:22:16,560 --> 00:22:19,360
But if you're looking at people
trying to speed up queries, like

413
00:22:19,360 --> 00:22:21,300
the vast majority of read queries.

414
00:22:21,780 --> 00:22:26,640
Nikolay: Or they take up data and
extract the search part of

415
00:22:26,640 --> 00:22:28,720
it and optimize it first and select.

416
00:22:28,840 --> 00:22:29,980
Yeah, it also happens.

417
00:22:30,040 --> 00:22:30,980
I agree with you.

418
00:22:31,980 --> 00:22:36,560
Michael: But my main point was,
even if it's a write-intensive

419
00:22:36,820 --> 00:22:39,140
query, it's the before versus the
after.

420
00:22:39,140 --> 00:22:42,040
And it's funny that you used the
word magic a few minutes ago,

421
00:22:42,040 --> 00:22:44,240
that it's the technique that's
magic.

422
00:22:44,280 --> 00:22:46,080
It kind of flips the switch for
people.

423
00:22:46,100 --> 00:22:49,260
I think it's all I use that word,
but in the opposite direction,

424
00:22:49,860 --> 00:22:52,620
I think it takes the magic out
of query optimization.

425
00:22:52,640 --> 00:22:56,600
I think sometimes junior developers
or people beginning think

426
00:22:56,600 --> 00:22:59,320
that adding an index is magic and
it's gone.

427
00:22:59,760 --> 00:23:01,140
We've seen the blog posts.

428
00:23:01,560 --> 00:23:03,760
This query is now 10,000 times
faster.

429
00:23:03,760 --> 00:23:04,540
It's like,

430
00:23:04,540 --> 00:23:05,820
Nikolay: there's no magic here.

431
00:23:06,100 --> 00:23:06,760
This is why.

432
00:23:06,760 --> 00:23:07,400
Yeah, yeah.

433
00:23:07,440 --> 00:23:10,700
So let's agree, there's no contradiction
here at all.

434
00:23:11,140 --> 00:23:17,860
When I started Postgres.AI, my
idea was that Postgres database

435
00:23:17,860 --> 00:23:21,540
administration and optimization
and all other areas, it should

436
00:23:21,540 --> 00:23:23,740
not be black magic, it should be
white magic.

437
00:23:24,280 --> 00:23:28,820
And black magic happens when people
need to spend 10 years because

438
00:23:28,820 --> 00:23:33,940
of lack of transparency of observability
tools or regular tools

439
00:23:34,240 --> 00:23:36,260
and understanding good materials.

440
00:23:36,680 --> 00:23:39,880
Model have much better materials
than 15 years ago, for example.

441
00:23:39,880 --> 00:23:41,020
A lot of good tools.

442
00:23:42,400 --> 00:23:48,280
Buffers is one of the cornerstones
of tools and approaches which

443
00:23:48,900 --> 00:23:50,340
really clarifies everything.

444
00:23:50,420 --> 00:23:53,860
So magic, black magic disappears
and only white magic.

445
00:23:54,440 --> 00:23:57,900
It's white magic when you say,
okay, I did this and now query

446
00:23:57,900 --> 00:23:59,540
is a thousand times faster.

447
00:23:59,820 --> 00:24:00,980
It's great, right?

448
00:24:01,460 --> 00:24:04,340
But looking only at timing, you
cannot achieve it usually.

449
00:24:04,340 --> 00:24:09,060
You do need to look at the work
to be done or is being done,

450
00:24:09,060 --> 00:24:10,060
and this is buffers.

451
00:24:10,580 --> 00:24:13,260
Michael: Well, and if you see on
the one that's a thousand times

452
00:24:13,260 --> 00:24:16,080
faster, that instead of having
to read a gigabyte, or instead

453
00:24:16,080 --> 00:24:19,400
of the sum of buffers being a gigabyte,
the sum of buffers is

454
00:24:19,400 --> 00:24:23,340
now 2 megabytes, like something
in the order of a thousand,

455
00:24:23,360 --> 00:24:24,953
then it becomes obvious why.

456
00:24:24,953 --> 00:24:25,820
Nikolay: It doesn't surprise us
anymore.

457
00:24:25,900 --> 00:24:26,980
Michael: Yeah, exactly.

458
00:24:27,120 --> 00:24:29,600
And it doesn't seem magic to people
anymore.

459
00:24:29,800 --> 00:24:33,960
You start to realize how the index
is working and why it's therefore

460
00:24:33,960 --> 00:24:34,760
so much faster.

461
00:24:34,760 --> 00:24:37,860
So that's the reason I love the
simplicity of it.

462
00:24:38,040 --> 00:24:42,160
But I think it also like, I want
to get back to, is this now

463
00:24:42,160 --> 00:24:45,580
a good time to go back to why adding
buffers by default didn't

464
00:24:45,580 --> 00:24:47,120
work the second time around?

465
00:24:47,120 --> 00:24:47,820
Nikolay: Yeah, please.

466
00:24:48,980 --> 00:24:50,040
I'm all ears.

467
00:24:50,380 --> 00:24:51,760
Michael: Well, you started the
thread.

468
00:24:51,780 --> 00:24:53,040
Nikolay: Yes, maybe I forgot.

469
00:24:54,340 --> 00:24:54,840
Lost.

470
00:24:55,160 --> 00:24:56,140
Tests, right?

471
00:24:56,540 --> 00:24:57,040
Tests.

472
00:24:57,520 --> 00:24:58,680
Michael: Yeah, that was the main...

473
00:24:58,680 --> 00:25:01,720
So, adding it by default, I think
there was a...

474
00:25:02,120 --> 00:25:08,340
One of the problems was the patch
set that ended up getting proposed.

475
00:25:08,560 --> 00:25:11,320
There were about four different patches
that all did slightly different

476
00:25:11,320 --> 00:25:11,680
things.

477
00:25:11,680 --> 00:25:15,260
I think that was part of the problem,
that it wasn't kept as

478
00:25:15,400 --> 00:25:16,500
a simple change.

479
00:25:16,500 --> 00:25:19,480
Actually, the biggest part of the
change proposed was around

480
00:25:19,480 --> 00:25:20,460
tests, as you say.

481
00:25:20,460 --> 00:25:24,800
So, if we turn buffers on by default,
the first question that

482
00:25:24,800 --> 00:25:28,260
you asked and that the patch ended
up doing, was should it be

483
00:25:28,260 --> 00:25:31,760
on for EXPLAIN on its own, as well
as EXPLAIN ANALYZE?

484
00:25:31,760 --> 00:25:33,120
And I think that was a mistake.

485
00:25:33,120 --> 00:25:35,340
I think that overcomplicated matters.

486
00:25:36,500 --> 00:25:37,000
Because

487
00:25:37,280 --> 00:25:38,940
Nikolay: also can have buffers
for play.

488
00:25:38,940 --> 00:25:39,340
Yes.

489
00:25:39,340 --> 00:25:39,840
Right.

490
00:25:40,520 --> 00:25:40,920
Michael: Yes.

491
00:25:40,920 --> 00:25:44,380
But the reason I think that was
a mistake was it changes like

492
00:25:44,380 --> 00:25:50,140
there are far more EXPLAINs in
the test set in Postgres regression

493
00:25:50,140 --> 00:25:51,800
tests than there are EXPLAIN ANALYZEs.

494
00:25:52,000 --> 00:25:53,900
So, firstly, it complicates that.

495
00:25:54,020 --> 00:25:59,960
And secondly, timing is off by
or summary, which gives you the

496
00:25:59,960 --> 00:26:02,340
planning time, is off by default
in EXPLAIN.

497
00:26:02,560 --> 00:26:08,040
So, it's really weird to have buffers
on by default and not summary

498
00:26:08,040 --> 00:26:08,860
on by default.

499
00:26:09,320 --> 00:26:12,760
So, I don't think it makes sense
logically.

500
00:26:13,140 --> 00:26:17,860
But also, it complicates the patch
set and the regression tests.

501
00:26:19,120 --> 00:26:23,040
So if I was redoing this, I would
have a go without that, just

502
00:26:23,040 --> 00:26:25,780
have it on for explain analyze,
not for explain.

503
00:26:26,760 --> 00:26:27,260
Nikolay: Okay.

504
00:26:29,240 --> 00:26:30,060
Michael: Well, that's an opinion.

505
00:26:30,060 --> 00:26:32,540
I don't actually know if it'd be
any better without it.

506
00:26:32,540 --> 00:26:38,080
But the second issue was then,
what do we do with explain...

507
00:26:38,100 --> 00:26:39,880
Explain analyze has timings, right?

508
00:26:39,880 --> 00:26:42,980
But they have a regression test
setting

509
00:26:43,440 --> 00:26:43,660
Nikolay: that

510
00:26:43,660 --> 00:26:47,540
Michael: turns those off so that
each time the regression test

511
00:26:47,900 --> 00:26:51,940
suite runs, the timing's fluctuating
doesn't make the tests fail.

512
00:26:52,200 --> 00:26:55,940
And there was an attempt to add
a similar thing for buffers.

513
00:26:56,600 --> 00:26:57,100
And

514
00:26:57,340 --> 00:27:00,055
Nikolay: that became so to disable
them for tests only instead

515
00:27:00,055 --> 00:27:01,240
of rewriting the tests.

516
00:27:01,240 --> 00:27:05,020
I think, yeah, and maybe it's better
would be to rewrite the

517
00:27:05,020 --> 00:27:09,740
tests, especially because I forgot
to mention the one of the very

518
00:27:09,740 --> 00:27:13,740
good things I like about buffers
is that they are, they don't

519
00:27:13,740 --> 00:27:15,120
lie, they're always stable.

520
00:27:15,600 --> 00:27:18,100
So you can use them in tests and
rely.

521
00:27:18,540 --> 00:27:20,580
Because, well, of course...

522
00:27:20,580 --> 00:27:20,980
They're not

523
00:27:20,980 --> 00:27:22,480
Michael: quite as stable as...

524
00:27:22,820 --> 00:27:23,800
Nikolay: Hits and reads?

525
00:27:24,440 --> 00:27:26,680
Michael: Not just that, planning
buffers as well.

526
00:27:27,980 --> 00:27:30,740
Sometimes you get a lot of planning
buffers, sometimes you don't.

527
00:27:31,100 --> 00:27:32,360
It can depend on...

528
00:27:32,360 --> 00:27:35,660
Nikolay: Okay, at least to some
extent they are stable.

529
00:27:35,660 --> 00:27:39,620
Even if the query, for example,
is blocked by some different

530
00:27:40,080 --> 00:27:45,100
session and is waiting, our session
is waiting for a lock to be

531
00:27:45,100 --> 00:27:49,760
acquired, We can spend a lot of
time on this wait, but buffers

532
00:27:49,760 --> 00:27:50,380
will be fine.

533
00:27:50,380 --> 00:27:51,900
I mean, it won't affect them.

534
00:27:51,900 --> 00:27:53,300
And this is one of the signs.

535
00:27:53,300 --> 00:27:57,180
If you see buffers are low, but
latency is high, maybe locking

536
00:27:57,180 --> 00:27:59,280
issues are involved here.

537
00:27:59,680 --> 00:28:00,180
Right?

538
00:28:01,100 --> 00:28:01,600
Michael: Yeah.

539
00:28:02,380 --> 00:28:05,940
But tests don't, like, they shouldn't
block tests passing, right?

540
00:28:05,940 --> 00:28:08,440
Like, you can handle this a different
way.

541
00:28:08,480 --> 00:28:12,020
Nikolay: By the way, I would bring
my bad or maybe good analogy

542
00:28:12,040 --> 00:28:16,860
back and say if your guys are wasting
a lot of time but do a

543
00:28:17,000 --> 00:28:20,640
little work, maybe there is some
blocker, right?

544
00:28:20,660 --> 00:28:22,740
I mean with a kitchen analogy.

545
00:28:24,600 --> 00:28:25,920
I see some analogy here.

546
00:28:25,920 --> 00:28:26,580
I see.

547
00:28:26,580 --> 00:28:31,240
So at least it's my like I like
this approach when you dive deep

548
00:28:31,240 --> 00:28:35,080
into details, what's happening,
what people are doing, really

549
00:28:35,080 --> 00:28:35,580
doing.

550
00:28:35,700 --> 00:28:39,400
It can be applied to anything,
like maybe lawyers or accountants

551
00:28:39,480 --> 00:28:40,220
or something.

552
00:28:41,040 --> 00:28:43,220
If you hide, you usually pay more.

553
00:28:43,260 --> 00:28:47,360
I mean, if you don't look inside
at the amount of real amount

554
00:28:47,360 --> 00:28:48,600
of work to be done.

555
00:28:48,600 --> 00:28:50,820
You usually end up paying much
more.

556
00:28:50,980 --> 00:28:52,840
And this is true with databases
as well.

557
00:28:52,840 --> 00:28:57,740
If you don't use buffers, you will
be having inefficient plans

558
00:28:57,740 --> 00:29:00,560
and you will need to provision
a bigger instance, for example,

559
00:29:00,560 --> 00:29:02,060
and you will pay more.

560
00:29:02,440 --> 00:29:02,940
Same.

561
00:29:03,480 --> 00:29:05,600
Michael: Maybe we found the real
reason that they won't turn

562
00:29:05,600 --> 00:29:06,600
it on by default.

563
00:29:07,780 --> 00:29:10,760
Nikolay: So too many hackers are
working at cloud companies and

564
00:29:10,760 --> 00:29:12,840
they are not interested in a conspiracy.

565
00:29:15,520 --> 00:29:17,180
Michael: To be clear, I don't believe
that.

566
00:29:17,180 --> 00:29:20,060
Nikolay: Yeah, we have 1, like,
before we finish, because we

567
00:29:20,060 --> 00:29:24,780
have not much time left, let's
touch the very fresh topic, serialize.

568
00:29:25,660 --> 00:29:26,160
Right?

569
00:29:27,700 --> 00:29:28,200
Sure.

570
00:29:30,400 --> 00:29:33,620
Postgres 17 is going to have yet
another option in EXPLAIN, and

571
00:29:33,620 --> 00:29:36,920
it's called Serialize, and everyone
is excited about that, right?

572
00:29:37,840 --> 00:29:38,420
As I said.

573
00:29:38,420 --> 00:29:40,940
Michael: I have seen, I have heard
quite a lot of people quite

574
00:29:40,940 --> 00:29:42,280
excited about it.

575
00:29:42,400 --> 00:29:43,160
Nikolay: It's good.

576
00:29:43,320 --> 00:29:44,160
Michael: I think it is good.

577
00:29:44,160 --> 00:29:48,960
Like, we're getting the option
to see another case where your

578
00:29:48,960 --> 00:29:52,780
query can be slow, but EXPLAIN 
ANALYZE at the moment won't show

579
00:29:52,900 --> 00:29:53,820
that it's slow.

580
00:29:54,080 --> 00:29:55,280
So, yeah, for sure, good.

581
00:29:55,280 --> 00:29:56,740
But, again, off by default.

582
00:29:57,980 --> 00:30:01,535
Nikolay: And Tomasz: What
serialize is, first of all, just

583
00:30:01,535 --> 00:30:03,740
briefly, it's amount of what?

584
00:30:05,340 --> 00:30:06,360
Michael: Oh, come on.

585
00:30:06,500 --> 00:30:06,740
What?

586
00:30:06,740 --> 00:30:08,820
Back to buffers, but bytes and...

587
00:30:09,140 --> 00:30:10,900
Nikolay: No, no, yeah, it's by
default shows...

588
00:30:10,900 --> 00:30:11,400
Timing.

589
00:30:11,980 --> 00:30:12,480
Right.

590
00:30:12,700 --> 00:30:16,320
So timing, bytes, buffers, it shows
everything, especially if

591
00:30:16,320 --> 00:30:19,040
you have buffers, It will show
buffers as well.

592
00:30:19,240 --> 00:30:20,980
But what is it about?

593
00:30:20,980 --> 00:30:22,040
It's about serialization.

594
00:30:22,420 --> 00:30:23,500
So if you...

595
00:30:23,920 --> 00:30:27,440
It can connect us to the last episode
when we discussed SELECT

596
00:30:27,440 --> 00:30:27,940
STAR.

597
00:30:28,260 --> 00:30:31,920
And I said SELECT STAR is actually
not a bad thing because I

598
00:30:31,920 --> 00:30:37,360
was comparing this in my head to
the idea to explicitly list

599
00:30:37,360 --> 00:30:38,260
all the columns.

600
00:30:39,140 --> 00:30:42,740
And so for me it's a more convenient
way.

601
00:30:42,840 --> 00:30:46,760
But of course if you need only
for example 1 column, numeric

602
00:30:46,780 --> 00:30:48,620
column, for example, integer column.

603
00:30:48,900 --> 00:30:52,220
But there is a JSON column next
to it which is TOASTed.

604
00:30:53,400 --> 00:30:58,760
Of course, if you select it, it's
a big piece of inefficiency

605
00:30:58,780 --> 00:30:59,380
in this query.

606
00:30:59,380 --> 00:31:02,880
If you don't need it, you don't
use it, but Postgres will need

607
00:31:02,880 --> 00:31:04,500
to deal with TOAST table.

608
00:31:04,940 --> 00:31:08,300
And serialize option can very well
show this, right?

609
00:31:11,000 --> 00:31:15,360
Instead of selecting just 1 number,
we need to do a lot.

610
00:31:15,720 --> 00:31:19,360
Michael: Well, I think the issue
is more that the EXPLAIN ANALYZE

611
00:31:19,360 --> 00:31:21,640
at the moment doesn't show the
inefficiency.

612
00:31:21,980 --> 00:31:26,520
So if you did SELECT * from
that table, you would get a really

613
00:31:26,520 --> 00:31:29,480
slow, like you did it from your
client, you'd get a really slow

614
00:31:29,480 --> 00:31:32,520
result depending on how much data
there was, you'd see in the

615
00:31:32,520 --> 00:31:34,060
client that it was really slow.

616
00:31:34,120 --> 00:31:35,640
Not just because of the...

617
00:31:35,740 --> 00:31:36,880
Well, there's a couple of reasons.

618
00:31:36,880 --> 00:31:39,240
1 is because a lot of data is being
sent across the wire, but

619
00:31:39,240 --> 00:31:40,700
also because of the serialization.

620
00:31:41,320 --> 00:31:43,100
So there's 2 reasons.

621
00:31:43,660 --> 00:31:45,520
Nikolay: But if you didn't EXPLAIN
and ANALYZE...

622
00:31:45,980 --> 00:31:50,100
Yes, Actually, you know, now I
understand that it's pretty, I

623
00:31:50,100 --> 00:31:54,120
didn't see the discussion beforehand
because there's a demand

624
00:31:54,120 --> 00:31:58,180
actually, and I saw it not once,
when people want to see the

625
00:31:58,180 --> 00:32:01,100
plan and actually result as well.

626
00:32:02,260 --> 00:32:03,740
Michael: Yeah, it's a different
conversation.

627
00:32:03,920 --> 00:32:05,320
And this is a...

628
00:32:05,740 --> 00:32:07,680
Nikolay: EXPLAIN, auto_explain
this, of course.

629
00:32:08,360 --> 00:32:11,760
Michael: Yeah, this is only the
serialization part and it's off

630
00:32:11,760 --> 00:32:12,900
by default again.

631
00:32:14,220 --> 00:32:15,840
So it's more output.

632
00:32:16,100 --> 00:32:16,760
I like it.

633
00:32:16,760 --> 00:32:20,260
We can add it to the list of parameters
we asked people to use

634
00:32:20,640 --> 00:32:20,740
in

635
00:32:20,740 --> 00:32:20,985
Nikolay: the same format.

636
00:32:20,985 --> 00:32:22,580
You think it should be on by default?

637
00:32:23,500 --> 00:32:25,600
Michael: I mean, it's the same
argument, isn't it?

638
00:32:25,600 --> 00:32:28,640
Like, we want to see the actual
work being done.

639
00:32:28,780 --> 00:32:32,500
Nikolay: You know, this And that,
and also, I mean, this, all

640
00:32:32,500 --> 00:32:35,100
by default, that, BUFFERS, is all
by default.

641
00:32:35,600 --> 00:32:38,900
And also, ANALYZE is a super confusing
word, because it's used

642
00:32:38,900 --> 00:32:41,460
in other areas of Postgres administration.

643
00:32:42,280 --> 00:32:48,220
How about having new words invented
that will show proper plans?

644
00:32:48,220 --> 00:32:51,900
Of course, we can do it in psql,
for example, with set.

645
00:32:52,580 --> 00:32:53,640
Backslash set.

646
00:32:54,020 --> 00:32:56,520
Michael: I did consider, I would
be interested in your thoughts

647
00:32:56,520 --> 00:33:01,040
on this and listeners' thoughts
on the idea of EXPLAIN all or

648
00:33:01,040 --> 00:33:01,540
something.

649
00:33:01,640 --> 00:33:06,200
That whatever version of Postgres
you're on, it will do all of

650
00:33:06,200 --> 00:33:06,960
the parameters.

651
00:33:07,840 --> 00:33:11,920
I think including Analyze, full
is loaded for sure.

652
00:33:12,800 --> 00:33:14,560
Because of vacuum for it, I think.

653
00:33:15,120 --> 00:33:15,900
Nikolay: Or what?

654
00:33:16,460 --> 00:33:16,980
Yeah, okay.

655
00:33:16,980 --> 00:33:19,940
Let's not pretend we are hackers
at all.

656
00:33:20,160 --> 00:33:22,200
Michael: No, but I think there
is something to it.

657
00:33:22,200 --> 00:33:23,640
Like, what's the end goal here?

658
00:33:23,640 --> 00:33:25,840
Is EXPLAIN going to have 300 parameters?

659
00:33:25,840 --> 00:33:28,460
But like, how many are we going
to keep adding?

660
00:33:29,760 --> 00:33:30,260
Nikolay: Yeah.

661
00:33:31,300 --> 00:33:31,800
Yes.

662
00:33:32,100 --> 00:33:34,520
So, buffers definitely need to
be civilized.

663
00:33:34,560 --> 00:33:37,620
I need to taste it myself for some
time to conclude.

664
00:33:38,000 --> 00:33:39,240
I would say I would have

665
00:33:39,240 --> 00:33:40,100
Michael: a reservation.

666
00:33:40,120 --> 00:33:43,280
I think it's helpful for showing
people where the problem is,

667
00:33:43,280 --> 00:33:46,920
but because it's still off by default,
I think a lot of times

668
00:33:46,920 --> 00:33:50,940
when it would have been helpful,
it won't be used, which is a

669
00:33:50,940 --> 00:33:51,260
shame.

670
00:33:51,260 --> 00:33:52,400
Nikolay: Yeah, that's interesting.

671
00:33:52,780 --> 00:33:55,440
Everything here serializes by default.

672
00:33:55,440 --> 00:33:59,480
I actually didn't give this a thought
before we recorded this

673
00:33:59,540 --> 00:34:00,020
podcast.

674
00:34:00,020 --> 00:34:00,740
Michael: It's funny.

675
00:34:01,120 --> 00:34:03,980
Nikolay: Yeah, it's actually should
be default on.

676
00:34:04,600 --> 00:34:09,240
Of course, I understand like someone
will say it will produce

677
00:34:09,240 --> 00:34:13,180
observer effect, but honestly,
I don't care because timing itself

678
00:34:13,180 --> 00:34:16,980
produces observer effect, maybe
number 1 player here.

679
00:34:17,080 --> 00:34:19,780
And also this observer effect is
maybe not an observer effect

680
00:34:19,780 --> 00:34:23,980
but an actual normal effect because
the regular client side needs

681
00:34:23,980 --> 00:34:24,740
this part.

682
00:34:25,320 --> 00:34:26,260
Michael: Yeah, it's true.

683
00:34:26,260 --> 00:34:27,180
This is the opposite.

684
00:34:27,520 --> 00:34:31,260
Observer effect, normally we're
talking about the additional

685
00:34:31,320 --> 00:34:35,420
timing added by observing, whereas
this is a case where EXPLAIN

686
00:34:35,420 --> 00:34:38,680
ANALYZE will under report the time
taken because of this.

687
00:34:38,680 --> 00:34:42,080
Nikolay: Skipping some work that
is needed actually.

688
00:34:44,340 --> 00:34:47,220
So I need to, My fingers need to
memorize.

689
00:34:48,100 --> 00:34:51,100
By the way, I type EXPLAIN ANALYZE
BUFFERS super fast.

690
00:34:51,760 --> 00:34:53,800
Now I need as well CLIs.

691
00:34:54,280 --> 00:34:55,660
It's already too much.

692
00:34:55,940 --> 00:34:58,060
Michael: I actually now use a text
expander.

693
00:34:58,260 --> 00:35:01,960
So every time I can type EXPLAIN
ANALYZE and it will change it

694
00:35:01,960 --> 00:35:04,200
to the full list.

695
00:35:04,200 --> 00:35:06,940
But you like obviously you need
to work in lots of different

696
00:35:06,940 --> 00:35:08,320
terminals all over the place.

697
00:35:08,320 --> 00:35:11,180
So yeah, probably best to learn
to type fast.

698
00:35:11,320 --> 00:35:12,320
Nikolay: Anyway, okay.

699
00:35:12,360 --> 00:35:16,180
So we like the thing but don't
like its default off.

700
00:35:17,220 --> 00:35:18,340
Maybe we should actually

701
00:35:18,340 --> 00:35:19,440
Michael: Check the reasons.

702
00:35:20,740 --> 00:35:22,540
It probably makes the test difficult.

703
00:35:23,940 --> 00:35:25,940
Nikolay: Of course, test is difficult,
of course.

704
00:35:26,320 --> 00:35:32,680
But yeah, I would emphasize once
again, in tests, in regular

705
00:35:32,680 --> 00:35:35,640
tests, application tests, buffers
is gold.

706
00:35:36,020 --> 00:35:40,740
Because I understand that sometimes
there may be slight differences,

707
00:35:40,960 --> 00:35:46,600
because for example, data is packed
slightly different physically,

708
00:35:46,620 --> 00:35:46,980
right?

709
00:35:46,980 --> 00:35:48,740
Layout is slightly different.

710
00:35:49,080 --> 00:35:51,720
But there is no order of magnitude
difference.

711
00:35:51,900 --> 00:35:53,700
With timing, you can have it.

712
00:35:53,900 --> 00:35:59,960
Disks are slower, or saturated,
or locking issues, something

713
00:35:59,960 --> 00:36:00,700
like that.

714
00:36:00,960 --> 00:36:03,260
And you have timing, which you
don't understand.

715
00:36:03,340 --> 00:36:06,980
With buffers, it's a reliable number
you can use in testing.

716
00:36:08,940 --> 00:36:09,440
Michael: Cool.

717
00:36:09,520 --> 00:36:10,620
Thanks so much, Nikolay.