1
00:00:00,060 --> 00:00:02,380
Nikolay: Hello, hello, this is
Postgres.FM.

2
00:00:02,440 --> 00:00:07,100
My name is Nikolay, Postgres.AI,
and with me as usual my co-host

3
00:00:07,440 --> 00:00:08,460
Michael, pgMustard.

4
00:00:08,720 --> 00:00:10,400
Hi Michael, how are you doing?

5
00:00:10,840 --> 00:00:13,040
Michael: Hello Nikolay, I'm good,
how are you?

6
00:00:13,860 --> 00:00:15,040
Nikolay: Very good, very good.

7
00:00:15,060 --> 00:00:17,940
So it was my turn to choose the
topic.

8
00:00:18,740 --> 00:00:20,340
And I chose the topic naturally.

9
00:00:20,460 --> 00:00:22,100
I thought we didn't discuss EXPLAIN.

10
00:00:22,200 --> 00:00:24,780
Never, ever we discussed EXPLAIN,
right?

11
00:00:24,780 --> 00:00:25,900
We didn't do it.

12
00:00:25,900 --> 00:00:30,040
But then I googled and found out
that you discussed it without

13
00:00:30,060 --> 00:00:30,560
me.

14
00:00:31,020 --> 00:00:33,500
So I thought, it doesn't count.

15
00:00:34,280 --> 00:00:39,480
Let's discuss once again, but maybe
going deeper in some places.

16
00:00:39,920 --> 00:00:43,660
Michael: Yeah, I covered the basics and
I think it's a good suggestion

17
00:00:43,660 --> 00:00:45,060
to cover more advanced topics.

18
00:00:45,060 --> 00:00:49,640
I think EXPLAIN is one of those topics
that you can just keep getting

19
00:00:49,640 --> 00:00:50,400
more advanced.

20
00:00:50,640 --> 00:00:53,100
And there's definitely a level
at which I'll get way out of my

21
00:00:53,100 --> 00:00:53,600
depth.

22
00:00:54,090 --> 00:00:57,800
But yeah, there's definitely room
for a second episode where

23
00:00:57,800 --> 00:01:00,120
we go into more advanced topics
for sure.

24
00:01:00,420 --> 00:01:00,680
Nikolay: Right.

25
00:01:00,680 --> 00:01:08,040
And just to those who don't know,
you spend most of your professional

26
00:01:08,260 --> 00:01:16,840
time dealing with plans because
pgMustard is helping people analyze

27
00:01:16,920 --> 00:01:20,860
and improve queries, particular
queries, looking at EXPLAIN plans,

28
00:01:20,860 --> 00:01:21,360
right?

29
00:01:21,660 --> 00:01:22,360
Michael: Yeah, exactly.

30
00:01:22,360 --> 00:01:26,200
It's a visualization tool for EXPLAIN
and then also tries to

31
00:01:26,200 --> 00:01:28,940
provide performance advice based
on that.

32
00:01:29,140 --> 00:01:33,840
So we get normally people's more
complex query plans.

33
00:01:34,120 --> 00:01:39,640
So I guess this is an area that
I've got more practical experience

34
00:01:39,640 --> 00:01:40,360
than most.

35
00:01:40,580 --> 00:01:45,240
Obviously, everyone knows when
you're building something, people

36
00:01:45,240 --> 00:01:48,560
that build healthcare software
aren't always intimately familiar

37
00:01:48,560 --> 00:01:50,040
with everything about healthcare.

38
00:01:50,740 --> 00:01:53,100
There's certain things you get
really familiar with, but you

39
00:01:53,100 --> 00:01:55,480
also end up having to deal with
things that are not to do with

40
00:01:55,480 --> 00:01:56,340
that at all.

41
00:01:56,980 --> 00:02:00,880
But yeah, I have over the past
few years come across probably

42
00:02:00,900 --> 00:02:06,740
most variations of how performance
problems can be shown via

43
00:02:06,740 --> 00:02:07,940
EXPLAIN plans.

44
00:02:08,420 --> 00:02:12,560
I have some blind spots for sure
but yeah it's been a big thing

45
00:02:12,560 --> 00:02:15,840
for me but also for you right like
this is a big part of your

46
00:02:15,840 --> 00:02:18,140
work's been performance consulting
and

47
00:02:18,140 --> 00:02:19,400
Nikolay: yeah well It's

48
00:02:19,400 --> 00:02:21,040
Michael: hard to do that without
EXPLAIN.

49
00:02:21,280 --> 00:02:27,780
Nikolay: For me, EXPLAIN is not
the number 1 tool usually, except

50
00:02:27,780 --> 00:02:33,140
cases when a customer asks us to
help with a specific query,

51
00:02:33,240 --> 00:02:34,280
which happens as well.

52
00:02:34,280 --> 00:02:38,400
But usually we deal with analysis,
like top-down analysis of

53
00:02:38,400 --> 00:02:39,220
whole workload.

54
00:02:39,820 --> 00:02:44,160
We start with pg_stat_statements
and wait event analysis, performance

55
00:02:44,180 --> 00:02:45,900
insights, and ideas or something.

56
00:02:46,220 --> 00:02:51,440
Then we go down and when we identify
first aggregated query or

57
00:02:51,460 --> 00:02:55,220
normalized query in official pg_stat_statements
terminology.

58
00:02:56,000 --> 00:02:59,560
Only then we find examples of bad
behaving query.

59
00:03:00,300 --> 00:03:06,520
Then we discuss how to improve,
where to, we need a playground,

60
00:03:06,960 --> 00:03:12,620
a proper playground, and maybe
we will want to discuss how to

61
00:03:12,620 --> 00:03:15,980
make it so EXPLAIN plans match what
we see in production.

62
00:03:16,560 --> 00:03:22,080
And only then we go and check the
actual plan and try to improve

63
00:03:22,080 --> 00:03:23,080
it and so on.

64
00:03:23,360 --> 00:03:26,000
Sometimes we have auto_explain,
right?

65
00:03:26,000 --> 00:03:27,340
auto_explain is a great tool.

66
00:03:27,440 --> 00:03:29,340
Michael: Yeah, we did an episode on that
too.

67
00:03:29,500 --> 00:03:30,880
Nikolay: Yeah, yeah, I remember.

68
00:03:31,560 --> 00:03:33,660
And yeah, if you have it, it's
great.

69
00:03:34,540 --> 00:03:37,440
But it's not always so.

70
00:03:37,440 --> 00:03:38,900
It's not enabled by default.

71
00:03:38,960 --> 00:03:41,320
It requires some effort to be enabled.

72
00:03:41,320 --> 00:03:43,860
Although it's present, it's available
everywhere.

73
00:03:43,860 --> 00:03:47,060
So I wish I saw it more often.

74
00:03:48,480 --> 00:03:51,660
Michael: Did you see I did a blog
post on this this year on which

75
00:03:51,660 --> 00:03:54,560
cloud providers provide access
to EXPLAIN?

76
00:03:54,880 --> 00:03:55,740
I don't

77
00:03:55,840 --> 00:03:58,180
Nikolay: remember honestly, I need
to check it.

78
00:03:58,320 --> 00:04:00,480
Michael: I'll share it, it's not
the most interesting read, but

79
00:04:00,480 --> 00:04:04,280
it's like a good reference and
I find it useful for if somebody

80
00:04:04,280 --> 00:04:06,240
mentions a cloud provider to me.

81
00:04:07,060 --> 00:04:08,500
Nikolay: So everyone has it right?

82
00:04:09,560 --> 00:04:11,180
Michael: Yes, pretty much.

83
00:04:11,720 --> 00:04:14,160
Not quite everyone but pretty much
everyone.

84
00:04:14,760 --> 00:04:20,220
And Most of them provide access
to most parameters, so EXPLAIN

85
00:04:20,220 --> 00:04:24,140
and auto_explain have a bunch of
parameters and I found it hard

86
00:04:24,140 --> 00:04:26,300
to check which ones offered which
parameters.

87
00:04:26,880 --> 00:04:28,640
And I've got good news as well,
actually.

88
00:04:28,660 --> 00:04:33,680
I got a nice direct message from
somebody who works at AWS and

89
00:04:33,940 --> 00:04:36,980
the parameters they didn't offer,
there's like 2 or 3 that they

90
00:04:36,980 --> 00:04:40,200
didn't offer, they've submitted
a pull request for.

91
00:04:40,200 --> 00:04:46,060
So I'm looking forward to the next
release of the RDS products,

92
00:04:46,060 --> 00:04:47,620
which I think will include those.

93
00:04:48,100 --> 00:04:49,080
Nikolay: That's good.

94
00:04:49,600 --> 00:04:54,100
So yeah, of course, we also deal
with explain, but it's just

95
00:04:54,100 --> 00:04:58,640
1 of the things and we probably
don't dive too deep as you do.

96
00:04:59,060 --> 00:05:00,960
So where do you want to start

97
00:05:01,680 --> 00:05:02,420
Michael: this discussion?

98
00:05:02,440 --> 00:05:05,060
It was your suggestion, what do you want to start?

99
00:05:05,060 --> 00:05:06,800
Nikolay: Oh, but you're an expert, right?

100
00:05:06,980 --> 00:05:08,200
Michael: Okay, alright, maybe.

101
00:05:08,200 --> 00:05:11,200
I'm not sure, I still feel really uncomfortable with the word

102
00:05:11,200 --> 00:05:11,700
expert.

103
00:05:11,820 --> 00:05:14,520
Nikolay: I'm feeling very uncomfortable with the word analyze.

104
00:05:15,360 --> 00:05:18,200
It should be execute, not analyze, right?

105
00:05:19,300 --> 00:05:20,370
In the EXPLAIN command.

106
00:05:20,370 --> 00:05:21,300
Michael: I mean, naming...

107
00:05:22,200 --> 00:05:25,360
I've been thinking about naming for a while with this because

108
00:05:26,040 --> 00:05:31,620
part of our tool, we give people something to copy and paste

109
00:05:31,620 --> 00:05:34,340
so they can easily copy it and put it in front of their query

110
00:05:34,340 --> 00:05:37,320
if they're using a like PSQL or an editor or something.

111
00:05:38,260 --> 00:05:40,900
And it's getting longer and longer.

112
00:05:41,040 --> 00:05:44,340
The more parameters that get added to EXPLAIN the more like and

113
00:05:44,340 --> 00:05:47,220
and the more information the better for when you're using a tool.

114
00:05:47,220 --> 00:05:51,100
If you don't have to worry about the output getting long and

115
00:05:51,160 --> 00:05:53,480
your visualization tool should be pointing people at the right

116
00:05:53,480 --> 00:05:56,940
place anyway, the amount of data is like the more the better

117
00:05:56,940 --> 00:05:57,320
really.

118
00:05:57,320 --> 00:06:00,180
The more information you get, the more likely it is you can point

119
00:06:00,180 --> 00:06:01,020
out the problem.

120
00:06:01,360 --> 00:06:06,040
However, that it's become like EXPLAIN and then in parenthesis,

121
00:06:06,160 --> 00:06:11,040
ANALYZE, BUFFERS, VERBOSE, SETTINGS, FORMAT, JSON, and there's

122
00:06:11,040 --> 00:06:11,380
more.

123
00:06:11,380 --> 00:06:17,080
There's like, wow, now there's serialize and memory, but most

124
00:06:17,080 --> 00:06:21,100
of these are off by default so you have to specify all of them

125
00:06:21,200 --> 00:06:22,580
to get all of the information.

126
00:06:22,800 --> 00:06:26,180
So ANALYZE is 1 of them but the reason I was thinking about naming

127
00:06:26,180 --> 00:06:29,640
was I wondered if we could I was wondering about suggesting something

128
00:06:29,640 --> 00:06:34,440
like EXPLAIN all But the problem with that is there's so many

129
00:06:34,440 --> 00:06:38,440
pieces of advice out there in the wild about when you run EXPLAIN,

130
00:06:38,440 --> 00:06:40,980
ANALYZE, be careful because your query will be executed.

131
00:06:41,760 --> 00:06:45,060
If your query's data modifying, it will modify the data.

132
00:06:45,060 --> 00:06:48,880
If your query's long, it will take that amount of time to run.

133
00:06:49,160 --> 00:06:53,880
And all of that advice would become not obsolete, but if we change

134
00:06:53,880 --> 00:06:56,680
the name, all of that advice becomes difficult to like, you have

135
00:06:56,680 --> 00:06:59,140
to go back and change all of the advice, or you have to add,

136
00:06:59,140 --> 00:07:01,620
if you want to EXPLAIN, ANALYZE, or EXPLAIN all.

137
00:07:01,620 --> 00:07:04,900
Nikolay: For 5 years, the old word could be supported,

138
00:07:04,900 --> 00:07:10,640
but it's so confusing to explain people that ANALYZE is actually

139
00:07:10,640 --> 00:07:14,640
executing the query and it's not the same ANALYZE as ANALYZE

140
00:07:14,640 --> 00:07:19,900
table name, which also affects probably plan, but you need to

141
00:07:19,900 --> 00:07:20,820
understand this.

142
00:07:20,980 --> 00:07:24,320
If you run ANALYZE table name, probably after that you will see

143
00:07:24,320 --> 00:07:29,220
different plan in EXPLAIN and EXPLAIN ANALYZE, but it's not because

144
00:07:29,220 --> 00:07:30,800
these 2 words are the same words.

145
00:07:30,800 --> 00:07:32,780
They are very different words.

146
00:07:34,920 --> 00:07:38,660
Remember we discussed Postgres 17, it was confusion resolution.

147
00:07:40,020 --> 00:07:45,520
I wish this confusion 1 day would be resolved as well.

148
00:07:47,580 --> 00:07:50,380
Michael: I agree in principle, I just think in practice it's

149
00:07:50,380 --> 00:07:51,660
gonna be painful.

150
00:07:52,780 --> 00:07:53,580
Nikolay: Why painful?

151
00:07:53,600 --> 00:07:57,100
The word execute is not used yet.

152
00:07:57,700 --> 00:08:01,640
First we introduced synonym and then in 5 years we ditched the

153
00:08:01,640 --> 00:08:02,700
old…

154
00:08:03,120 --> 00:08:05,320
Michael: It's used in prepared statements, right?

155
00:08:05,380 --> 00:08:06,800
Like prepare, execute.

156
00:08:08,180 --> 00:08:11,520
Nikolay: Yeah, but well, execute means execute, and analyze and

157
00:08:11,520 --> 00:08:13,020
explain also means execute.

158
00:08:13,380 --> 00:08:19,540
Okay, by the way, Let me confess about something related to prepared

159
00:08:19,540 --> 00:08:20,040
statements.

160
00:08:20,760 --> 00:08:27,060
A few weeks ago, we had an episode about planning time, which

161
00:08:27,060 --> 00:08:31,680
also related to today's topic because planning time we check

162
00:08:31,680 --> 00:08:33,900
using explain mostly.

163
00:08:33,900 --> 00:08:39,220
Of course we can check it in pg_stat_statements but again not everyone

164
00:08:40,120 --> 00:08:43,740
enabled track planning in pg_stat_statements so usually we just

165
00:08:43,740 --> 00:08:48,080
check in auto_explain or regular manually in ad hoc way we check

166
00:08:48,080 --> 00:08:48,980
using explain.

167
00:08:49,140 --> 00:08:50,640
Not auto_explain, sadly.

168
00:08:51,420 --> 00:08:53,000
It cannot show planning time?

169
00:08:53,000 --> 00:08:53,500
Wow.

170
00:08:53,720 --> 00:08:54,220
Okay.

171
00:08:55,360 --> 00:08:55,840
Good to know.

172
00:08:55,840 --> 00:08:59,000
Michael: But in pg_stat_statements, it's if track planning, yeah.

173
00:08:59,940 --> 00:09:00,630
Nikolay: Yeah, if track planning.

174
00:09:00,630 --> 00:09:05,380
So If we, for example, deal with partitioning, with high number

175
00:09:05,380 --> 00:09:09,640
of partitions, planning time increases as we discussed.

176
00:09:10,840 --> 00:09:16,340
But I just blog posted about it and right after that we had a

177
00:09:16,340 --> 00:09:20,460
discussion on Twitter and it turned out that my experiment was

178
00:09:20,460 --> 00:09:26,500
wrong and my ideas were not fully wrong, but not purely accurate,

179
00:09:27,240 --> 00:09:31,820
because planning time is affected only in the very first time

180
00:09:32,080 --> 00:09:34,460
when queries is executed in the session.

181
00:09:34,820 --> 00:09:39,340
If you execute it once again, we already have relation metadata

182
00:09:39,440 --> 00:09:44,000
cache, rel cache, and overhead becomes almost 0.

183
00:09:44,200 --> 00:09:47,133
At least for simple queries.

184
00:09:47,280 --> 00:09:51,640
Michael: Yes, simple queries that can prune down to a single

185
00:09:51,820 --> 00:09:54,320
partition or like a low number of partitions.

186
00:09:54,320 --> 00:09:56,600
And it was David Rowley, who

187
00:09:56,600 --> 00:10:01,100
Nikolay: pointed out… Yes, of course, If there's no pruning,

188
00:10:01,160 --> 00:10:02,440
then it's a different story.

189
00:10:02,440 --> 00:10:08,040
Of course, with the number of partitions growing, more work needs

190
00:10:08,040 --> 00:10:09,320
to be done at planning time.

191
00:10:09,320 --> 00:10:13,780
But yeah, I'm talking about partition pruning happening at planning

192
00:10:13,780 --> 00:10:14,280
time.

193
00:10:14,700 --> 00:10:19,120
The idea is only the very first execution suffers.

194
00:10:19,900 --> 00:10:23,040
Michael: It's so, I mean I'm really impressed David noticed that

195
00:10:23,040 --> 00:10:23,760
so quickly.

196
00:10:24,140 --> 00:10:24,440
Yeah.

197
00:10:24,440 --> 00:10:27,960
I mean he knows his stuff around
this topic, I'm pretty sure

198
00:10:27,960 --> 00:10:32,280
he was involved in the fixes, I
think it was in Postgres 12 to

199
00:10:32,280 --> 00:10:34,020
improve that exact metric.

200
00:10:34,740 --> 00:10:38,860
So yeah, kudos to him for pointing
that out so politely as well.

201
00:10:38,860 --> 00:10:41,340
I think he said it was misleading,
not wrong.

202
00:10:42,040 --> 00:10:44,480
And also a really good reminder,
do you remember when we had

203
00:10:44,480 --> 00:10:47,300
Melanie on to talk about benchmarking,
how difficult benchmarking

204
00:10:47,320 --> 00:10:49,680
was to make sure you're measuring
the thing you think you're

205
00:10:49,680 --> 00:10:50,180
measuring.

206
00:10:51,220 --> 00:10:54,520
It's just a good reminder that
these things are so difficult.

207
00:10:55,020 --> 00:10:59,760
But by publishing them, by publishing
your findings, and by the

208
00:10:59,760 --> 00:11:03,900
Postgres community being so open
to correcting us when we're

209
00:11:03,900 --> 00:11:06,420
wrong, we've all learned something.

210
00:11:06,420 --> 00:11:06,700
I think

211
00:11:06,700 --> 00:11:07,060
Nikolay: that's good.

212
00:11:07,060 --> 00:11:07,260
Exactly.

213
00:11:07,260 --> 00:11:08,960
It was very good.

214
00:11:09,060 --> 00:11:14,080
And I cannot agree it's wrong because
there was no mistake there.

215
00:11:15,160 --> 00:11:20,380
There was a problem that the experiment
didn't do next step,

216
00:11:20,380 --> 00:11:22,040
which of course I should do.

217
00:11:22,760 --> 00:11:24,840
Actually misleading is a good term
here.

218
00:11:25,140 --> 00:11:28,220
I extended blog post right away.

219
00:11:30,040 --> 00:11:34,120
It turned out if you use just EXPLAIN,
no execution, no ANALYZE

220
00:11:34,120 --> 00:11:34,620
work.

221
00:11:34,700 --> 00:11:37,580
If you just use EXPLAIN, you see
you add partitions.

222
00:11:38,100 --> 00:11:41,860
We checked from 0, 100, 200, and
up to 1,000.

223
00:11:42,440 --> 00:11:47,480
You see we're using EXPLAIN, SELECT,
primary key lookup, and

224
00:11:47,540 --> 00:11:49,280
partition key basically lookup.

225
00:11:49,780 --> 00:11:52,700
You see first of all, you see index
scan, only 1 partition.

226
00:11:52,900 --> 00:11:55,460
Partition pruning is working at
planning time.

227
00:11:55,940 --> 00:12:02,180
But planning time grows linearly
with number of partitions and

228
00:12:02,860 --> 00:12:08,300
number of planning buffers also
grows linearly for this kind

229
00:12:08,300 --> 00:12:10,300
of query and it was range partitioning.

230
00:12:11,040 --> 00:12:12,180
We didn't check others.

231
00:12:12,720 --> 00:12:17,920
But once you run EXPLAIN once again
in the same session, only 3

232
00:12:17,920 --> 00:12:21,720
buffer hits always, regardless
of number of partitions.

233
00:12:21,820 --> 00:12:23,320
It's also good relief, right?

234
00:12:23,320 --> 00:12:24,940
So you think, oh, it's great.

235
00:12:25,840 --> 00:12:30,680
I'm going just to relax and I can
have 10,000 partitions for

236
00:12:30,680 --> 00:12:32,060
simple queries like this.

237
00:12:32,200 --> 00:12:37,800
Of course, complex queries will
be different, I'm quite sure.

238
00:12:38,720 --> 00:12:43,260
But if partition pruning works
at planning time, I'm happy, right?

239
00:12:43,920 --> 00:12:50,320
It just tells me I need to take
care of proper connection poolers.

240
00:12:50,640 --> 00:12:52,280
I need to have them always.

241
00:12:54,280 --> 00:12:59,540
The problem with our discussion
and my ideas a few weeks ago

242
00:12:59,540 --> 00:13:03,400
when I said, oh, if you have high
overhead of planning time,

243
00:13:03,700 --> 00:13:08,860
which obviously it's only for first
execution planning.

244
00:13:09,020 --> 00:13:11,660
In this case, just use prepared
statements.

245
00:13:12,100 --> 00:13:15,720
But this is definitely maybe not
only misleading, but also wrong,

246
00:13:15,720 --> 00:13:21,600
because as I actually learned in
detail, if you use prepare statements,

247
00:13:21,660 --> 00:13:27,340
you see that partition pruning
shifts to execution.

248
00:13:28,260 --> 00:13:32,380
So the plan, a generic plan, includes
all partitions, right?

249
00:13:32,520 --> 00:13:37,540
Because we cannot cache the plan,
which is dealing only with

250
00:13:37,540 --> 00:13:41,580
single partition in this case,
but what if we need to execute

251
00:13:41,580 --> 00:13:43,580
the prepared statements with different
parameter?

252
00:13:44,280 --> 00:13:45,860
We need different partition probably.

253
00:13:47,840 --> 00:13:51,720
The cached plan includes all partitions
and you run explain for

254
00:13:51,720 --> 00:13:56,380
prepared statements for a specific
parameter and you see that

255
00:13:56,380 --> 00:13:59,360
execution time, you see subplans
removed.

256
00:14:00,320 --> 00:14:02,740
This is interesting.

257
00:14:02,740 --> 00:14:07,120
Which means that overhead shifted
to execution.

258
00:14:07,760 --> 00:14:09,020
It's not good, right?

259
00:14:09,060 --> 00:14:10,880
But again, overhead, which overhead?

260
00:14:11,680 --> 00:14:14,740
If we have real cache here, all
good.

261
00:14:15,700 --> 00:14:19,400
So it means that for high number
of partitions, not prepared

262
00:14:19,400 --> 00:14:20,440
statements matter.

263
00:14:20,740 --> 00:14:25,960
They can help, but they don't help
fully because of this shift

264
00:14:25,960 --> 00:14:27,080
of partition pruning.

265
00:14:27,500 --> 00:14:30,400
But what actually helps a lot is
poolers.

266
00:14:30,700 --> 00:14:33,140
We have poolers in all heavily
loaded projects.

267
00:14:33,900 --> 00:14:39,280
And if you are preparing to have
huge tables, like we always

268
00:14:39,280 --> 00:14:44,680
say, if you exceed 100 gigabytes,
you need partitioning.

269
00:14:45,060 --> 00:14:51,040
But it also means you do need proper
connection pooling and avoid

270
00:14:51,760 --> 00:14:56,320
the initial overhead of planning
basically.

271
00:14:56,320 --> 00:14:59,480
With high number of partitions,
planning will be not good.

272
00:15:01,180 --> 00:15:02,580
So back to explain.

273
00:15:03,100 --> 00:15:05,740
Michael: Yeah, question back to
explain, kind of linking them.

274
00:15:06,020 --> 00:15:10,560
When you did the second explain,
extracting the buffers and things,

275
00:15:10,560 --> 00:15:14,020
did you use format JSON for that
and parse the output?

276
00:15:14,340 --> 00:15:15,440
Or what did you do?

277
00:15:17,380 --> 00:15:18,740
Nikolay: It was text regular.

278
00:15:19,200 --> 00:15:20,880
Michael: We've just parsed it.

279
00:15:21,520 --> 00:15:22,700
Nikolay: Settings, yeah.

280
00:15:22,700 --> 00:15:28,060
And it was done by our AI, so it
doesn't use JSON at this point.

281
00:15:28,520 --> 00:15:31,820
Just we don't want very bloated
format.

282
00:15:32,600 --> 00:15:37,120
And yeah, and actually, yeah, we
needed to fix how it works because

283
00:15:37,120 --> 00:15:40,940
right now it works through psycopg,
so Python.

284
00:15:44,600 --> 00:15:49,280
If you run a couple of explains
in one shot, it received only one

285
00:15:49,280 --> 00:15:49,600
result.

286
00:15:49,600 --> 00:15:51,860
We fixed that, so now it sees both
results.

287
00:15:52,540 --> 00:15:56,020
The second result is always, in
our case, three buffer hits.

288
00:15:57,440 --> 00:16:02,540
Planning time remains way below
one millisecond, I think maybe

289
00:16:03,100 --> 00:16:06,720
10 microseconds or so, super tiny,
almost zero.

290
00:16:09,800 --> 00:16:14,280
Bottom line here is that usually
we say, if you run

291
00:16:15,020 --> 00:16:17,840
EXPLAIN (ANALYZE, BUFFERS), of course.

292
00:16:18,460 --> 00:16:20,460
We had a couple of episodes about
it.

293
00:16:20,740 --> 00:16:24,460
If you run it, always run it a
couple of times, maybe a few times

294
00:16:24,480 --> 00:16:25,780
because of caches.

295
00:16:26,400 --> 00:16:30,480
If you want to check timing, if
you're too obsessed by timing,

296
00:16:30,900 --> 00:16:31,600
not my case.

297
00:16:31,600 --> 00:16:33,380
I'm obsessed by buffers, not timing.

298
00:16:35,280 --> 00:16:38,660
You need to run it multiple times
to check the second execution

299
00:16:38,840 --> 00:16:43,080
because first time probably everything
is not cached, data is

300
00:16:43,080 --> 00:16:48,100
not in the buffer pool, and not
in page cache, so run it twice.

301
00:16:48,740 --> 00:16:51,400
That's a good piece of advice,
right?

302
00:16:51,760 --> 00:16:55,460
But what we just discussed means
also that even for planning,

303
00:16:56,760 --> 00:16:59,700
run it twice in the same session,
right?

304
00:17:00,240 --> 00:17:07,540
Because maybe you observe a lot
of work, if partitioning is involved,

305
00:17:07,540 --> 00:17:11,420
especially if a lot of work related
to lack of real cache.

306
00:17:12,160 --> 00:17:13,160
Michael: I've seen this.

307
00:17:13,200 --> 00:17:15,600
The other weird thing I've seen,
I don't know if you know the

308
00:17:15,600 --> 00:17:22,120
answer to, is you can do a simple
ID lookup, like primary key

309
00:17:22,120 --> 00:17:26,400
lookup and you will get planning
time, but no planning buffers

310
00:17:26,400 --> 00:17:28,760
reported, even if you include the
buffers thing.

311
00:17:29,760 --> 00:17:30,460
What's that?

312
00:17:30,460 --> 00:17:31,210
How's that?

313
00:17:31,210 --> 00:17:32,460
It's a bug I think.

314
00:17:32,620 --> 00:17:33,120
Yeah.

315
00:17:34,940 --> 00:17:38,540
Nikolay: Inside my team, like last
week, since we worked on this

316
00:17:38,680 --> 00:17:43,560
post, a couple of other… Many people
actually were involved,

317
00:17:44,060 --> 00:17:47,600
but this particular problem was
very annoying.

318
00:17:47,940 --> 00:17:48,640
Very annoying.

319
00:17:49,200 --> 00:17:52,480
So, like, why buffers are not reported
in planning?

320
00:17:52,500 --> 00:17:53,940
Okay, now they are reported.

321
00:17:53,940 --> 00:17:59,020
And we cannot understand what it
depends on.

322
00:17:59,640 --> 00:18:03,400
So, Once we understand, probably
it will be a bug report already,

323
00:18:03,400 --> 00:18:03,900
right?

324
00:18:04,220 --> 00:18:07,760
Because it's something wrong is
there, because we requested.

325
00:18:09,060 --> 00:18:11,820
Michael: It's weird, but I mean
the text format is hurting you

326
00:18:11,820 --> 00:18:17,180
there because by default in the
text format, if things are 0,

327
00:18:17,880 --> 00:18:20,900
they don't get reported, it's like
roughly a general rule.

328
00:18:20,900 --> 00:18:21,820
Nikolay: You think so?

329
00:18:21,820 --> 00:18:24,640
Michael: Whereas in JSON format,
you'd still get buffers reported,

330
00:18:24,640 --> 00:18:25,820
but they would have zeros.

331
00:18:25,920 --> 00:18:27,140
It's not much better.

332
00:18:27,180 --> 00:18:30,300
Nikolay: In our case, it should
be reported because it always

333
00:18:30,300 --> 00:18:32,340
was something like some number
of hits.

334
00:18:33,280 --> 00:18:33,960
But yeah.

335
00:18:33,960 --> 00:18:35,680
Michael: So that's a different
bug, right?

336
00:18:35,680 --> 00:18:38,140
Like all I meant is if planning
buffers

337
00:18:39,020 --> 00:18:39,620
Nikolay: are 0,

338
00:18:39,620 --> 00:18:42,100
Michael: I don't understand how
planning buffers can be 0 if

339
00:18:42,100 --> 00:18:45,300
it's like it was actually doing
work, if it's actually, you know,

340
00:18:45,300 --> 00:18:47,060
picking which index to scan.

341
00:18:47,440 --> 00:18:49,000
So I found that…

342
00:18:49,000 --> 00:18:53,200
Nikolay: – In the case of prepared,
cached plan, if plan is cached,

343
00:18:53,800 --> 00:18:57,180
there is still some planning time,
but I suspect this planning

344
00:18:57,180 --> 00:19:01,320
time occurs during execution because
of these subplans removed

345
00:19:01,320 --> 00:19:03,400
and partition pruning happening.

346
00:19:03,640 --> 00:19:04,960
I'm not sure, by the way.

347
00:19:04,960 --> 00:19:06,600
This is some nuance.

348
00:19:08,040 --> 00:19:10,680
Michael: I think maybe like
parsing might count as planning

349
00:19:10,680 --> 00:19:13,820
or like I think there's other steps.

350
00:19:14,060 --> 00:19:16,060
Nikolay: Well, I'm looking at
it right now.

351
00:19:16,060 --> 00:19:20,280
I'm looking at the plan which checks
execution of the forced

352
00:19:20,280 --> 00:19:21,160
generic plan.

353
00:19:21,500 --> 00:19:25,580
Plan cache mode set to forced generic
plan and we check execution

354
00:19:25,640 --> 00:19:26,860
for prepared statement.

355
00:19:27,380 --> 00:19:28,220
It's cached.

356
00:19:28,500 --> 00:19:29,480
The plan cached.

357
00:19:30,180 --> 00:19:34,700
We see no planning-related buffers
reported at all.

358
00:19:35,220 --> 00:19:36,000
It's missing.

359
00:19:36,340 --> 00:19:39,400
But planning time is like 38 microseconds.

360
00:19:40,600 --> 00:19:43,340
And partition pruning happened
at execution time.

361
00:19:43,460 --> 00:19:48,120
So I wonder what are those 38 microseconds.

362
00:19:49,440 --> 00:19:53,160
If plan is cached, maybe just additional
checks, I don't know.

363
00:19:53,840 --> 00:19:54,100
It's interesting, right?

364
00:19:54,100 --> 00:19:58,180
Michael: You know there's multiple
stages, like, you know, I

365
00:19:58,180 --> 00:20:00,980
can't remember all of them, but
1 of them, for example, is parsing.

366
00:20:01,000 --> 00:20:04,260
1 of them is like rewriting, so
like you might rewrite the query.

367
00:20:04,540 --> 00:20:07,540
I think some of those get bucketed
at the moment as planned.

368
00:20:07,540 --> 00:20:08,040
So

369
00:20:08,320 --> 00:20:11,840
Nikolay: probably you're right,
there are no buffer hits involved

370
00:20:11,840 --> 00:20:14,180
at all and that's why it's not
reported at all.

371
00:20:14,620 --> 00:20:19,760
Michael: But I've seen this, I've
seen 0 reported when it's like

372
00:20:19,760 --> 00:20:24,340
not a plan cache, like if it's
a primary lookup that you've

373
00:20:24,340 --> 00:20:24,840
run.

374
00:20:25,520 --> 00:20:28,600
Nikolay: So where there is some
work for sure, right?

375
00:20:28,700 --> 00:20:31,920
Michael: Yeah, or like in my head
it would make sense that there's

376
00:20:31,920 --> 00:20:32,420
work.

377
00:20:32,840 --> 00:20:34,900
Anyway, it's an interesting quirk.

378
00:20:35,660 --> 00:20:39,620
Nikolay: And even if it's not a
bug, the fact that you raise

379
00:20:39,620 --> 00:20:43,540
this topic and we discuss this
topic inside my team, it's already

380
00:20:43,580 --> 00:20:46,060
something, right, which is not
clear.

381
00:20:46,620 --> 00:20:52,220
And yeah, If someone who watches
us who is more experienced than

382
00:20:52,220 --> 00:20:53,500
us, please tell us.

383
00:20:53,500 --> 00:20:56,540
This is super interesting piece
of information.

384
00:20:58,580 --> 00:21:03,220
So what I wanted to say, You need
to check the execution of explain

385
00:21:03,360 --> 00:21:07,480
for planning for the second time
inside the same session just

386
00:21:07,480 --> 00:21:11,580
to exclude the fact that you observed
overhead from lack of real

387
00:21:11,580 --> 00:21:11,820
cache.

388
00:21:11,820 --> 00:21:14,360
This is lesson I learned last week.

389
00:21:15,240 --> 00:21:20,980
This is correction to ideas I expressed
a few weeks ago in the

390
00:21:20,980 --> 00:21:22,160
planning time episode.

391
00:21:23,100 --> 00:21:27,040
Michael: And there's some good
practices that help mitigate

392
00:21:27,040 --> 00:21:28,300
this kind of issue anyway.

393
00:21:28,480 --> 00:21:32,440
Normally people when they're benchmarking
or testing things will

394
00:21:32,440 --> 00:21:36,060
run something 5 times and maybe
take the best of the 5 or you

395
00:21:36,060 --> 00:21:40,280
know running things multiple times I don't know if you've ever

396
00:21:40,280 --> 00:21:45,040
seen some sports do like diving for example they give people

397
00:21:45,040 --> 00:21:48,940
scores out of 10 and then they cross out the top scoring judge

398
00:21:48,940 --> 00:21:50,440
and the bottom scoring judge.

399
00:21:51,100 --> 00:21:52,480
And yeah, actually,

400
00:21:52,780 --> 00:21:53,800
Nikolay: we did that.

401
00:21:53,860 --> 00:21:59,940
We did it, but we like, due to some reasons, it was not like

402
00:22:00,040 --> 00:22:03,380
We plan to improve this and this kind of analysis should be done

403
00:22:03,580 --> 00:22:04,640
exactly that way.

404
00:22:04,640 --> 00:22:10,440
You run multiple times, you check which data points are way off

405
00:22:10,440 --> 00:22:12,440
and exclude them and then you take average.

406
00:22:12,440 --> 00:22:13,720
That makes sense totally.

407
00:22:14,100 --> 00:22:18,400
But in this case, just how we did it, It was a single session

408
00:22:18,400 --> 00:22:20,780
experiment and we ran all EXPLAINs.

409
00:22:21,280 --> 00:22:25,920
So bottom point, if you run EXPLAIN 5 times and every time it's

410
00:22:25,920 --> 00:22:30,860
a new connection, the problem persists.

411
00:22:32,380 --> 00:22:35,240
You need to run it multiple times inside the same connection.

412
00:22:37,120 --> 00:22:42,260
And then I would not agree with excluding the first 1 due to

413
00:22:42,260 --> 00:22:45,260
lack of real cache, because it's also an interesting fact that

414
00:22:45,660 --> 00:22:48,900
With growing number of PARTITIONs, the very first execution,

415
00:22:49,200 --> 00:22:53,600
it can reach dozens of milliseconds and even hundreds of milliseconds.

416
00:22:53,600 --> 00:22:54,440
It's a lot.

417
00:22:55,080 --> 00:22:59,940
It means connection poolers are needed and they need to be configured

418
00:23:00,040 --> 00:23:01,860
properly if you have thousands of PARTITIONs.

419
00:23:02,400 --> 00:23:07,220
Otherwise, it will be very often new connection established,

420
00:23:07,500 --> 00:23:10,760
could execute the first time, huge number of PARTITIONs, and

421
00:23:10,760 --> 00:23:13,940
you have penalty of dozens of milliseconds for heavily loaded

422
00:23:13,940 --> 00:23:14,440
projects.

423
00:23:14,440 --> 00:23:16,420
It's a lot, maybe 100 milliseconds.

424
00:23:16,640 --> 00:23:18,360
It's like huge overhead.

425
00:23:19,400 --> 00:23:21,240
This fact also interesting, right?

426
00:23:21,300 --> 00:23:22,500
We learned 2 facts.

427
00:23:22,780 --> 00:23:26,900
Overhead is huge, linearly growing for this particular simple,

428
00:23:26,980 --> 00:23:28,280
very trivial case.

429
00:23:29,140 --> 00:23:32,060
There is basically no overhead for subsequent queries.

430
00:23:32,160 --> 00:23:32,660
Wow.

431
00:23:33,120 --> 00:23:37,280
This is like a sequence of surprises for me in this work.

432
00:23:37,640 --> 00:23:38,680
I like it actually.

433
00:23:40,080 --> 00:23:43,220
Let's maybe switch to another topic.

434
00:23:44,640 --> 00:23:46,920
Maybe you have some ideas what to discuss next?

435
00:23:46,920 --> 00:23:50,740
Michael: Yeah, well when I think of advanced EXPLAIN, I'm thinking

436
00:23:50,740 --> 00:23:54,120
more like advanced query optimisation.

437
00:23:54,900 --> 00:23:59,700
Maybe you're familiar and got good at reading EXPLAIN plans when

438
00:23:59,700 --> 00:24:05,020
it's a relatively simple query, maybe some normal scan types,

439
00:24:05,020 --> 00:24:06,820
some normal JOIN algorithms.

440
00:24:08,140 --> 00:24:11,520
You're somewhat familiar with it, you can work your way around

441
00:24:11,520 --> 00:24:11,840
that.

442
00:24:11,840 --> 00:24:16,080
But I've spent several years looking at this and I still from

443
00:24:16,080 --> 00:24:20,400
time to time see operation types that I've never seen before.

444
00:24:20,740 --> 00:24:24,880
And things get complex like in just in terms of how many operations

445
00:24:24,900 --> 00:24:29,020
there are or things that complicate
things like CTEs and in it

446
00:24:29,020 --> 00:24:33,960
plans and kind of things happening,
which order things happen

447
00:24:33,960 --> 00:24:36,300
in, things happening at the same
time.

448
00:24:36,500 --> 00:24:40,460
I think we had Haki Benita on relatively
recently talking about

449
00:24:40,460 --> 00:24:43,820
how if you have multiple CTEs,
those happen at the same time

450
00:24:43,820 --> 00:24:45,480
and independently of each other.

451
00:24:45,480 --> 00:24:47,480
So like, that's confusing.

452
00:24:47,980 --> 00:24:52,540
And like, how those appear and
explain is interesting.

453
00:24:53,200 --> 00:24:58,340
But because it's shown as a tree
and they have to kind of work

454
00:24:58,340 --> 00:25:01,880
out different display mechanisms
for how, you know, for showing

455
00:25:01,880 --> 00:25:04,760
that that's not necessarily misleading
because it's hard to say

456
00:25:04,760 --> 00:25:06,440
how you should visualize that.

457
00:25:06,820 --> 00:25:10,060
But that's what I'm thinking in
terms of like advanced topics

458
00:25:10,200 --> 00:25:11,000
around explain.

459
00:25:11,000 --> 00:25:12,540
How do you interpret those?

460
00:25:12,540 --> 00:25:14,920
And also, what can you ignore?

461
00:25:15,600 --> 00:25:19,540
If you want to do this, normally
your aim is, can I speed this

462
00:25:19,540 --> 00:25:20,040
up?

463
00:25:21,200 --> 00:25:24,620
And as things get complex, you
often want to simplify.

464
00:25:24,620 --> 00:25:28,520
Like, what can I do to discard
a bunch of this information?

465
00:25:28,520 --> 00:25:31,020
Like, which bits of this are already
fast that I can ignore?

466
00:25:31,060 --> 00:25:34,280
How do I, maybe I can run a simpler
query that's only the part

467
00:25:34,280 --> 00:25:36,800
of the plan that's problematic.

468
00:25:37,360 --> 00:25:40,100
That's what I'm thinking in terms
of more complex topics.

469
00:25:40,520 --> 00:25:41,620
Nikolay: Right, I agree.

470
00:25:41,720 --> 00:25:43,060
Reading complex plans.

471
00:25:45,040 --> 00:25:45,480
This is it.

472
00:25:45,480 --> 00:25:46,220
Yeah, I agree.

473
00:25:47,780 --> 00:25:50,520
Here actually what helps?

474
00:25:50,740 --> 00:25:52,280
Experience helps definitely.

475
00:25:53,890 --> 00:25:58,480
I usually like buffers for me as
a must.

476
00:25:58,580 --> 00:26:00,100
Buffers is a must.

477
00:26:00,320 --> 00:26:05,440
First, I start reading execution
time, planning time, but I read

478
00:26:05,440 --> 00:26:09,960
quickly scan buffers and they are
like cumulative, like they

479
00:26:09,960 --> 00:26:12,100
are accumulated to the root.

480
00:26:13,100 --> 00:26:14,360
Root includes everything.

481
00:26:15,100 --> 00:26:19,100
So you can see and go from root
to leaves basically.

482
00:26:21,040 --> 00:26:23,140
It's like upside down tree, right?

483
00:26:23,680 --> 00:26:26,180
So we top down analysis again.

484
00:26:27,080 --> 00:26:30,820
Okay, we understand the number
of buffers, understand how many

485
00:26:31,620 --> 00:26:35,600
megabytes or even gigabytes sometimes
it is if you multiply by

486
00:26:35,600 --> 00:26:40,100
the block size, which is in most
cases 8 kibibytes.

487
00:26:40,920 --> 00:26:45,820
Then we go down and see where exactly
this huge volume came from.

488
00:26:46,560 --> 00:26:51,600
This is my default type of work
with plans, including execution,

489
00:26:51,760 --> 00:26:52,400
of course.

490
00:26:52,680 --> 00:26:55,680
Because if you don't have execution,
you only have cost, and

491
00:26:55,680 --> 00:27:00,980
buffers are reported only for planning,
as we just discussed.

492
00:27:02,800 --> 00:27:04,740
In this case, usually it helps.

493
00:27:04,740 --> 00:27:06,140
In most cases it helps.

494
00:27:07,000 --> 00:27:13,700
And I wish I also was able to see
lock-related information, right?

495
00:27:13,700 --> 00:27:14,780
Mixed plane, plane.

496
00:27:15,250 --> 00:27:15,750
Michael: T.

497
00:27:16,220 --> 00:27:16,260
Yeah.

498
00:27:16,260 --> 00:27:17,000
Like weights.

499
00:27:17,380 --> 00:27:17,880
Yeah.

500
00:27:18,220 --> 00:27:20,940
It's difficult to see how they
would do that, but...

501
00:27:22,040 --> 00:27:23,440
Nikolay: Not necessarily weights.

502
00:27:23,480 --> 00:27:24,740
Michael: What do you mean by locks?

503
00:27:26,800 --> 00:27:28,680
Nikolay: If we...

504
00:27:28,860 --> 00:27:30,720
Yeah, yeah, actually maybe you're
right.

505
00:27:30,720 --> 00:27:34,120
Yeah, what's happening under the
hood slightly, but it's already

506
00:27:34,120 --> 00:27:36,840
super complicated output, right?

507
00:27:37,460 --> 00:27:37,640
So...

508
00:27:37,640 --> 00:27:41,920
Michael: Yeah, and there's macro
versus micro issues here.

509
00:27:41,920 --> 00:27:45,480
If you're on a test environment
where you're the only person

510
00:27:45,480 --> 00:27:48,780
there, it's much less likely that
you're going to be incorrect.

511
00:27:50,340 --> 00:27:56,020
Nikolay: When I said locks,
I was meaning heavy locks.

512
00:27:58,120 --> 00:28:04,760
So at planning time, all indexes
are locked, And all tables and

513
00:28:04,760 --> 00:28:06,060
all relations are locked.

514
00:28:06,060 --> 00:28:07,800
And that's a lot of overhead.

515
00:28:07,900 --> 00:28:09,380
I wish I saw it.

516
00:28:09,380 --> 00:28:10,580
It's interesting information.

517
00:28:10,640 --> 00:28:12,380
Accessory lock for all of them.

518
00:28:13,140 --> 00:28:17,900
So, and if it's update, okay, what
kind of lock I have here?

519
00:28:17,900 --> 00:28:18,680
Michael: Oh, interesting.

520
00:28:18,840 --> 00:28:20,880
Yeah, that's not what I was thinking
at all.

521
00:28:21,300 --> 00:28:23,580
Nikolay: Yeah, you thought about
lightweight logs.

522
00:28:24,280 --> 00:28:25,340
Michael: Well, no, I was...

523
00:28:25,600 --> 00:28:26,400
Oh, even a different...

524
00:28:26,400 --> 00:28:27,260
Or weight events.

525
00:28:27,980 --> 00:28:30,380
No, I was thinking, like, if you...

526
00:28:30,860 --> 00:28:36,920
Easiest experiment to show this
is start 1 transaction and maybe

527
00:28:37,440 --> 00:28:42,840
do some DDL or like yeah drop a
column or something yeah anything

528
00:28:43,100 --> 00:28:47,660
that's doing well add a huge like
a really heavy lock on that

529
00:28:47,660 --> 00:28:51,560
table, then open a second transaction,
just do a simple SELECT

530
00:28:51,680 --> 00:28:56,620
query on that table, go back to
the first transaction and commit,

531
00:28:57,180 --> 00:29:00,860
go back to the second 1 and see
that your query was just waiting

532
00:29:00,860 --> 00:29:01,560
that entire time.

533
00:29:01,560 --> 00:29:03,680
If you run that query with EXPLAIN
in that

534
00:29:03,680 --> 00:29:04,540
Nikolay: second session...

535
00:29:04,540 --> 00:29:06,588
If you commit it, it's not waiting

536
00:29:06,588 --> 00:29:06,900
Michael: anymore.

537
00:29:06,900 --> 00:29:09,340
But the execution time of that
EXPLAIN will be huge.

538
00:29:09,340 --> 00:29:11,820
Nikolay: Oh yeah, yeah, yeah, yeah,
yeah, yeah, yeah, you're

539
00:29:11,820 --> 00:29:12,280
right.

540
00:29:12,280 --> 00:29:16,860
And this execution time can be
explained if we saw basically

541
00:29:16,860 --> 00:29:20,160
a pie chart or something, or a
distribution of wait

542
00:29:22,820 --> 00:29:23,320
Michael: elements.

543
00:29:23,560 --> 00:29:27,540
There will be no node, there'll
be no operation in that EXPLAIN

544
00:29:27,540 --> 00:29:30,560
plan that shows that it was waiting
on an exclusive lot.

545
00:29:30,560 --> 00:29:34,080
It will just show that 1 of the
operations was extremely slow.

546
00:29:34,200 --> 00:29:36,360
Nikolay: Simple wait event
analysis for this particular

547
00:29:36,420 --> 00:29:37,440
session would help.

548
00:29:37,440 --> 00:29:42,420
Michael: Yes, but it doesn't
come up that often to me.

549
00:29:43,780 --> 00:29:47,360
Maybe it's 1 of the blind spots
because I don't see those, but

550
00:29:47,360 --> 00:29:51,300
I don't see plans that have these
really unexpected bottlenecks.

551
00:29:51,760 --> 00:29:54,900
Nikolay: I would
like to be able to, not by default,

552
00:29:54,900 --> 00:29:58,940
but I would like to be able to
see these wait event analysis and

553
00:29:58,940 --> 00:30:02,140
heavy locks as well because I want
to understand this query,

554
00:30:02,780 --> 00:30:06,560
which locks it's going to acquire
when I go production.

555
00:30:08,760 --> 00:30:12,040
When execution finished, what locks
were acquired?

556
00:30:13,820 --> 00:30:17,920
I think interesting extensions,
but it's already so huge.

557
00:30:19,860 --> 00:30:22,440
Michael: And as I mentioned, we're
getting more and more, I think

558
00:30:22,440 --> 00:30:24,880
this is part of the reason people
are adding more parameters

559
00:30:24,960 --> 00:30:27,840
to explain, so write-ahead logging
information.

560
00:30:28,320 --> 00:30:30,520
Nikolay: It should not be by default,
it's too much.

561
00:30:30,580 --> 00:30:33,160
Michael: Well, I actually have
a proposal here.

562
00:30:33,160 --> 00:30:36,140
I haven't suggested it on any of
the mailing lists yet, but I

563
00:30:36,140 --> 00:30:37,260
would like your opinion.

564
00:30:38,200 --> 00:30:42,780
I thought about adding them by
default when you include the parameter

565
00:30:42,880 --> 00:30:43,380
verbose.

566
00:30:44,160 --> 00:30:48,080
So if you do explain, analyze verbose,
give me everything.

567
00:30:49,000 --> 00:30:51,360
Nikolay: You just don't want to
update your website

568
00:30:51,980 --> 00:30:52,440
Michael: snippet.

569
00:30:52,440 --> 00:30:55,960
Yes, yeah, I don't want to update
that snippet for sure, but

570
00:30:55,960 --> 00:31:00,040
also I want people to only have
to remember to do that and they'll

571
00:31:00,040 --> 00:31:02,260
get whatever Postgres can give
them.

572
00:31:02,400 --> 00:31:05,720
Verbose means really long.

573
00:31:06,600 --> 00:31:07,840
It means everything.

574
00:31:07,840 --> 00:31:11,640
It kind of means like, I'm okay
with loads of information, give

575
00:31:11,640 --> 00:31:12,180
me it all.

576
00:31:12,180 --> 00:31:14,180
So it's already kind of in the
meaning.

577
00:31:15,480 --> 00:31:18,540
Anyway, I haven't proposed it yet,
but that would also come with

578
00:31:18,540 --> 00:31:20,460
buffers, so I thought you might
be happy.

579
00:31:20,460 --> 00:31:20,820
Nikolay: Okay.

580
00:31:20,820 --> 00:31:24,060
And also, if you talk about what's
missing and what you cannot

581
00:31:24,060 --> 00:31:27,400
get from EXPLAIN, I also like at
some point, maybe some extension

582
00:31:27,400 --> 00:31:32,120
of this could be, imagine if we
see operations at disk level

583
00:31:32,120 --> 00:31:32,800
as well.

584
00:31:33,060 --> 00:31:36,760
And we understand not only hits
and reads for the buffer pool

585
00:31:36,760 --> 00:31:40,760
but also hits and reads for page
cache for example and maybe

586
00:31:40,760 --> 00:31:45,100
CPU analysis as well for this particular
execution if it is explained

587
00:31:45,100 --> 00:31:45,780
and analyzed.

588
00:31:46,500 --> 00:31:46,780
Michael: Right?

589
00:31:46,780 --> 00:31:47,540
That's a good point.

590
00:31:47,540 --> 00:31:49,740
It's not exactly what you're talking
about but loads of people

591
00:31:49,740 --> 00:31:52,200
don't realize because it's not
an EXPLAIN parameter that you

592
00:31:52,200 --> 00:31:56,360
can, with track_io_timing on, you
can start to get some insights

593
00:31:56,540 --> 00:32:02,700
into the disk level performance
and that's improving in each,

594
00:32:03,080 --> 00:32:07,360
Like there's recent, even in 17
we got some improvements there.

595
00:32:07,920 --> 00:32:11,080
Like buffers are split into shared, local and temp.

596
00:32:11,260 --> 00:32:16,400
IOTiming wasn't initially and is now split into all 3.

597
00:32:16,400 --> 00:32:17,420
Nikolay: Yeah, that's great.

598
00:32:17,680 --> 00:32:20,440
But I'm talking about like basically pg_stat_kcache, but for

599
00:32:20,440 --> 00:32:20,940
EXPLAIN.

600
00:32:21,460 --> 00:32:24,220
Michael: I understand, but the timings are a good start, right?

601
00:32:24,680 --> 00:32:25,320
Nikolay: I agree.

602
00:32:25,460 --> 00:32:25,920
Yeah.

603
00:32:25,920 --> 00:32:29,240
And actually, I think it's possible to have all of this right

604
00:32:29,240 --> 00:32:32,780
now already, but you need to create, like, connect many pieces.

605
00:32:33,620 --> 00:32:37,300
And for example, it could be a tool which runs EXPLAIN, but also

606
00:32:37,300 --> 00:32:41,620
looks at wait event analysis for this pid in pg_stat_activity

607
00:32:41,780 --> 00:32:43,400
and locks as well.

608
00:32:43,620 --> 00:32:44,480
And what else?

609
00:32:44,760 --> 00:32:47,820
It collects everything and big report for a particular query.

610
00:32:48,680 --> 00:32:49,540
And also physical.

611
00:32:49,540 --> 00:32:53,000
Physical, we can, if it's, we have physical access, we can do

612
00:32:53,000 --> 00:32:53,840
it from proc.

613
00:32:54,960 --> 00:32:59,120
Knowing PID, this I think we had prototype for our old bot.

614
00:33:00,020 --> 00:33:03,700
So it's like you just, counters, counters from proc.

615
00:33:04,540 --> 00:33:06,920
Michael: You can get, because if you're the query identifier,

616
00:33:07,200 --> 00:33:11,500
which is the normalized 1 that you can match to like pg_stat_statements,

617
00:33:11,640 --> 00:33:12,040
I'm imagining.

618
00:33:12,040 --> 00:33:13,360
This also, yeah.

619
00:33:13,360 --> 00:33:16,680
Yeah, you could start to like tie these things, all of these

620
00:33:16,680 --> 00:33:17,460
things together.

621
00:33:17,640 --> 00:33:20,040
Nikolay: Yeah, a lot of stuff is possible.

622
00:33:20,280 --> 00:33:22,100
Who's doing this, by the way?

623
00:33:22,120 --> 00:33:23,660
Maybe someone is doing this.

624
00:33:23,920 --> 00:33:26,840
Michael: I don't know the monitoring tools well enough to know

625
00:33:26,840 --> 00:33:32,460
who's doing some of this micro level, but if they are, let us

626
00:33:32,460 --> 00:33:32,640
know.

627
00:33:32,640 --> 00:33:35,580
Like if you're using 1 or you if you're building 1 that does

628
00:33:35,580 --> 00:33:36,920
that would be cool to hear.

629
00:33:37,300 --> 00:33:39,600
To be honest though, it goes back to some of the conversations

630
00:33:39,600 --> 00:33:44,640
we've had about they're like macro to micro zoom, like once you've

631
00:33:44,640 --> 00:33:49,320
done your macro analysis and you've identified a few problematic

632
00:33:49,460 --> 00:33:53,000
query, like the main consumer, if you've got a few bottlenecks,

633
00:33:53,940 --> 00:33:57,740
EXPLAIN is a really good tool for doing that micro, like once

634
00:33:57,740 --> 00:34:01,880
you've got some sensible set of parameters that you know will

635
00:34:01,880 --> 00:34:06,940
be slow, it's not that hard to reproduce normally.

636
00:34:07,360 --> 00:34:11,020
The difficult thing is then going through a 200 line explain

637
00:34:11,100 --> 00:34:11,600
plan.

638
00:34:11,980 --> 00:34:13,240
You asked what can help.

639
00:34:13,940 --> 00:34:16,920
I think we've put a lot of effort into helping with that stage

640
00:34:16,920 --> 00:34:19,400
and a lot of the other tool providers have as well.

641
00:34:19,440 --> 00:34:24,480
1 thing we do that I think is really helpful is if a sub plan

642
00:34:24,480 --> 00:34:28,580
is fast like as a proportion of your query, we collapse it by

643
00:34:28,580 --> 00:34:28,940
default.

644
00:34:28,940 --> 00:34:30,400
We hide like we don't hide it.

645
00:34:30,400 --> 00:34:33,060
We just say there's like there's
20 nodes here.

646
00:34:34,760 --> 00:34:37,060
Probably you don't want to look
here, which means that like a

647
00:34:37,060 --> 00:34:39,840
300 line plan can go down to like
20.

648
00:34:40,640 --> 00:34:41,400
Nikolay: That's cool.

649
00:34:41,480 --> 00:34:44,480
Michael: But that like those kinds
of things I think can really

650
00:34:44,480 --> 00:34:45,900
help initially.

651
00:34:46,080 --> 00:34:47,640
Like not it's not good for a hacker.

652
00:34:47,640 --> 00:34:49,000
It's not good for a Postgres hacker.

653
00:34:49,000 --> 00:34:52,700
It's like trying to work out the
cost model for a new feature.

654
00:34:53,100 --> 00:34:56,480
But for somebody just trying to
speed up a single query, you

655
00:34:56,480 --> 00:35:00,020
don't need to look at the 200 partitions
that all took 0, or

656
00:35:00,020 --> 00:35:02,880
that were never executed, or that
took 0 milliseconds.

657
00:35:03,660 --> 00:35:04,020
Nikolay: Yeah.

658
00:35:04,020 --> 00:35:08,980
And you mentioned query ID and
for example, like I think it was

659
00:35:08,980 --> 00:35:12,340
last week we were working with
1 of the customers, we saw that

660
00:35:12,340 --> 00:35:14,440
auto_explain was used and it's great.

661
00:35:15,400 --> 00:35:19,660
And There was a task to analyze
a lot, and we still plan it.

662
00:35:20,340 --> 00:35:25,380
I'm very thankful to you that we
have integration in our AI right

663
00:35:25,380 --> 00:35:27,760
now, and your system helps our
system.

664
00:35:28,320 --> 00:35:32,220
I plan to perform some massive
analysis at larger scale to be

665
00:35:32,220 --> 00:35:33,300
able to do it.

666
00:35:33,340 --> 00:35:37,580
But the interesting thing is that
if you want to analyze a lot

667
00:35:37,580 --> 00:35:40,020
of queries from auto_explain, you
do need some...

668
00:35:40,200 --> 00:35:41,880
You basically need to compute...

669
00:35:42,560 --> 00:35:45,960
It's called compute_query_id parameter,
right?

670
00:35:45,960 --> 00:35:50,100
You need to turn it on because
by default it's auto and for auto_explain

671
00:35:50,280 --> 00:35:51,340
it will not work.

672
00:35:52,660 --> 00:35:54,640
Michael: It does if you have
the boson.

673
00:35:55,240 --> 00:35:56,820
Nikolay: Okay, in auto_explain.

674
00:35:58,320 --> 00:36:02,340
Okay, in our case we didn't have
the boson, I think.

675
00:36:03,100 --> 00:36:06,220
Michael: But only as of 16, I think.

676
00:36:07,660 --> 00:36:09,860
Nikolay: It's better to turn it
on, actually.

677
00:36:10,080 --> 00:36:10,900
Michael: Maybe, yeah.

678
00:36:12,040 --> 00:36:14,480
Nikolay: The question of overhead,
and let's, by the way, talk

679
00:36:14,480 --> 00:36:16,040
about overhead in a second.

680
00:36:16,160 --> 00:36:24,100
But Imagine you have a lot of query
cases, various queries from

681
00:36:24,100 --> 00:36:24,780
the log.

682
00:36:26,060 --> 00:36:30,260
Presence of query ID first gives
you understanding how many different

683
00:36:30,380 --> 00:36:33,760
normalized or aggregated queries
you are dealing with.

684
00:36:34,280 --> 00:36:39,520
Second, you can then join this
information with pg_stat_statements

685
00:36:39,520 --> 00:36:43,220
and go to macro level back and
understand, okay, this query is

686
00:36:43,220 --> 00:36:47,200
not very influential because it's
a tiny part of or maybe vice

687
00:36:47,200 --> 00:36:50,460
versa, it's influential although
we see only a few occurrences

688
00:36:50,660 --> 00:36:54,180
in our slow log because we have
threshold by time.

689
00:36:55,080 --> 00:36:58,880
auto_explain mean duration, query
mean duration.

690
00:37:00,060 --> 00:37:02,700
We usually set it to 1 second,
for example.

691
00:37:03,040 --> 00:37:06,220
If we have only a few cases, but
we see in the workload, oh,

692
00:37:06,220 --> 00:37:11,320
this is actually 50% of all total
exact time, oh, we do need

693
00:37:11,320 --> 00:37:14,720
to think about it properly, and
influence of this optimization

694
00:37:14,820 --> 00:37:15,780
will be high.

695
00:37:16,500 --> 00:37:17,380
Let's do it.

696
00:37:17,560 --> 00:37:20,440
This helps understand when you
have a lot of queries to deal

697
00:37:20,440 --> 00:37:22,360
with, where to start.

698
00:37:22,360 --> 00:37:24,900
You want to start with most influential
steps.

699
00:37:26,740 --> 00:37:29,160
So query ID is a good thing.

700
00:37:29,440 --> 00:37:33,900
I'm glad we have it everywhere
in EXPLAIN, in the logs, in

701
00:37:33,900 --> 00:37:35,880
pg_stat_statements, you can connect all
the dots.

702
00:37:36,180 --> 00:37:38,160
And pg_stat_activity as well, right?

703
00:37:38,860 --> 00:37:42,500
So wait event analysis is possible
just joining by query ID as

704
00:37:42,500 --> 00:37:46,360
well, if you run many, many times,
right?

705
00:37:46,740 --> 00:37:47,720
So, yeah, great.

706
00:37:47,720 --> 00:37:48,720
What about overhead?

707
00:37:48,740 --> 00:37:50,540
I think it's also an advanced topic.

708
00:37:50,540 --> 00:37:53,500
I know you blog posted about it
and OnGres did.

709
00:37:53,900 --> 00:37:56,310
You talked about auto_explain,
right?

710
00:37:56,540 --> 00:37:58,760
But Explain itself also has overhead.

711
00:37:59,360 --> 00:38:05,740
Michael: Well, Explain can have
both, can under-report and can

712
00:38:05,740 --> 00:38:07,580
over-report the timing.

713
00:38:07,860 --> 00:38:09,660
Nikolay: Django, I'm not a fan
of Django.

714
00:38:10,240 --> 00:38:12,140
Michael: Well, but bear with me.

715
00:38:13,340 --> 00:38:15,360
I think people focus a lot...

716
00:38:17,060 --> 00:38:18,060
There's different things.

717
00:38:18,060 --> 00:38:21,980
The OnGres thing was about the
queries being slower, like you

718
00:38:21,980 --> 00:38:26,680
actually cause some overhead by
observing and that's true when

719
00:38:26,680 --> 00:38:29,600
you're running EXPLAIN ANALYZE
with time because timing is on

720
00:38:29,600 --> 00:38:34,240
with Analyze by default But it's
also true if you're running

721
00:38:34,240 --> 00:38:38,100
auto_explain, like there's also
some overhead to measuring and

722
00:38:38,100 --> 00:38:42,180
to logging, like if you log a lot
of plans that's some overhead

723
00:38:42,180 --> 00:38:42,940
as well.

724
00:38:43,380 --> 00:38:49,320
So there's those overheads but
there's also the possibility that

725
00:38:49,320 --> 00:38:50,600
it under reports which I

726
00:38:50,600 --> 00:38:51,100
Nikolay: think

727
00:38:51,180 --> 00:38:51,960
Michael: throws some.

728
00:38:53,260 --> 00:38:56,260
So this is 1 of the reasons they
added the parameter serialize.

729
00:38:56,960 --> 00:39:02,300
If you are doing a lot of work
after the query is executed but

730
00:39:02,300 --> 00:39:05,100
in order to send the data to the
client.

731
00:39:08,120 --> 00:39:10,500
Nobody really cares about the server-side
time.

732
00:39:12,040 --> 00:39:14,340
The server-side time is important,
that's what we get from EXPLAIN

733
00:39:14,340 --> 00:39:17,380
ANALYZE, but what they normally
care about is the user experience,

734
00:39:18,160 --> 00:39:23,080
like the full client to server
to client time.

735
00:39:23,500 --> 00:39:25,900
Nikolay: And- I cannot agree here,
but okay, go on.

736
00:39:26,640 --> 00:39:32,060
Michael: Okay, well, what I mean
is, If you're getting a report

737
00:39:32,120 --> 00:39:36,820
for my dashboard is slow, there's
no point only speeding up the

738
00:39:36,820 --> 00:39:37,800
server side time.

739
00:39:37,800 --> 00:39:43,200
If you're sending huge amounts
of data, and part of that work

740
00:39:43,200 --> 00:39:47,440
is, so serialize is part of it,
It's getting the data ready to

741
00:39:47,440 --> 00:39:47,840
transmit.

742
00:39:47,840 --> 00:39:51,580
And then there's the network overhead
as well, like of actually

743
00:39:51,580 --> 00:39:52,840
sending all that data.

744
00:39:53,400 --> 00:39:56,760
So, yeah, I guess you're right.

745
00:39:56,920 --> 00:39:58,500
Nikolay: It's not network cost.

746
00:39:58,500 --> 00:40:00,060
It's not like transmission cost.

747
00:40:00,060 --> 00:40:05,080
It's a cost of, not cost, cost
may be not a good word, it's overloaded

748
00:40:05,080 --> 00:40:05,780
here, right?

749
00:40:06,900 --> 00:40:07,900
It's time, right?

750
00:40:07,900 --> 00:40:10,080
Or cost, or it's cost, it's cost

751
00:40:10,080 --> 00:40:11,080
Michael: actually, right?

752
00:40:11,200 --> 00:40:16,200
Serialized is reported as time
and amount of data.

753
00:40:16,840 --> 00:40:17,640
Nikolay: What about cost?

754
00:40:18,140 --> 00:40:18,640
Okay.

755
00:40:19,200 --> 00:40:21,660
Michael: I actually don't think
it has cost to relate.

756
00:40:21,660 --> 00:40:25,520
I think it's 1 of those things
that gets reported in the summary

757
00:40:25,520 --> 00:40:25,960
section.

758
00:40:25,960 --> 00:40:29,160
So, just-in-time compilation and
planning.

759
00:40:29,380 --> 00:40:32,035
Nikolay: It's about civilization,
not transmitting.

760
00:40:32,470 --> 00:40:36,300
Michael: Yes, It's a step before
getting the data ready to transmit.

761
00:40:36,980 --> 00:40:38,800
Nikolay: It's still everything
on the server.

762
00:40:39,820 --> 00:40:45,320
Michael: For example, uncompressing
or decompressing TOASTed data.

763
00:40:45,360 --> 00:40:48,220
Nikolay: And it's Postgres 17 feature,
right?

764
00:40:49,600 --> 00:40:55,760
So it just goes deeper to understand
what's happening on server

765
00:40:55,760 --> 00:41:01,040
side, like execution and what part
of it is associated with civilization,

766
00:41:01,420 --> 00:41:03,480
attributed to civilization, right?

767
00:41:03,920 --> 00:41:05,520
Michael: Well, and not just
that.

768
00:41:05,900 --> 00:41:11,100
Before, if you ran EXPLAIN ANALYZE,
SELECT some data that had

769
00:41:11,100 --> 00:41:16,660
been TOASTed from this table, you
would get a really fast execution

770
00:41:16,780 --> 00:41:18,780
plan because it wouldn't do the
serialization.

771
00:41:19,040 --> 00:41:21,100
It would say this executes really
quickly.

772
00:41:21,340 --> 00:41:24,880
So it wasn't just that you couldn't
see the time, it was like

773
00:41:24,880 --> 00:41:29,140
a query that might take 100 milliseconds
when you run it from

774
00:41:29,140 --> 00:41:32,580
psql, if you put EXPLAIN ANALYZE
in front of it, it could run

775
00:41:32,580 --> 00:41:35,700
in 1 millisecond or 10 milliseconds
much much much faster.

776
00:41:36,220 --> 00:41:39,520
So because it didn't have to do
that expensive step.

777
00:41:42,380 --> 00:41:46,400
So this is 1 of those cases that
used to be and not just used

778
00:41:46,400 --> 00:41:49,640
to be like by default this will
still be, this is still the behavior

779
00:41:49,640 --> 00:41:50,540
in version 17.

780
00:41:50,540 --> 00:41:52,560
If you want to explain, analyze
some basics.

781
00:41:52,560 --> 00:41:57,580
Nikolay: SELECT star from blah
blah blah and before 17 we didn't

782
00:41:57,580 --> 00:41:58,860
see this part at all.

783
00:41:59,440 --> 00:42:02,980
Michael: Even in 17 you don't see
it unless you also explicitly

784
00:42:03,460 --> 00:42:05,580
ask EXPLAIN ANALYZE serial.

785
00:42:06,820 --> 00:42:08,260
That's what I've been saying.

786
00:42:09,400 --> 00:42:10,740
Nikolay: And what about verbose?

787
00:42:11,260 --> 00:42:12,680
Verbose includes it, right?

788
00:42:13,440 --> 00:42:14,740
Michael: No it doesn't.

789
00:42:16,720 --> 00:42:19,840
But this is my proposal that verbose
should include all of these

790
00:42:19,840 --> 00:42:20,940
additional parameters.

791
00:42:21,100 --> 00:42:22,620
Nikolay: It should, of course, I agree.

792
00:42:22,940 --> 00:42:23,800
Well, okay.

793
00:42:25,080 --> 00:42:26,920
It is as it is right now.

794
00:42:27,800 --> 00:42:31,100
Michael: And I have to actually say, EXPLAIN is awesome.

795
00:42:31,120 --> 00:42:35,280
EXPLAIN in Postgres in particular is the best I've seen of any

796
00:42:35,280 --> 00:42:36,980
Database that I've worked with.

797
00:42:37,540 --> 00:42:41,180
Other data, like MySQL for example, is catching up.

798
00:42:41,440 --> 00:42:46,220
They got EXPLAIN ANALYZE in version 8, but the amount of detail

799
00:42:46,220 --> 00:42:50,380
we can already get is way, like, it's beyond the other open source

800
00:42:50,380 --> 00:42:51,500
Databases I've seen.

801
00:42:51,500 --> 00:42:54,640
So it is good, it's just always can be better, right?

802
00:42:54,860 --> 00:42:56,020
Nikolay: Yeah, yeah, yeah.

803
00:42:56,060 --> 00:43:00,200
If somebody is watching us thinking, oh, a couple of haters,

804
00:43:00,200 --> 00:43:00,700
right?

805
00:43:01,100 --> 00:43:02,180
Haters of EXPLAIN.

806
00:43:02,240 --> 00:43:04,703
It's just a lot of stuff, yeah, yeah.

807
00:43:04,703 --> 00:43:08,040
But a lot of stuff is helping all the time, it's true.

808
00:43:08,300 --> 00:43:12,420
Do you want to discuss overhead from buffers option?

809
00:43:12,740 --> 00:43:14,340
Michael: I couldn't.

810
00:43:14,340 --> 00:43:17,880
When I did some really basic, really basic experiment where

811
00:43:17,880 --> 00:43:22,500
I tried to, where I did very, very small queries, but I couldn't

812
00:43:22,500 --> 00:43:23,940
get any additional overhead.

813
00:43:23,940 --> 00:43:27,180
I couldn't, there wasn't like a measurable extra overhead of

814
00:43:27,180 --> 00:43:28,240
requesting buffers.

815
00:43:28,580 --> 00:43:31,160
I think you said you did an experiment once that did show it,

816
00:43:31,160 --> 00:43:32,540
but I don't remember.

817
00:43:32,780 --> 00:43:33,280
Nikolay: Yeah.

818
00:43:33,340 --> 00:43:33,900
On timing.

819
00:43:33,900 --> 00:43:36,560
It cannot, like buffers are like, they are basically invariant.

820
00:43:36,700 --> 00:43:39,720
Of course, there can be hits or reads or they're like moving

821
00:43:39,720 --> 00:43:40,440
back and forth.

822
00:43:40,440 --> 00:43:43,220
But if everything is cached, it's very stable.

823
00:43:44,160 --> 00:43:48,000
Like you're reading the same data, like same query, just SELECT

824
00:43:48,260 --> 00:43:51,460
multiple times, you will see the same number of buffers all the

825
00:43:51,460 --> 00:43:51,900
time.

826
00:43:51,900 --> 00:43:56,640
That's one of the reasons I like them.

827
00:43:56,960 --> 00:44:00,800
But if you include buffers, I did see in the past, it was some

828
00:44:00,800 --> 00:44:03,220
time ago, that timing is increasing.

829
00:44:04,640 --> 00:44:10,020
Because a lot of buffers need to be reported and counters inside

830
00:44:11,480 --> 00:44:12,480
need to be presented.

831
00:44:12,860 --> 00:44:17,800
If it's a complex query, in many places this overhead adds up

832
00:44:17,800 --> 00:44:22,640
and the timing without buffers option is very different compared

833
00:44:22,640 --> 00:44:23,800
to with buffers.

834
00:44:25,840 --> 00:44:29,160
Michael: I've seen single queries where that's true and I've

835
00:44:29,160 --> 00:44:32,720
seen I think Lukas Fittl from pganalyze has done some examples

836
00:44:32,720 --> 00:44:33,280
of this.

837
00:44:33,280 --> 00:44:37,600
I think I've only seen it in queries that have nested loops with

838
00:44:37,600 --> 00:44:38,580
lots of loops.

839
00:44:38,940 --> 00:44:42,620
That's the example everybody turns to.

840
00:44:42,620 --> 00:44:43,280
For example,

841
00:44:43,940 --> 00:44:45,340
Nikolay: join with nested loop.

842
00:44:46,080 --> 00:44:51,880
Michael: But the reason I think
it's slightly overhyped is when

843
00:44:51,880 --> 00:44:54,420
you're optimising, you don't always
care.

844
00:44:54,520 --> 00:44:59,120
If the EXPLAIN analyzes 4 seconds,
or maybe let's pick 400

845
00:44:59,220 --> 00:45:03,300
milliseconds, and your real execution
time is 300 milliseconds,

846
00:45:04,540 --> 00:45:08,420
if you can speed the 400 milliseconds
up to 50 milliseconds,

847
00:45:09,020 --> 00:45:12,860
the 300 might go down to 40, and
directionally you're still looking

848
00:45:12,860 --> 00:45:13,820
for the same things.

849
00:45:13,820 --> 00:45:17,120
And by asking for buffers information,
you're getting more information,

850
00:45:17,160 --> 00:45:19,680
you're more likely to be able to
optimize your query.

851
00:45:19,680 --> 00:45:23,740
So I don't care as much as most
that it's slightly different.

852
00:45:23,740 --> 00:45:27,600
Nikolay: Maybe I only care that including
buffers don't change

853
00:45:27,740 --> 00:45:28,240
buffers.

854
00:45:28,380 --> 00:45:29,240
That's it.

855
00:45:29,760 --> 00:45:30,280
That's it.

856
00:45:30,280 --> 00:45:34,840
So For me, when I start working
with buffers, I'm less concerned

857
00:45:34,840 --> 00:45:35,580
about timing.

858
00:45:36,220 --> 00:45:39,560
Focus on buffers, optimize, optimize,
optimize, and then go back

859
00:45:39,560 --> 00:45:42,900
to result and see, oh, timing has
improved drastically because

860
00:45:42,900 --> 00:45:45,740
we reduced number of buffer hits
and reads.

861
00:45:46,400 --> 00:45:46,960
That's it.

862
00:45:46,960 --> 00:45:50,000
This is a very, very common path
for me.

863
00:45:50,720 --> 00:45:54,140
Just reduce because you like sequential
scan, lots of buffer

864
00:45:54,140 --> 00:45:55,060
hits and reads.

865
00:45:55,680 --> 00:45:57,620
We are at an index, okay,

866
00:45:57,980 --> 00:46:01,660
Michael: couple of, 3, 4, 5 buffer
hits, that's it.

867
00:46:02,020 --> 00:46:03,240
Time to look at timing.

868
00:46:03,240 --> 00:46:07,940
Oh, of course, expect it a thousand
times faster, right?

869
00:46:09,320 --> 00:46:12,840
I like, I like, you know, I like
buffers a lot, but for me they'll

870
00:46:12,840 --> 00:46:17,360
always be secondary because just
Because if you look at timings

871
00:46:17,560 --> 00:46:20,920
first and last, that's the main
aim.

872
00:46:20,920 --> 00:46:22,860
The main aim is to get things faster.

873
00:46:22,860 --> 00:46:28,340
And yes, in maybe 99% of cases,
it's going to be I/O.

874
00:46:28,440 --> 00:46:30,040
That's the limiting factor.

875
00:46:30,680 --> 00:46:34,680
But in some queries, there'll be
a sort in memory.

876
00:46:35,580 --> 00:46:40,300
There'll be something that doesn't
report extra buffers in EXPLAIN.

877
00:46:41,040 --> 00:46:45,400
There'll be just-in-time compilation,
or there'll be some limit,

878
00:46:45,400 --> 00:46:46,740
like maybe trigger timing.

879
00:46:47,020 --> 00:46:48,780
Nikolay: Just-in-time compilation
is off.

880
00:46:49,540 --> 00:46:52,600
Michael: But all I'm saying or
planning time, like actually planning

881
00:46:52,600 --> 00:46:56,760
time, you now do get buffers, but
trigger times, just in time

882
00:46:56,760 --> 00:47:00,320
compilation, there's like a few
other examples, sorts in memory,

883
00:47:00,320 --> 00:47:05,220
hashes in memory, like if some
of these other bottlenecks don't

884
00:47:05,220 --> 00:47:08,180
report buffers, I would just, why
not look at timings and then

885
00:47:08,180 --> 00:47:08,680
buffers?

886
00:47:09,440 --> 00:47:13,680
Also, I almost always see query
plans with both these days.

887
00:47:13,900 --> 00:47:16,240
So it's kind of a

888
00:47:16,460 --> 00:47:16,960
Nikolay: care.

889
00:47:17,180 --> 00:47:19,980
That's great that buffers are present.

890
00:47:21,340 --> 00:47:24,400
At some point I started saying
if there are no buffers I'm not

891
00:47:24,400 --> 00:47:25,540
dealing with this query.

892
00:47:26,200 --> 00:47:30,300
This is what I say to customers,
go and give me with buffers

893
00:47:30,300 --> 00:47:33,340
because we are missing super important.

894
00:47:33,340 --> 00:47:37,540
Okay, if you put it in second place,
this place should be always

895
00:47:37,540 --> 00:47:38,040
present.

896
00:47:38,480 --> 00:47:38,980
Yeah.

897
00:47:39,640 --> 00:47:40,080
I'm sure

898
00:47:40,080 --> 00:47:41,160
Michael: that's what it's like
before.

899
00:47:42,040 --> 00:47:43,260
I think that's probably enough.

900
00:47:43,260 --> 00:47:47,640
I've got a talk that I gave a few
years ago called explain beyond

901
00:47:47,640 --> 00:47:50,280
the basics which I would have thought
we had covered some of

902
00:47:50,280 --> 00:47:53,240
the bits of but actually I took
a completely different path that

903
00:47:53,240 --> 00:47:57,380
I went down some of the less common
types of issue that you can

904
00:47:57,380 --> 00:48:00,040
spot with explain which we haven't
really talked about so I can

905
00:48:00,040 --> 00:48:03,120
include that as like a if anybody
wants to know a bit more of

906
00:48:03,120 --> 00:48:06,880
the specifics, it's a bit nicer,
like that format is good because

907
00:48:06,880 --> 00:48:10,160
you can actually see examples of
EXPLAIN plans, so it's probably

908
00:48:10,160 --> 00:48:11,740
a bit better video wise.

909
00:48:12,380 --> 00:48:16,360
And yet the reason I see buffers
more often now is everybody

910
00:48:16,400 --> 00:48:19,900
suggests adding it, You, but also
some of the other tools as

911
00:48:19,900 --> 00:48:23,760
well, like depesz added it to the
default thing that he suggests

912
00:48:23,760 --> 00:48:24,520
people get.

913
00:48:24,520 --> 00:48:26,140
So there's loads of people.

914
00:48:26,320 --> 00:48:27,240
Nikolay: That's news to me.

915
00:48:27,240 --> 00:48:31,660
I remember it was missing and it
was a big disadvantage.

916
00:48:32,540 --> 00:48:33,240
Michael: He added it.

917
00:48:33,240 --> 00:48:33,960
It's great.

918
00:48:34,940 --> 00:48:35,360
Nikolay: That's good.

919
00:48:35,360 --> 00:48:35,860
Yeah.

920
00:48:36,760 --> 00:48:38,180
Michael: Anything else you wanted to add?

921
00:48:38,620 --> 00:48:40,020
Nikolay: I think it's enough.

922
00:48:40,680 --> 00:48:41,400
Michael: Me too.

923
00:48:43,180 --> 00:48:43,520
Nikolay: Thank you.

924
00:48:43,520 --> 00:48:44,160
See you later.

925
00:48:44,160 --> 00:48:44,610
Bye.