1
00:00:00,100 --> 00:00:03,180
Hello and welcome to PostgresFM,
a weekly show about

2
00:00:03,180 --> 00:00:04,080
all things PostgreSQL.

3
00:00:04,280 --> 00:00:07,700
I am Michael, founder of pgMustard
and today I am not joined

4
00:00:07,700 --> 00:00:08,500
by Nikolay.

5
00:00:09,000 --> 00:00:12,300
He is finally taking a well-earned
break.

6
00:00:12,440 --> 00:00:14,160
So this week you've just got me.

7
00:00:14,280 --> 00:00:17,860
I am in the process of getting
a couple of guests as well, which

8
00:00:17,860 --> 00:00:18,960
I'm excited about.

9
00:00:18,960 --> 00:00:22,660
So hopefully for the next couple
of weeks we'll be able to give

10
00:00:22,660 --> 00:00:23,740
you those conversations.

11
00:00:24,200 --> 00:00:27,540
But in the meantime I'm going to
record an episode on pretty

12
00:00:27,540 --> 00:00:31,220
much the only topic I feel comfortable
sharing knowledge on my

13
00:00:31,220 --> 00:00:33,420
own, which is EXPLAIN.

14
00:00:34,300 --> 00:00:37,040
And due to the nature of the product
I've been working on for

15
00:00:37,040 --> 00:00:41,120
the past few years, I have spent
a lot of time looking at EXPLAIN

16
00:00:41,120 --> 00:00:42,880
plans and helping folks with those.

17
00:00:42,880 --> 00:00:46,320
So yeah, I've given a couple of
talks on this in the past, but

18
00:00:46,320 --> 00:00:49,340
I realized we hadn't actually done
a podcast episode on it.

19
00:00:49,340 --> 00:00:53,360
We've done a few related ones,
so long-term listeners will be

20
00:00:53,360 --> 00:00:56,760
familiar with some of the things
we've talked about previously,

21
00:00:56,760 --> 00:01:02,040
but we haven't done a very, an
intro level one or even I have a

22
00:01:02,040 --> 00:01:04,300
process I'd like to recommend people
use.

23
00:01:04,300 --> 00:01:06,720
I haven't discussed that yet, so
I'm hoping to give you a couple

24
00:01:06,720 --> 00:01:07,620
of those things.

25
00:01:07,900 --> 00:01:11,820
But on the topic of guests, before
I move on, if you have any

26
00:01:11,820 --> 00:01:15,360
recommendations or requests on
either the guest front or the

27
00:01:15,360 --> 00:01:17,360
topic front, those are always welcome.

28
00:01:17,360 --> 00:01:20,520
Please send those to us either
privately or publicly.

29
00:01:21,020 --> 00:01:22,500
They'd be very, very welcome.

30
00:01:23,240 --> 00:01:27,480
So EXPLAIN first, what is EXPLAIN
and what does it do?

31
00:01:27,880 --> 00:01:32,380
So EXPLAIN is a statement that
you put in front of your query

32
00:01:32,700 --> 00:01:36,100
and PostgreSQL will return the query
execution plan.

33
00:01:36,280 --> 00:01:38,180
So by default you just get the
plan.

34
00:01:38,320 --> 00:01:42,340
So what PostgreSQL would do, what
choices it would make to execute

35
00:01:42,380 --> 00:01:43,100
that query.

36
00:01:43,180 --> 00:01:47,920
It doesn't execute the query if
you just use EXPLAIN and it provides

37
00:01:48,040 --> 00:01:51,100
back the estimated costs involved.

38
00:01:51,340 --> 00:01:54,440
So you might have heard that PostgreSQL
has a cost-based optimizer.

39
00:01:55,080 --> 00:02:00,760
This gives you some insight into
the costs that it's calculating

40
00:02:00,760 --> 00:02:02,720
and why it's making certain choices.

41
00:02:02,720 --> 00:02:06,680
So why it's choosing a certain
scan type over another scan

42
00:02:06,680 --> 00:02:14,120
type or the join order it's
chosen or the join type it's

43
00:02:14,120 --> 00:02:15,700
doing for a specific join.

44
00:02:15,960 --> 00:02:20,500
So all of those are based on what's
possible and also what it

45
00:02:20,500 --> 00:02:21,680
thinks will be fastest.

46
00:02:22,360 --> 00:02:25,900
Now, the costs are in an arbitrary
unit, but the idea is the

47
00:02:25,900 --> 00:02:29,740
lower the cost, the faster the
optimizer thinks it would be.

48
00:02:30,020 --> 00:02:34,300
So, it is correlated to time, but
they're not in a unit of time.

49
00:02:34,740 --> 00:02:39,300
So, EXPLAIN on its own can be useful
to quickly see, for example,

50
00:02:39,400 --> 00:02:44,720
if you have a query that's timing
out or taking absolutely ages,

51
00:02:45,040 --> 00:02:49,620
you can quickly have a look at
the query plan and just check

52
00:02:49,840 --> 00:02:52,000
maybe why it might be slow.

53
00:02:52,300 --> 00:02:56,820
But if we want to know for sure
why a query is slow and it's

54
00:02:56,820 --> 00:03:00,260
executing in a reasonable amount
of time, chances are we're going

55
00:03:00,260 --> 00:03:04,540
to want to use some EXPLAIN parameters
to not just get the plan

56
00:03:04,540 --> 00:03:09,820
back but also run the query behind
the scenes and get execution

57
00:03:09,920 --> 00:03:10,420
statistics.

58
00:03:11,140 --> 00:03:15,240
So the most common of those parameters
that you might have heard

59
00:03:15,240 --> 00:03:16,820
alongside EXPLAIN is ANALYZE.

60
00:03:17,240 --> 00:03:21,060
So and this is different from the
ANALYZE command for getting query

61
00:03:21,060 --> 00:03:21,560
statistics.

62
00:03:21,600 --> 00:03:24,960
This is EXPLAIN ANALYZE specifically,
but there are loads more

63
00:03:24,960 --> 00:03:25,920
parameters as well.

64
00:03:25,920 --> 00:03:29,600
In fact, we did a whole episode
on why we believe, or Nikolay

65
00:03:29,600 --> 00:03:32,880
and I believe, that BUFFERS should
be turned on by default with ANALYZE.

66
00:03:32,880 --> 00:03:37,160
So BUFFERS is another parameter
that gives you extra runtime

67
00:03:37,460 --> 00:03:41,520
or execution statistics alongside
the timings that you get with

68
00:03:41,520 --> 00:03:42,020
ANALYZE.

69
00:03:42,080 --> 00:03:45,860
So with BUFFERS, we also get information
about the blocks read,

70
00:03:45,860 --> 00:03:46,360
written.

71
00:03:46,560 --> 00:03:50,700
So the amount of work being done
by the query, not just the timings.

72
00:03:51,220 --> 00:03:52,360
So that can really help.

73
00:03:52,360 --> 00:03:54,560
But there are a bunch of other
parameters as well.

74
00:03:54,660 --> 00:03:58,760
I like to recommend people use
as many of them as they can.

75
00:03:58,860 --> 00:04:01,160
Most of them don't have that much
additional overhead.

76
00:04:01,640 --> 00:04:02,780
ANALYZE can.

77
00:04:02,780 --> 00:04:06,100
In fact, that's something worth
bearing in mind, that there are

78
00:04:06,100 --> 00:04:09,780
a couple of caveats when it comes
to EXPLAIN ANALYZE.

79
00:04:09,800 --> 00:04:13,940
It can both overreport and underreport the amount of time that

80
00:04:13,940 --> 00:04:15,780
a query would take in the wild.

81
00:04:16,300 --> 00:04:20,760
So for example, it's been referred
to as the observer effect

82
00:04:20,760 --> 00:04:28,680
before, but if we have a slow system
clock, the overhead of checking

83
00:04:28,680 --> 00:04:34,800
the timings can add significant
overhead to the total execution

84
00:04:34,800 --> 00:04:38,880
time when we're running EXPLAIN
with ANALYZE and specifically

85
00:04:38,880 --> 00:04:40,020
with timings on.

86
00:04:40,760 --> 00:04:43,820
So, that's how it can overreport.

87
00:04:44,060 --> 00:04:48,660
That's the easiest way it can overreport
on the execution time.

88
00:04:48,840 --> 00:04:53,420
But it can also underreport if
for example, when we're

89
00:04:53,420 --> 00:04:56,640
running EXPLAIN ANALYZE, Postgres
doesn't have to transmit the data

90
00:04:56,640 --> 00:04:57,300
to us.

91
00:04:57,540 --> 00:05:03,540
So if we're requesting thousands
of rows, and we're not close

92
00:05:03,540 --> 00:05:07,080
to the database, so perhaps it's
a user doing a big report from

93
00:05:07,080 --> 00:05:10,680
a different continent than the
database is on, your monitoring

94
00:05:10,680 --> 00:05:14,840
might be showing that that query
is taking a few seconds to execute,

95
00:05:15,060 --> 00:05:18,000
whereas maybe EXPLAIN ANALYZE would
report it only taking a few

96
00:05:18,000 --> 00:05:21,960
hundred milliseconds because it
doesn't have to transmit that

97
00:05:21,960 --> 00:05:23,160
data across the network.

98
00:05:23,440 --> 00:05:27,580
So it can overreport and underreport,
but generally it's generally

99
00:05:27,920 --> 00:05:32,940
that's pretty uncommon and also
directionally you'll still see

100
00:05:33,160 --> 00:05:37,220
where the slowest parts of that
query plan are, even if the numbers

101
00:05:37,220 --> 00:05:38,160
aren't precise.

102
00:05:39,080 --> 00:05:40,620
So yeah, so those are the caveats.

103
00:05:41,500 --> 00:05:46,640
In terms of the process I recommend,
the number 1, I think I've

104
00:05:46,640 --> 00:05:49,400
mentioned it before in the podcast,
but it's so important, it's

105
00:05:49,400 --> 00:05:50,140
worth repeating.

106
00:05:51,020 --> 00:05:57,600
You need to be running on a realistic
size dataset to get realistic

107
00:05:57,680 --> 00:05:58,680
performance data.

108
00:05:59,540 --> 00:06:02,980
Now, it doesn't have to be the
exact same data.

109
00:06:03,580 --> 00:06:05,700
The important part is the number
of rows.

110
00:06:06,420 --> 00:06:11,400
So for example, if you were testing
out a new feature and your

111
00:06:11,580 --> 00:06:16,820
local dataset only has a couple
of dozen rows, but production

112
00:06:16,860 --> 00:06:21,380
has a few million, chances are
the Postgres plan is going to

113
00:06:21,380 --> 00:06:26,820
be making different choices on
join order or join algorithm,

114
00:06:27,240 --> 00:06:28,380
even scan type.

115
00:06:28,380 --> 00:06:30,920
So it could even choose to do a
sequential scan if your data

116
00:06:30,920 --> 00:06:33,740
is small enough, even if there
is a good index available.

117
00:06:34,200 --> 00:06:38,000
So that can really surprise people,
but it's really important.

118
00:06:38,720 --> 00:06:43,080
As the datasets get bigger, this
becomes slightly less important.

119
00:06:43,080 --> 00:06:46,280
So if you're testing on tens of
millions of rows instead of hundreds

120
00:06:46,280 --> 00:06:49,540
of millions, there's less of a
chance of an issue but still a

121
00:06:49,540 --> 00:06:51,900
chance of an issue getting a different
plan.

122
00:06:52,480 --> 00:06:55,560
So very much worth bearing that
in mind.

123
00:06:55,560 --> 00:06:59,080
If you can test on a realistic
size dataset, please do.

124
00:06:59,240 --> 00:07:02,120
There's lots of tools available
for doing that and for masking

125
00:07:02,120 --> 00:07:03,300
data these days.

126
00:07:03,960 --> 00:07:07,120
But that's, in fact, I think if
we've done an episode on that,

127
00:07:07,120 --> 00:07:08,400
I think we have on benchmarking.

128
00:07:08,540 --> 00:07:10,220
So I'll link that one up.

129
00:07:10,520 --> 00:07:13,380
So that's tip number 1, or step
1 in the process.

130
00:07:13,380 --> 00:07:15,820
Make sure you are, or the person
that you're helping, is testing

131
00:07:15,820 --> 00:07:17,800
on a realistic dataset.

132
00:07:19,600 --> 00:07:21,180
Use as many parameters as possible.

133
00:07:21,180 --> 00:07:22,620
I've hinted at that already.

134
00:07:22,680 --> 00:07:25,740
Mentioned ANALYZE and BUFFERS,
but there are also parameters

135
00:07:26,100 --> 00:07:29,680
for settings, which tells you any
non-default planner-related

136
00:07:30,260 --> 00:07:31,360
settings that you have.

137
00:07:31,360 --> 00:07:36,220
So for example, if you've changed
work mem or random page cost,

138
00:07:36,220 --> 00:07:39,180
these are common ones to have changed,
those will show up.

139
00:07:39,480 --> 00:07:44,180
But also if you, for example, been
testing a few locally and

140
00:07:44,180 --> 00:07:46,640
you've forgotten that you've still
got those set, they'll show

141
00:07:46,640 --> 00:07:49,220
up in your EXPLAIN plan as well
which can be helpful.

142
00:07:50,460 --> 00:07:53,160
Another important one is, well, another
one I find really useful

143
00:07:53,160 --> 00:07:56,320
is verbose, especially when helping
other people, you get to

144
00:07:56,320 --> 00:07:59,180
see information about the schema,
for example, you get fully

145
00:07:59,180 --> 00:08:04,460
qualified object names, you also
get the output of each operation

146
00:08:04,540 --> 00:08:09,180
in the query execution plan, what
data it's sending to its parent

147
00:08:09,400 --> 00:08:13,580
operation, which can be useful
for advising certain optimizations.

148
00:08:15,060 --> 00:08:18,500
A couple of others, a new one is
the write-ahead logging information,

149
00:08:18,740 --> 00:08:22,000
or I say new, I think it's a few
versions old now.

150
00:08:22,200 --> 00:08:24,780
You can also specify different
formats.

151
00:08:25,120 --> 00:08:28,580
So JSON format is particularly
popular these days amongst tools,

152
00:08:29,040 --> 00:08:34,000
but text format is generally still
the most concise and most

153
00:08:34,000 --> 00:08:36,560
readable amongst especially people
that are most familiar with

154
00:08:36,560 --> 00:08:36,900
it.

155
00:08:36,900 --> 00:08:40,320
They'll often ask to see the text
format plan because it's smaller

156
00:08:40,320 --> 00:08:42,540
and more familiar to them.

157
00:08:43,280 --> 00:08:46,260
Now there are a couple of other
parameters as well.

158
00:08:47,220 --> 00:08:50,580
Less important because people generally
leave them on or leave

159
00:08:50,580 --> 00:08:53,680
the defaults in place but timing
can be really useful if you

160
00:08:53,680 --> 00:08:57,660
want to, for example, turn timing
information off but have ANALYZE

161
00:08:57,800 --> 00:08:58,300
on.

162
00:08:58,520 --> 00:09:04,640
So you still get the full execution
timing statistics but

163
00:09:04,640 --> 00:09:08,820
you don't measure the per node
timing, which is where the main

164
00:09:08,820 --> 00:09:13,500
overhead is so I've seen some people
with some tool vendors recommend

165
00:09:13,500 --> 00:09:17,320
turning that off with auto_explain
but worth measuring if it

166
00:09:17,320 --> 00:09:21,300
does actually have any overhead
on your system before doing so

167
00:09:21,880 --> 00:09:24,240
because timing information could
be really useful when looking

168
00:09:24,240 --> 00:09:25,820
at speeding things up.

169
00:09:26,440 --> 00:09:28,000
Costs you can turn off as well.

170
00:09:28,000 --> 00:09:30,360
The only time I've seen that done
and the only time I've done

171
00:09:30,360 --> 00:09:34,740
it myself is when writing blog
posts or sharing query plans where

172
00:09:34,740 --> 00:09:39,220
I just want to make them slightly
less intimidating or draw attention

173
00:09:39,220 --> 00:09:42,700
to the more important parts. So
those are on by default and the

174
00:09:42,700 --> 00:09:45,760
other one that's on by default is
or at least when you use EXPLAIN

175
00:09:45,760 --> 00:09:50,140
(ANALYZE) is summary, which puts
the information at the end.

176
00:09:50,140 --> 00:09:52,800
So like planning timings and execution
times, which are very

177
00:09:52,800 --> 00:09:53,300
useful.

178
00:09:53,480 --> 00:09:56,240
So generally people don't turn
that off, but it means you can

179
00:09:56,240 --> 00:09:59,100
turn that on, for example, with
EXPLAIN, which gives you the

180
00:09:59,100 --> 00:10:00,040
planning time.

181
00:10:00,180 --> 00:10:03,620
Because remember when we just do
EXPLAIN, we only plan the query,

182
00:10:03,680 --> 00:10:06,260
we don't, Postgres doesn't execute
that.

183
00:10:06,260 --> 00:10:09,720
So you can get the planning time
just from EXPLAIN alone if you

184
00:10:09,720 --> 00:10:11,640
use the summary parameter.

185
00:10:11,940 --> 00:10:15,360
But generally I recommend putting
as many of those on as you

186
00:10:15,420 --> 00:10:19,840
can, especially if you're using
a tool to parse it or if you

187
00:10:19,840 --> 00:10:22,200
are getting help from others, the
more information the better

188
00:10:22,200 --> 00:10:22,940
for them.

189
00:10:23,800 --> 00:10:25,960
Cool, so that's the second step.

190
00:10:25,960 --> 00:10:29,940
The third one is glance at the end.

191
00:10:30,060 --> 00:10:33,500
So query plans, especially once
your queries get a bit more complex,

192
00:10:33,820 --> 00:10:34,960
can get quite long.

193
00:10:34,960 --> 00:10:38,140
You can get, feel a bit like a
wall of text.

194
00:10:39,020 --> 00:10:43,440
But a few of the probably rarer
problems, to be honest, but a

195
00:10:43,440 --> 00:10:49,540
few of the fairly common problems
are highlighted very much at

196
00:10:49,540 --> 00:10:50,220
the end.

197
00:10:50,220 --> 00:10:57,420
So things like, if your planning
time is 10X your execution time,

198
00:10:57,800 --> 00:11:00,400
there's very little point looking
through the rest of the query

199
00:11:00,400 --> 00:11:06,880
plan, looking for optimization
potential, because the vast majority

200
00:11:08,800 --> 00:11:11,140
of the total time was in planning.

201
00:11:11,500 --> 00:11:14,540
So that's only reported at the
end and the rest of the query

202
00:11:14,540 --> 00:11:15,800
plan won't help you.

203
00:11:17,100 --> 00:11:20,140
The other two things that are reported
at the end that also report

204
00:11:20,140 --> 00:11:22,100
a total time are triggers.

205
00:11:22,340 --> 00:11:27,540
So trigger executions, they can
be the performance issue.

206
00:11:27,540 --> 00:11:30,020
If you have triggers involved,
there's going to be some overhead

207
00:11:30,020 --> 00:11:30,300
there.

208
00:11:30,300 --> 00:11:31,820
They report a total time.

209
00:11:31,960 --> 00:11:36,340
If that's a high percentage of
your total execution time, you

210
00:11:36,340 --> 00:11:39,160
found your problem already without
looking through the whole

211
00:11:39,160 --> 00:11:40,100
execution plan.

212
00:11:40,120 --> 00:11:42,040
Same is true for just-in-time compilation.

213
00:11:42,740 --> 00:11:49,160
That can in certain cases be 99%
of execution time, which again,

214
00:11:49,900 --> 00:11:52,940
makes looking through the query
plan not as useful.

215
00:11:54,340 --> 00:11:54,840
Cool.

216
00:11:54,940 --> 00:11:57,420
So that's the third one.

217
00:11:57,620 --> 00:12:02,180
Fourth is then once you've looked
at those, then figure out,

218
00:12:02,560 --> 00:12:05,620
okay, we do need to look at the
rest of the query plan.

219
00:12:06,600 --> 00:12:08,420
Where is the time going?

220
00:12:08,420 --> 00:12:10,360
Which parts of these are the most
expensive?

221
00:12:10,380 --> 00:12:13,580
Or if you have BUFFERS on, which
of these scans is doing the

222
00:12:13,580 --> 00:12:15,040
most reads?

223
00:12:16,020 --> 00:12:19,040
The reason I mention that that's
like step 4 is because I think

224
00:12:19,040 --> 00:12:22,640
it's very easy at this point to
jump to other, like jump to looking

225
00:12:22,640 --> 00:12:25,760
to other issues and I've been guilty
of this in the past myself

226
00:12:25,760 --> 00:12:26,460
as well.

227
00:12:26,680 --> 00:12:31,240
So experienced people, I think
maybe even more susceptible to

228
00:12:31,240 --> 00:12:34,940
this than less experienced people,
if you notice a sequential

229
00:12:35,020 --> 00:12:38,680
scan, your eye gets drawn to it
immediately and you think, oh,

230
00:12:38,680 --> 00:12:43,240
maybe there's an issue there, or
you notice a large filter, or

231
00:12:43,380 --> 00:12:47,140
a few thousand rows being filtered,
or a lot of loops or something.

232
00:12:47,320 --> 00:12:50,080
If your eye gets drawn to these
problems that you're used to

233
00:12:50,080 --> 00:12:53,880
looking for, you can get tripped
up or get, you can end up going

234
00:12:53,880 --> 00:12:58,040
down a bit of a rabbit hole on
this.

235
00:12:58,040 --> 00:13:02,720
So my tip is to really check where
the timings go, like where

236
00:13:02,720 --> 00:13:05,740
which of these is really taking
up the most time and then focus

237
00:13:05,740 --> 00:13:06,500
your attention there.

238
00:13:06,500 --> 00:13:11,200
Only once you've worked that out
focus on that part of the query

239
00:13:11,200 --> 00:13:12,900
or that part of the query plan.

240
00:13:13,180 --> 00:13:15,040
Is there a bottleneck we can solve?

241
00:13:15,140 --> 00:13:18,760
And maybe we'll come back to the
other issue we spotted later,

242
00:13:18,760 --> 00:13:21,300
but also the query might be fast
enough by that point and we

243
00:13:21,300 --> 00:13:22,180
can move on.

244
00:13:23,100 --> 00:13:26,580
So, yeah, that's the fourth and
fifth.

245
00:13:26,580 --> 00:13:30,980
So work out where the what the
most expensive parts of the query

246
00:13:31,080 --> 00:13:35,100
plan are and then and only then
what you can do about it.

247
00:13:35,500 --> 00:13:40,580
So on the timings front, there
are a few things to be aware of.

248
00:13:40,920 --> 00:13:48,720
So the execution time is reported
inclusive of so 1 operation's

249
00:13:48,820 --> 00:13:51,840
execution time is inclusive of
all of its children.

250
00:13:52,240 --> 00:13:55,920
So a little bit of subtraction
you'd think might be enough there.

251
00:13:55,920 --> 00:14:00,040
to work out maybe which of the
operations is most expensive on

252
00:14:00,040 --> 00:14:00,740
its own.

253
00:14:01,560 --> 00:14:05,080
But you have to take into account
a bunch of other things as

254
00:14:05,080 --> 00:14:05,460
well.

255
00:14:05,460 --> 00:14:09,820
So, those execution times are a
per loop average.

256
00:14:10,320 --> 00:14:14,440
So, for example, if one of our operations
is being looped over

257
00:14:15,240 --> 00:14:19,700
multiple times, which is especially
common in join operations.

258
00:14:20,600 --> 00:14:28,100
But also, parallel execution utilizes
loops as how it reports.

259
00:14:28,100 --> 00:14:32,360
So, there's a few ways where we
have to then multiply it by the

260
00:14:32,360 --> 00:14:35,580
number of loops before subtracting
from the parent operation.

261
00:14:35,580 --> 00:14:38,220
So these things, you can see how
they can start to get a little

262
00:14:38,220 --> 00:14:38,860
bit complex.

263
00:14:39,220 --> 00:14:43,580
In addition, Postgres reports these
timings in milliseconds and

264
00:14:43,580 --> 00:14:45,820
doesn't include thousands separators.

265
00:14:46,280 --> 00:14:49,740
So I know I personally can start
to struggle with the arithmetic

266
00:14:49,740 --> 00:14:52,700
of all that once it starts getting
into the multiple seconds.

267
00:14:53,240 --> 00:14:56,000
As such, there's a ton of tools
that can help you with this.

268
00:14:56,000 --> 00:14:59,440
I work on one, but there are some
really good free open source

269
00:14:59,440 --> 00:15:00,140
ones available.

270
00:15:00,240 --> 00:15:03,480
Lots of editors include them by
default and pretty much all of

271
00:15:03,480 --> 00:15:07,480
them make at least some effort
to get those timings pretty good.

272
00:15:07,480 --> 00:15:12,340
So some popular ones are Depesz
EXPLAIN, a really nice

273
00:15:12,340 --> 00:15:15,040
thing about that is that it preserves
the text format.

274
00:15:15,040 --> 00:15:19,360
So it's really expert friendly
if you want to share an EXPLAIN

275
00:15:19,360 --> 00:15:23,200
plan with the Postgres mailing
lists, the one they'll be most familiar

276
00:15:23,200 --> 00:15:26,560
with and most comfortable with
will be the Depesz EXPLAIN one.

277
00:15:26,880 --> 00:15:30,320
But EXPLAIN Dalibo is really
popular these days, much more

278
00:15:30,320 --> 00:15:34,920
visual, includes more charts on,
for example, where the timings

279
00:15:34,920 --> 00:15:37,660
are going, where the BUFFERS, like
which operations do the most

280
00:15:37,660 --> 00:15:39,300
buffer hits, that kind of thing.

281
00:15:40,160 --> 00:15:44,600
Much more beginner friendly, and
as I would probably class my

282
00:15:44,600 --> 00:15:48,700
tool as well, pgMustard is visual as
well, but also helps with that

283
00:15:48,700 --> 00:15:52,160
next step a bit more on what you
can possibly do about things.

284
00:15:52,480 --> 00:15:55,200
But yeah, so the main reason I want to mention those

285
00:15:55,200 --> 00:15:57,180
is they can really help on the
timings front.

286
00:15:57,180 --> 00:16:00,800
So you can paste your query plan
in and quickly see where the

287
00:16:00,800 --> 00:16:01,860
timings are going.

288
00:16:01,960 --> 00:16:05,940
I would say that a lot of them
don't do that step 3.

289
00:16:05,940 --> 00:16:10,300
So a lot of them don't immediately
draw your attention if there's

290
00:16:10,300 --> 00:16:13,640
triggers taking up a lot of time
or just in time compilation

291
00:16:13,660 --> 00:16:16,140
taking up a lot of time or even
planning time.

292
00:16:16,280 --> 00:16:20,860
So it is still worth doing that
step first and then checking

293
00:16:20,860 --> 00:16:24,640
out a tool or two that you're familiar
with for the next steps.

294
00:16:25,440 --> 00:16:27,780
Yeah, that's pretty much it for
the process.

295
00:16:29,200 --> 00:16:33,180
Obviously, that fifth step, working
out what to do next, is a

296
00:16:33,180 --> 00:16:34,540
hugely deep topic.

297
00:16:34,540 --> 00:16:38,260
And it's actually where a lot of
the normal talks on EXPLAIN,

298
00:16:38,260 --> 00:16:40,760
if you go to a conference and see
a talk on EXPLAIN, that's normally

299
00:16:40,760 --> 00:16:41,980
where they'll pick up.

300
00:16:42,440 --> 00:16:46,000
So the best one I've seen is a bit
old now.

301
00:16:46,000 --> 00:16:47,580
It's from probably about 2016.

302
00:16:48,480 --> 00:16:50,940
But Josh Berkus gave it a few times.

303
00:16:50,940 --> 00:16:54,940
But I'll share one of the recordings
that I liked a lot.

304
00:16:55,340 --> 00:16:58,760
Yeah, he spent about an hour going
through a lot of the different

305
00:16:59,100 --> 00:17:03,540
join types, scan types, types of
issues you can see, things to

306
00:17:03,540 --> 00:17:04,660
be aware of.

307
00:17:04,780 --> 00:17:08,880
And yeah, it's a lot, even then,
probably only covers the first

308
00:17:09,000 --> 00:17:11,060
10% of what you could know.

309
00:17:11,100 --> 00:17:16,220
There are more recent talks, one
by Bruce Momjian actually, who'd

310
00:17:16,220 --> 00:17:18,240
covered some of the rarer scan
types.

311
00:17:18,240 --> 00:17:19,740
So I'll link that one up as well.

312
00:17:19,740 --> 00:17:23,000
But this is a deep, deep topic
that you can dive into.

313
00:17:23,000 --> 00:17:26,340
I'm a few years in and I don't
consider myself an expert yet.

314
00:17:26,520 --> 00:17:28,220
So that gives you an idea.

315
00:17:29,340 --> 00:17:33,060
But yeah, it's an extremely powerful
tool for query optimization

316
00:17:33,120 --> 00:17:33,820
in general.

317
00:17:34,600 --> 00:17:38,160
Naturally, to speed things up,
it really helps to know which

318
00:17:38,160 --> 00:17:39,060
part's slow.

319
00:17:39,400 --> 00:17:42,220
So please use it.

320
00:17:43,660 --> 00:17:44,760
Let me know how you get on.

321
00:17:44,760 --> 00:17:45,940
It'd be cool to hear.

322
00:17:47,480 --> 00:17:50,440
We've done a bunch of related episodes,
so a couple of those

323
00:17:50,440 --> 00:17:54,060
to check out are there's an extension
called auto_explain.

324
00:17:54,920 --> 00:18:01,160
Now, that will let you run your
queries as normal, let's say,

325
00:18:01,400 --> 00:18:07,880
maybe even on production, and the
users get the results of the

326
00:18:07,880 --> 00:18:09,780
query back, which we don't when
we run EXPLAIN.

327
00:18:09,780 --> 00:18:11,540
I don't know if I mentioned that
up top.

328
00:18:11,940 --> 00:18:16,400
But the people using your database
will continue to run their

329
00:18:16,400 --> 00:18:18,660
queries and get their results as
usual.

330
00:18:18,800 --> 00:18:22,960
But in the background, the execution
plans will be being logged,

331
00:18:22,960 --> 00:18:27,660
any beyond a certain timing threshold
that you set will get logged

332
00:18:27,660 --> 00:18:29,440
to your Postgres logs.

333
00:18:29,440 --> 00:18:31,100
So we've done a whole episode on
that.

334
00:18:31,100 --> 00:18:32,480
Can be extremely useful.

335
00:18:32,680 --> 00:18:33,840
And I'll link that up.

336
00:18:33,840 --> 00:18:35,880
We've also done one on row estimates.

337
00:18:36,040 --> 00:18:38,300
So that's another common issue.

338
00:18:38,480 --> 00:18:43,220
So with EXPLAIN, we get to see
how many rows Postgres expected

339
00:18:43,260 --> 00:18:45,040
to be returned at each stage.

340
00:18:45,560 --> 00:18:49,640
And when it's wrong, when it's
wrong by a lot especially, that

341
00:18:49,640 --> 00:18:54,960
can lead to all sorts of issues
with inefficient plans being

342
00:18:54,960 --> 00:18:55,460
chosen.

343
00:18:55,940 --> 00:18:58,160
So we did a whole episode on that.

344
00:18:58,660 --> 00:19:01,280
In fact, our first ever episode
was on slow queries and slow

345
00:19:01,280 --> 00:19:01,780
transactions.

346
00:19:01,880 --> 00:19:06,060
So that's worth checking out if
you haven't already.

347
00:19:06,580 --> 00:19:10,580
And we did a more advanced one on
query optimization, which again,

348
00:19:10,580 --> 00:19:12,980
we still didn't cover some of the
basics of explaining.

349
00:19:12,980 --> 00:19:15,080
So I'm glad to have done this one
now.

350
00:19:15,920 --> 00:19:16,920
Welcome any questions.

351
00:19:17,780 --> 00:19:24,010
And please do remember to send
your guest and topic suggestions

352
00:19:24,340 --> 00:19:24,840
through.

353
00:19:25,540 --> 00:19:29,700
Happy holidays, hope you all had
a good one and catch you next

354
00:19:29,700 --> 00:19:30,540
time, hopefully with a guest.

355
00:19:30,540 --> 00:19:30,570
Bye.

356
00:19:30,570 --> 00:19:31,160
Hopefully with a guest.