1
00:00:00,060 --> 00:00:02,800
Nikolay: Hello, hello, this is
PostgresFM episode, I don't remember

2
00:00:02,800 --> 00:00:07,080
the number, and today... I'm Nikolay
and this is Michael.

3
00:00:07,080 --> 00:00:07,860
Hi, Michael.

4
00:00:08,160 --> 00:00:09,020
Michael: Hello, Nikolay.

5
00:00:09,020 --> 00:00:09,880
I think 85.

6
00:00:10,460 --> 00:00:13,020
Nikolay: Well, honestly,
at this point, it

7
00:00:13,020 --> 00:00:13,820
doesn't matter.

8
00:00:14,680 --> 00:00:16,220
Michael: When we get to 100, it
will matter.

9
00:00:16,220 --> 00:00:17,860
Nikolay: Yeah, remind me, wake
me up.

10
00:00:17,860 --> 00:00:19,060
Yeah, at 100.

11
00:00:19,340 --> 00:00:24,440
So today I'm in the passenger seat
and Michael is going to talk

12
00:00:24,440 --> 00:00:29,540
about, I will join of course, but
from the passenger seat, about

13
00:00:29,800 --> 00:00:33,740
why the planner behaves not as
expected, not choosing the index I

14
00:00:33,740 --> 00:00:38,160
have prepared and how to troubleshoot
that, right?

15
00:00:39,180 --> 00:00:41,020
Michael: Yeah, yeah, this was my
suggestion.

16
00:00:41,320 --> 00:00:45,640
So it's something I see a bunch
and I got reminded again yesterday.

17
00:00:46,020 --> 00:00:49,280
I happened to be on LinkedIn
of all social networks.

18
00:00:49,280 --> 00:00:53,920
It's surprisingly one of the slightly
nicer social networks to

19
00:00:53,920 --> 00:00:56,980
be on these days, which I would
have been shocked by a few years

20
00:00:56,980 --> 00:00:57,480
ago.

21
00:00:57,980 --> 00:01:02,960
But I saw a post from Brent Ozar,
who I remember from my Microsoft

22
00:01:02,980 --> 00:01:06,900
SQL Server days, and has been recently
getting into Postgres

23
00:01:06,900 --> 00:01:07,840
things, which is cool.

24
00:01:07,840 --> 00:01:10,520
I think it would be a great addition
to the community.

25
00:01:11,600 --> 00:01:15,300
And he posted what I thought was
going to be a bug post, and

26
00:01:15,300 --> 00:01:19,080
he cheekily said, why isn't Postgres
using my functional index?

27
00:01:19,080 --> 00:01:21,600
And it turned out it was actually
a Stack Exchange question that

28
00:01:21,600 --> 00:01:26,980
he had posted, and he had included
links to an Aurora database,

29
00:01:27,740 --> 00:01:31,780
which, not Postgres, but Postgres
compatible, which is fun.

30
00:01:32,080 --> 00:01:35,780
And he was genuinely asking why
isn't it using my index?

31
00:01:36,040 --> 00:01:38,800
And I'd noticed when it was posted.
Sometimes on LinkedIn,

32
00:01:38,800 --> 00:01:41,360
you see posts from like 3 weeks
ago, but it was only posted about

33
00:01:41,360 --> 00:01:42,800
20, 25 minutes ago or something.

34
00:01:42,800 --> 00:01:45,140
I thought, 'Oh, you know, there's
no answers to this.

35
00:01:45,140 --> 00:01:47,780
This is something I generally
know about.

36
00:01:47,780 --> 00:01:49,780
I've written a whole blog post
on.'

37
00:01:49,960 --> 00:01:51,500
So maybe I can help here.

38
00:01:52,060 --> 00:01:53,040
I had a look.

39
00:01:53,300 --> 00:01:57,420
And it just reminded me that this
is a remarkably common issue

40
00:01:57,960 --> 00:02:02,740
for beginners for a bunch of reasons,
but even some quite

41
00:02:02,920 --> 00:02:08,560
expert database users sometimes
get confused as to why they've

42
00:02:08,560 --> 00:02:11,580
got an index that Postgres should
be using or they think they

43
00:02:11,580 --> 00:02:12,080
do.

44
00:02:13,820 --> 00:02:18,280
And for some reason, Postgres either
can't use it, and they're

45
00:02:18,280 --> 00:02:21,600
expecting it to be able to, or
doesn't think it will be faster

46
00:02:21,600 --> 00:02:23,340
and is choosing a different plan.

47
00:02:23,920 --> 00:02:26,180
So, yeah, I've seen this a bunch
of times.

48
00:02:26,520 --> 00:02:29,620
I feel like there's probably about
10 different reasons it could

49
00:02:29,620 --> 00:02:32,760
be, which is I think slightly,
like, people don't realize quite

50
00:02:32,760 --> 00:02:34,600
how many reasons there could be
for this.

51
00:02:34,600 --> 00:02:36,420
So I thought it might make an interesting
discussion.

52
00:02:37,120 --> 00:02:39,740
Nikolay: Yeah, well, yeah, I remember
this struggle.

53
00:02:40,680 --> 00:02:44,760
Now I know how to troubleshoot
it, but I remember the struggle

54
00:02:44,760 --> 00:02:46,760
I had many times in the past.

55
00:02:47,680 --> 00:02:51,980
It's not easy sometimes to understand
why the planner behaves in

56
00:02:51,980 --> 00:02:53,960
one way, not as you expected.

57
00:02:54,520 --> 00:02:58,600
And sometimes it just upsets
you a lot and you start thinking,

58
00:02:58,600 --> 00:02:59,940
oh, it's stupid.

59
00:03:00,720 --> 00:03:06,780
I also saw people start blaming
Postgres a lot after such events.

60
00:03:06,780 --> 00:03:09,780
They think like, oh, the planner is...
I want control.

61
00:03:09,800 --> 00:03:14,240
Also, this is probably where you
start wanting hints,

62
00:03:14,540 --> 00:03:16,100
which Postgres lacks, right?

63
00:03:16,120 --> 00:03:18,400
Well, it has it, but it's not like
standard.

64
00:03:18,940 --> 00:03:21,360
Michael: Yeah, I think we probably
will get to that.

65
00:03:21,500 --> 00:03:24,780
You said you now know how to solve
it. Out of interest,

66
00:03:24,780 --> 00:03:26,420
what's the first thing you'll do?

67
00:03:26,600 --> 00:03:29,540
Nikolay: Well, first thing I'll
do, I'll try to understand, okay,

68
00:03:30,140 --> 00:03:34,640
this is the moment, usually I don't
pay attention a lot, but

69
00:03:34,640 --> 00:03:37,760
this is the exact moment when I
start paying attention to cost

70
00:03:38,000 --> 00:03:41,180
metrics in the EXPLAIN (ANALYZE,
BUFFERS) plan.

71
00:03:41,480 --> 00:03:45,560
So the idea is I try to understand,
is it really a huge number

72
00:03:45,560 --> 00:03:49,280
or like it's relatively small or
tiny?

73
00:03:49,280 --> 00:03:53,140
And then I think, okay, based on
my experience, and here actually

74
00:03:53,140 --> 00:03:55,360
it helps if you have some experience,
right?

75
00:03:55,840 --> 00:04:01,960
Based on my experience, if my index
was used, what would be this

76
00:04:01,960 --> 00:04:02,940
cost, right?

77
00:04:03,300 --> 00:04:06,640
And then, like, basically I already
explained my algorithm.

78
00:04:07,340 --> 00:04:09,840
So, okay, I see some cost.

79
00:04:10,600 --> 00:04:14,400
I wanted to say actual cost, but
the word actual we must reserve

80
00:04:15,100 --> 00:04:17,320
for execution, not for planning,
right?

81
00:04:17,360 --> 00:04:23,440
But this is the cost you see in
the root of this tree of query

82
00:04:23,440 --> 00:04:23,940
execution.

83
00:04:24,240 --> 00:04:25,960
Actually, we don't need to analyze
here, right?

84
00:04:25,960 --> 00:04:28,880
We just say, explain and it's enough.
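
For a planner question like this, a plain EXPLAIN really is enough, since nothing is executed. A minimal sketch (the table, column, and numbers are hypothetical):

```sql
-- Plain EXPLAIN only plans the query; it does not execute it.
EXPLAIN
SELECT * FROM orders WHERE customer_id = 42;

-- The root node line has the shape:
--   Seq Scan on orders  (cost=0.00..18334.00 rows=9 width=97)
-- cost=startup..total; the second number is the total cost
-- being discussed here.
```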

85
00:04:29,060 --> 00:04:31,120
Michael: For the total cost that
you're talking about, I guess

86
00:04:31,120 --> 00:04:33,760
you're talking about, they're called
startup cost and total cost.

87
00:04:33,760 --> 00:04:37,040
I guess you're talking about the
total cost on the root node.

88
00:04:37,040 --> 00:04:37,960
Nikolay: In the root, yeah, yeah.

89
00:04:37,960 --> 00:04:42,180
We just, like, we have the plan,
we see our index we hoped would

90
00:04:42,180 --> 00:04:45,040
be used, it's not present there,
and we see different index or

91
00:04:45,040 --> 00:04:47,940
maybe sequential scan, and we think,
oh, what's happening here?

92
00:04:48,340 --> 00:04:52,120
And so I see the total cost, probably
also check the access

93
00:04:52,200 --> 00:04:56,600
method node where data access is
happening, and the cost there

94
00:04:56,600 --> 00:04:57,040
as well.

95
00:04:57,040 --> 00:04:58,040
It's also important.

96
00:04:58,040 --> 00:05:01,260
And then I think, oh, if my index
was used, what would be the

97
00:05:01,260 --> 00:05:01,760
cost?

98
00:05:01,920 --> 00:05:06,060
And since I'm experienced, I
sometimes already understand

99
00:05:06,060 --> 00:05:06,880
what's happening.

100
00:05:07,060 --> 00:05:13,400
But if I have doubts, I just start
playing with these rough knobs:

101
00:05:15,060 --> 00:05:20,280
enable_seqscan, enable_indexscan,
enable blah blah, to put

102
00:05:20,280 --> 00:05:24,520
penalty on specific access methods,
right?

103
00:05:24,660 --> 00:05:28,860
Or so, to specific steps.

104
00:05:30,020 --> 00:05:31,520
Michael: Yeah, scan types.

105
00:05:31,680 --> 00:05:33,400
Nikolay: Yeah, right, scan types.
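
The knobs Nikolay is describing can be toggled per session; a sketch of the experiment (the query is hypothetical):

```sql
-- Heavily penalize sequential scans for this session only;
-- this does not forbid them, it just makes them look expensive.
SET enable_seqscan = off;

EXPLAIN
SELECT * FROM orders WHERE customer_id = 42;  -- hypothetical query

-- Compare the new total cost against the original plan's cost,
-- then restore the default:
RESET enable_seqscan;
```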

106
00:05:33,480 --> 00:05:37,840
And in this case, very often I
start seeing what I expected and

107
00:05:37,840 --> 00:05:40,780
I see, oh, like, this would be
the cost.

108
00:05:41,760 --> 00:05:42,980
And I see the difference.

109
00:05:42,980 --> 00:05:45,140
And sometimes it's very small.

110
00:05:45,620 --> 00:05:49,540
So we were very close to choosing
my plan, but it was not chosen

111
00:05:49,540 --> 00:05:50,360
just because...

112
00:05:50,820 --> 00:05:56,400
And then super often, very often,
it's a new system and I ask

113
00:05:56,400 --> 00:05:59,340
people, what random_page_cost
do you have?

114
00:06:01,060 --> 00:06:04,260
Because it's a sequential scan; our index
is not used.

115
00:06:04,460 --> 00:06:06,440
Oh, and the random_page_cost is
4.

116
00:06:06,580 --> 00:06:11,740
Recently, some Amazon guys
got a consultation with me, and

117
00:06:12,540 --> 00:06:13,740
this is exactly what happened.

118
00:06:13,740 --> 00:06:15,540
I asked what their random_page_cost
was.

119
00:06:15,540 --> 00:06:16,460
They said 4.

120
00:06:17,780 --> 00:06:21,740
Remember, Crunchy Bridge, after
listening to us, changed it to

121
00:06:21,740 --> 00:06:26,140
1.1, and they also published good
benchmarks, and it's still

122
00:06:26,140 --> 00:06:30,060
in my to-do to revisit those benchmarks
because I had a tendency

123
00:06:30,060 --> 00:06:35,500
to set random_page_cost to 1, but
they said 1.1 is better, actually.

124
00:06:35,500 --> 00:06:38,660
According to their benchmarks, and
they had an interesting methodology,

125
00:06:38,680 --> 00:06:40,420
but it's slightly off topic.

126
00:06:40,440 --> 00:06:44,700
So, for those listeners who don't
know what random_page_cost

127
00:06:45,160 --> 00:06:51,940
is: it reflects how the planner thinks
about random access to data.

128
00:06:52,640 --> 00:07:00,040
And by default, it expects you
to be using magnetic, rotational

129
00:07:00,220 --> 00:07:00,720
disks.

130
00:07:01,400 --> 00:07:04,700
And this is not normal in 2024,
definitely.

131
00:07:05,380 --> 00:07:10,340
And I don't know if Aurora and
RDS still have 4.

132
00:07:10,760 --> 00:07:14,540
That cluster may have been created long
ago, but if they still do have

133
00:07:14,540 --> 00:07:16,400
this, this is another question
to them.

134
00:07:16,400 --> 00:07:20,140
We had the question about CPU and
green color in Performance

135
00:07:20,180 --> 00:07:21,220
Insights last time.

136
00:07:21,220 --> 00:07:23,480
But today we have a new question.

137
00:07:24,060 --> 00:07:25,020
We need to check.

138
00:07:25,080 --> 00:07:27,940
I haven't created clusters for
a long time.

139
00:07:27,940 --> 00:07:28,440
Yeah.

140
00:07:28,940 --> 00:07:31,160
Michael: Last time I checked, most
of them were.

141
00:07:31,160 --> 00:07:34,400
There were a couple of good exceptions
like ScaleGrid and now

142
00:07:34,400 --> 00:07:37,800
Crunchy Bridge, but there aren't
many that have tuned it, last

143
00:07:37,800 --> 00:07:40,360
time I checked, which was admittedly
quite a while ago.

144
00:07:40,920 --> 00:07:43,260
Nikolay: This is super strange.
And also, Postgres community:

145
00:07:43,260 --> 00:07:47,960
It's time to revisit this, and we
need this mostly for new

146
00:07:47,960 --> 00:07:48,460
clusters.

147
00:07:49,020 --> 00:07:51,980
You shouldn't think too much about
old clusters, they already

148
00:07:51,980 --> 00:07:55,140
have postgresql.conf in place and
everything there.

149
00:07:55,140 --> 00:07:56,420
So, change it.

150
00:07:57,340 --> 00:08:01,080
Michael: And I would say, why are
we optimizing for the few people

151
00:08:01,080 --> 00:08:05,400
that are still running magnetic
disks instead of the vast majority

152
00:08:05,580 --> 00:08:06,800
who are running on SSDs?

153
00:08:07,660 --> 00:08:10,020
It's time to change the default,
I think.

154
00:08:11,840 --> 00:08:15,380
Nikolay: I might be, again, taking
too much time for a passenger,

155
00:08:15,380 --> 00:08:19,000
but let me just... I just feel
that maybe some people don't

156
00:08:19,000 --> 00:08:20,500
understand what we are talking
about.

157
00:08:20,500 --> 00:08:21,780
I will be very short.

158
00:08:21,780 --> 00:08:24,520
So there is seq_page_cost and
random_page_cost.

159
00:08:24,520 --> 00:08:28,640
seq_page_cost means sequential:
the cost of sequential data access.

160
00:08:28,900 --> 00:08:30,240
And it's 1.

161
00:08:30,240 --> 00:08:32,780
If you change it, you change scale.

162
00:08:33,200 --> 00:08:34,780
I have cases when people change
it.

163
00:08:34,780 --> 00:08:39,640
But it's 1, it's our baseline.

164
00:08:40,440 --> 00:08:41,180
It's 1.

165
00:08:41,180 --> 00:08:44,200
And random_page_cost by default
in Postgres is 4.

166
00:08:45,060 --> 00:08:49,340
It expects that random data access
is 4 times more expensive

167
00:08:49,440 --> 00:08:52,720
than sequential page access, which
is not so if you, for example,

168
00:08:52,720 --> 00:08:58,740
have all data cached in memory,
or if it doesn't

169
00:08:58,740 --> 00:09:05,860
fit, but you have non-rotational
disks, SSD, NVMe SSD, or

170
00:09:05,860 --> 00:09:07,920
something like that, modern disks.

171
00:09:07,920 --> 00:09:11,140
And most new databases have
modern disks, of course.

172
00:09:11,140 --> 00:09:13,980
And Aurora, I think they have very
good storage, so obviously

173
00:09:14,340 --> 00:09:16,010
it's not rotational.

174
00:09:16,440 --> 00:09:18,900
I think maybe I'm wrong, actually,
right?

175
00:09:19,440 --> 00:09:20,640
Should not be rotational.

176
00:09:20,640 --> 00:09:21,420
It's very slow.

177
00:09:21,420 --> 00:09:23,100
I mean, throughput is terrible,
right?

178
00:09:23,100 --> 00:09:23,600
Okay.

179
00:09:25,360 --> 00:09:26,600
So, and why does it matter here?

180
00:09:26,600 --> 00:09:31,520
Because if you have 1 and 4 default
settings, 1 sequential and

181
00:09:31,520 --> 00:09:34,580
4 random access, it's 4 times more
expensive.

182
00:09:34,640 --> 00:09:38,860
The planner quite often thinks,
oh, sequential scan is not that

183
00:09:38,860 --> 00:09:39,360
bad.

184
00:09:39,560 --> 00:09:43,820
I would prefer sequential scan
over index access if I need to

185
00:09:43,820 --> 00:09:48,620
fetch a lot of rows, because it
seems to be cheaper for me.

186
00:09:48,960 --> 00:09:54,280
Once you shift it to normal 1.1,
1.1 is normal these days.

187
00:09:54,440 --> 00:09:56,740
This is the common understanding.

188
00:09:57,620 --> 00:09:59,820
And Crunchy Bridge have good benchmarks.

189
00:09:59,860 --> 00:10:01,160
Let's attach the link.

190
00:10:01,220 --> 00:10:04,340
So in this case, you tell the planner,
actually, sequential access,

191
00:10:04,340 --> 00:10:06,240
random access, they're almost the
same.

192
00:10:06,900 --> 00:10:08,960
And index scan starts winning.
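
Checking and adjusting random_page_cost as discussed might look like this (1.1 is the commonly suggested SSD value; benchmark your own workload before persisting it):

```sql
-- Current value; the default of 4 assumes rotational disks.
SHOW random_page_cost;

-- Try it at session level first and see how plans react.
SET random_page_cost = 1.1;
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;  -- hypothetical query

-- Persist for the whole cluster once you are satisfied.
ALTER SYSTEM SET random_page_cost = 1.1;
SELECT pg_reload_conf();
```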

193
00:10:10,440 --> 00:10:10,920
Yeah.

194
00:10:10,920 --> 00:10:11,080
Michael: Yeah.

195
00:10:11,080 --> 00:10:15,560
Or at least it's only a 10% penalty
instead of a 300% penalty,

196
00:10:15,720 --> 00:10:19,280
which is a huge difference once
you start returning

197
00:10:19,280 --> 00:10:21,240
a lot of rows for a single query.

198
00:10:21,600 --> 00:10:21,960
Yeah.

199
00:10:21,960 --> 00:10:23,080
Which happens quite often.

200
00:10:23,080 --> 00:10:23,860
There's quite a lot.

201
00:10:23,860 --> 00:10:26,640
Like, anyway, I'm very happy for
you to do a bunch of talking

202
00:10:26,640 --> 00:10:26,820
here.

203
00:10:26,820 --> 00:10:28,200
It's a pretty common topic.

204
00:10:28,200 --> 00:10:31,280
I like that we've gone this deep
so quickly, but I do want to

205
00:10:31,280 --> 00:10:34,680
go, I think it's worth going back
and saying, this is a great

206
00:10:34,680 --> 00:10:38,640
avenue, like you mentioned,
looking at the costs, but

207
00:10:38,640 --> 00:10:42,160
I think there's an easy, like even
for non-experts,

208
00:10:42,160 --> 00:10:43,940
you can use that to your advantage,
right?

209
00:10:43,940 --> 00:10:47,120
You can, if you have access. I
didn't, when I was looking

210
00:10:47,120 --> 00:10:48,280
into Brent's problem.

211
00:10:48,280 --> 00:10:52,960
It was a read-only user, so I couldn't
do enable_seqscan = off,

212
00:10:52,960 --> 00:10:56,520
which is the easiest way if you're
getting a sequential scan,

213
00:10:56,820 --> 00:11:01,840
flipping it to, well, basically
telling the planner that sequential

214
00:11:01,840 --> 00:11:05,240
access is tons more expensive than
it really is, and therefore,

215
00:11:05,540 --> 00:11:08,760
if it has any ability to use the
index, no matter how expensive

216
00:11:08,760 --> 00:11:11,120
it should be, it will then use
it.

217
00:11:11,120 --> 00:11:15,060
If it can't, though, enable_seqscan = off
doesn't actually prevent

218
00:11:15,060 --> 00:11:16,700
the planner using seq scans.

219
00:11:17,280 --> 00:11:18,560
It just penalizes it.

220
00:11:18,560 --> 00:11:20,240
It makes it look more expensive.

221
00:11:20,460 --> 00:11:24,660
So if you cannot use the index:
a bunch of these cases

222
00:11:25,080 --> 00:11:27,540
are where Postgres can't use the index
for some reason.

223
00:11:27,840 --> 00:11:30,460
Or you've forgotten that you're
in an environment that doesn't

224
00:11:30,460 --> 00:11:31,360
have the index.

225
00:11:31,360 --> 00:11:33,440
Like that is a surprisingly common...

226
00:11:33,620 --> 00:11:34,840
Nikolay: and valid index.

227
00:11:35,380 --> 00:11:35,640
Michael: Yeah.

228
00:11:35,640 --> 00:11:36,340
Well, Yeah.

229
00:11:36,340 --> 00:11:38,320
I haven't seen that one as often.

230
00:11:38,380 --> 00:11:40,960
Like I haven't seen it come up
as often.

231
00:11:40,960 --> 00:11:43,860
But quite often people are in the
middle of debugging something

232
00:11:43,860 --> 00:11:46,440
and someone's dropped that index
or they're on staging and it

233
00:11:46,440 --> 00:11:49,660
doesn't have it, when on
production they do have it.

234
00:11:49,840 --> 00:11:53,800
So it is worth double checking
the index exists, but by using

235
00:11:53,800 --> 00:11:59,240
enable_seqscan = off, you can quickly
check, is it a costing issue?

236
00:11:59,240 --> 00:12:02,860
Now it's trickier if you're
in the case you mentioned where

237
00:12:03,420 --> 00:12:07,000
it's using one index and not the
one you're expecting, you can't

238
00:12:07,000 --> 00:12:07,740
use that.

239
00:12:08,680 --> 00:12:12,080
You can disable different scan
types, but you can't tell it not

240
00:12:12,080 --> 00:12:13,480
to use a specific index.

241
00:12:13,740 --> 00:12:16,800
I have seen a trick for that though,
which I wasn't familiar

242
00:12:16,800 --> 00:12:19,620
with before, but Haki Benita shared
on a blog post.

243
00:12:19,960 --> 00:12:25,160
He does a begin transaction, drop
index (the one that it was using),

244
00:12:25,760 --> 00:12:30,420
explain the query, and then roll
back, which means you can try

245
00:12:30,420 --> 00:12:34,100
and see: if that index didn't exist,
would it pick my index?

246
00:12:34,120 --> 00:12:35,740
Which is quite a nice trick as
well.
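
The trick works because DDL is transactional in Postgres; a sketch (the index name is hypothetical, and note that DROP INDEX takes a heavy lock, so this belongs on a clone rather than a busy production server):

```sql
BEGIN;
-- Temporarily hide the index the planner currently prefers.
DROP INDEX orders_created_at_idx;  -- hypothetical name

-- Does the planner now pick the index we expected?
EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

-- Undo everything; the index is back as if nothing happened.
ROLLBACK;
```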

247
00:12:35,740 --> 00:12:39,220
So trying to find out, are you
in the case where Postgres can't

248
00:12:39,220 --> 00:12:42,320
use the index or you're in the
case where it's choosing not to?

249
00:12:42,380 --> 00:12:45,640
The enable parameter is a really
nice way of finding that out

250
00:12:45,640 --> 00:12:46,820
quite quickly normally.

251
00:12:46,960 --> 00:12:50,580
So yeah, cool that you start there
as well, even if you're eyeballing

252
00:12:50,680 --> 00:12:51,380
the costs.

253
00:12:51,380 --> 00:12:54,960
The one thing I would add on that
is, I think explain analyze can

254
00:12:54,960 --> 00:12:55,460
be helpful.

255
00:12:55,460 --> 00:12:58,520
So if you've run that already,
or if your query actually runs

256
00:12:58,520 --> 00:13:03,260
and doesn't time out, then the
explain analyze part is helpful

257
00:13:03,260 --> 00:13:07,640
because you get the rows returned,
and I think when the cost estimates

258
00:13:07,640 --> 00:13:10,900
are off, sometimes it's random_page_cost, but quite

259
00:13:10,900 --> 00:13:15,040
often it's about the number of
rows Postgres expects to be returned

260
00:13:15,040 --> 00:13:17,320
and the number of rows that are
actually returned.
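
Those two numbers sit side by side in EXPLAIN ANALYZE output; a sketch (query and figures hypothetical):

```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE status = 'pending';

-- Each node prints both estimate and reality, roughly:
--   Seq Scan on orders  (cost=... rows=150000 ...)
--                       (actual time=... rows=12 ...)
-- A large gap between estimated rows and actual rows points
-- at stale or missing statistics.
```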

261
00:13:17,320 --> 00:13:20,140
So, in fact, maybe you don't
need to analyze because if

262
00:13:20,140 --> 00:13:22,580
you know the query, you know the
data, you know the query you're

263
00:13:22,580 --> 00:13:26,040
running, if you're expecting only
a few rows to be returned and

264
00:13:26,040 --> 00:13:29,860
you see hundreds of thousands in
the estimate, you've got a pretty

265
00:13:29,860 --> 00:13:32,940
good clue right there that you're
in a case where it's overestimating

266
00:13:33,200 --> 00:13:34,380
the cost of that.

267
00:13:34,400 --> 00:13:35,860
Nikolay: Or underestimating maybe.

268
00:13:35,860 --> 00:13:41,200
But yeah, underestimating wouldn't
lead to the index not being used,

269
00:13:41,200 --> 00:13:43,140
but anyway, it can lead to wrong
plans.

270
00:13:43,140 --> 00:13:45,060
And yeah, that's a good point.

271
00:13:45,060 --> 00:13:47,460
So, this is the reason number 1,
let's say.

272
00:13:47,660 --> 00:13:50,860
First reason is like costs are
slightly different.

273
00:13:50,860 --> 00:13:54,780
But this is already talking
about the most difficult

274
00:13:54,860 --> 00:13:55,880
case, probably.

275
00:13:56,320 --> 00:13:57,740
There are easier cases.

276
00:13:57,740 --> 00:14:00,640
I just excluded them from my mind
because they are easy.

277
00:14:02,520 --> 00:14:06,380
If the cost is off, and indeed, we
check the planned rows and actual

278
00:14:06,380 --> 00:14:12,700
rows, if the mismatch is huge,
either it's outdated stats, probably

279
00:14:12,800 --> 00:14:15,720
somebody is blocking the autovacuum
or something we need to check,

280
00:14:15,720 --> 00:14:18,380
or maybe lack of stats.

281
00:14:19,960 --> 00:14:22,060
Sometimes we don't have stats at
all.

282
00:14:22,800 --> 00:14:23,540
It happens.

283
00:14:23,720 --> 00:14:26,840
For example, if you created a functional
index but you didn't

284
00:14:26,840 --> 00:14:28,640
run analyze at all yet.

285
00:14:29,240 --> 00:14:30,560
Michael: That was the case yesterday.
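
This is worth spelling out: statistics for an expression are only gathered once the expression index exists and the table has been analyzed. A sketch with hypothetical names:

```sql
-- Expression (functional) index: the planner has no statistics
-- for lower(email) until ANALYZE runs after index creation.
CREATE INDEX users_email_lower_idx ON users (lower(email));

-- Autovacuum will get here eventually, but running it right
-- away avoids the planner falling back to default guesses:
ANALYZE users;
```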

286
00:14:31,260 --> 00:14:35,280
Nikolay: In this case, explain
analyze buffers also is good.

287
00:14:35,280 --> 00:14:36,400
I agree with you.

288
00:14:36,680 --> 00:14:40,760
But sometimes you have
a query which lasts hours.

289
00:14:40,760 --> 00:14:45,360
So, in this case, we need to
downgrade to just a regular explain.

290
00:14:47,940 --> 00:14:51,360
Michael: Well, this is one of those,
like, I'm obviously a big fan

291
00:14:51,460 --> 00:14:55,440
of explain analyze with buffers, like
you, but this is one of those

292
00:14:55,440 --> 00:14:58,580
cases where I don't think you can
know, like, maybe you can never

293
00:14:58,580 --> 00:15:01,400
know, but I don't think you can
know from a single execution

294
00:15:01,400 --> 00:15:05,000
plan which case you're in; we often
need at least two.

295
00:15:05,000 --> 00:15:08,340
We need the before and the after
we've changed one of these parameters.

296
00:15:08,500 --> 00:15:10,240
The first one is not going to tell
us that much.

297
00:15:10,240 --> 00:15:13,040
It tells us it's not using the
index, but it doesn't tell us

298
00:15:13,040 --> 00:15:14,440
why it's not using the index.

299
00:15:14,440 --> 00:15:18,060
We could get some clues, like row
estimates being off or like

300
00:15:18,060 --> 00:15:21,520
the cost number, but they're
only clues. It's

301
00:15:21,660 --> 00:15:24,600
only by changing something and
running it again that we can see

302
00:15:24,600 --> 00:15:27,280
the difference of when it
is using the index.

303
00:15:28,520 --> 00:15:31,260
Nikolay: That's why database branching
and experimentation with

304
00:15:31,260 --> 00:15:35,640
iterations matters so much because
if you already calculated the

305
00:15:35,640 --> 00:15:39,720
statistics, but then you start
having questions, what if I did

306
00:15:39,720 --> 00:15:41,420
something different?

307
00:15:41,940 --> 00:15:46,380
You want to reset and go a different
route, different path, right?

308
00:15:46,440 --> 00:15:50,880
In this case, being able to iterate,
like reset in a few seconds,

309
00:15:51,020 --> 00:15:54,820
run it again, check this idea,
check that idea.

310
00:15:55,160 --> 00:16:01,500
That's why branching and fast cloning
matters so much, right?

311
00:16:01,620 --> 00:16:02,080
Exactly.

312
00:16:02,080 --> 00:16:07,280
This is, yeah, because otherwise it's a one-way
ticket.

314
00:16:08,360 --> 00:16:11,340
So you already calculated statistics
and that's it.

315
00:16:11,820 --> 00:16:12,260
Michael: Yeah, true.

316
00:16:12,260 --> 00:16:13,440
Actually, you can't go back.

317
00:16:13,440 --> 00:16:15,900
You can solve your problem, but
you can't go back.

318
00:16:17,140 --> 00:16:19,680
Nikolay: And sometimes,
okay, you solve the problem,

319
00:16:19,680 --> 00:16:22,860
but you cannot explain in detail
what happened.

320
00:16:24,520 --> 00:16:28,340
But this doesn't build confidence
in your team if you cannot

321
00:16:28,340 --> 00:16:29,280
explain, right?

322
00:16:29,640 --> 00:16:31,620
And to explain, you need to go
back.

323
00:16:32,200 --> 00:16:32,700
Yeah.

324
00:16:32,780 --> 00:16:33,840
And understand better.

325
00:16:33,840 --> 00:16:34,200
Michael: Yeah.

326
00:16:34,200 --> 00:16:37,000
I like that you're using explain
in a different context now.

327
00:16:37,400 --> 00:16:37,900
Nikolay: Okay.

328
00:16:38,040 --> 00:16:39,280
But it's related, right?

329
00:16:39,280 --> 00:16:39,960
It's related.

330
00:16:39,960 --> 00:16:40,440
Michael: It is.

331
00:16:40,440 --> 00:16:41,320
It really is.

332
00:16:41,320 --> 00:16:44,060
And then you can analyze what you're
going to do next.

333
00:16:44,060 --> 00:16:44,560
Nikolay: Right.

334
00:16:44,920 --> 00:16:46,780
Analyze what is overused.

335
00:16:47,360 --> 00:16:47,860
Michael: Yeah.

336
00:16:48,280 --> 00:16:50,660
Well, that's exactly what happened
yesterday as well.

337
00:16:50,980 --> 00:16:53,500
I didn't. Well, there's an interesting
thing here.

338
00:16:53,680 --> 00:16:55,580
I jumped to conclusions a little
bit.

339
00:16:55,580 --> 00:16:58,940
I did spot that it was probably
an estimation error, but

340
00:16:58,940 --> 00:17:03,220
I thought it might also be a data
type casting issue, which is

341
00:17:03,220 --> 00:17:06,040
the kind of thing that we didn't
cover in depth, actually.

342
00:17:06,500 --> 00:17:07,367
We covered it in my

343
00:17:07,367 --> 00:17:08,220
Nikolay: Tom's
mismatched query, right?

344
00:17:08,220 --> 00:17:10,700
Let's call it mismatched query,
maybe.

345
00:17:10,920 --> 00:17:11,680
Michael: Tim, yeah.

346
00:17:11,680 --> 00:17:18,000
So, basically, if your query can't
be answered by the index you

347
00:17:18,000 --> 00:17:18,440
have.

348
00:17:18,440 --> 00:17:21,900
So, like, the typical example of
this is a function.

349
00:17:21,900 --> 00:17:26,420
So, for example, if you're running
lower on an expression

350
00:17:26,840 --> 00:17:29,880
on a field, but you have an index
only on the plain field.

351
00:17:30,060 --> 00:17:32,720
And you might think logically that
it could use the index.

352
00:17:32,720 --> 00:17:34,200
If, for example, in your...

353
00:17:34,200 --> 00:17:34,700
Date.

354
00:17:35,660 --> 00:17:36,160
Nikolay: Huh?

355
00:17:36,260 --> 00:17:39,820
From timestamp to date, your reduction
from timestamp to date,

356
00:17:39,820 --> 00:17:41,420
for example, it's very common.

357
00:17:41,460 --> 00:17:45,420
You convert timestamp to date and
expect that an index on timestamp

358
00:17:45,460 --> 00:17:46,020
will work.

359
00:17:46,020 --> 00:17:46,680
It won't.

360
00:17:47,020 --> 00:17:47,360
Right?

361
00:17:47,360 --> 00:17:52,260
Because it's for timestamp, not
for your expression.
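
The usual fixes for this case are to index the exact expression the query uses, or to rewrite the predicate as a range over the raw column; a sketch with hypothetical names (and assuming timestamp without time zone, since the cast from timestamptz to date is not immutable):

```sql
-- WHERE created_at::date = '2024-05-01' cannot use a plain
-- index on created_at. Either index the expression itself...
CREATE INDEX orders_created_date_idx
    ON orders ((created_at::date));

-- ...or rewrite as a range so the plain index works:
SELECT * FROM orders
WHERE created_at >= DATE '2024-05-01'
  AND created_at <  DATE '2024-05-02';
```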

362
00:17:53,100 --> 00:17:53,560
Michael: So, yeah.

363
00:17:53,560 --> 00:17:57,040
So, in some cases, Postgres handles
some of these data type conversions,

364
00:17:57,100 --> 00:18:00,740
like text to varchar, which I wasn't
sure about yesterday.

365
00:18:01,420 --> 00:18:02,860
But in some cases, it doesn't.

366
00:18:02,860 --> 00:18:05,120
So, it depends on whether it supports
that.

367
00:18:05,140 --> 00:18:08,480
But in general, that's the kind
of thing that can prevent the

368
00:18:08,480 --> 00:18:12,740
use of an index that you're expecting
to be used, or like the

369
00:18:12,740 --> 00:18:14,240
operator not being supported.

370
00:18:15,380 --> 00:18:19,280
There are other cases where the index
type matters, so let's say you've

371
00:18:19,280 --> 00:18:21,780
used a B-tree because that's what
most of us are using most of

372
00:18:21,780 --> 00:18:22,460
the time.

373
00:18:23,000 --> 00:18:26,340
One example is on a text field using
ILIKE.

374
00:18:26,400 --> 00:18:30,040
So like again, a case-insensitive
search operator is not going

375
00:18:30,040 --> 00:18:30,700
to use it.

376
00:18:31,020 --> 00:18:34,200
The B-tree index only supports
certain operators.

377
00:18:34,200 --> 00:18:37,400
And if you're using an operator
that it doesn't support, like

378
00:18:37,540 --> 00:18:41,140
a greater than on a hash index,
for example, or anything other

379
00:18:41,140 --> 00:18:45,060
than equality on a hash index,
it won't use the index, kind of

380
00:18:45,060 --> 00:18:45,560
obviously.
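A sketch of the operator-support point, again on hypothetical tables: each index type serves only certain operators, and a trigram GIN index is the usual way to make ILIKE indexable:

```sql
CREATE TABLE users (id bigint, email text);
CREATE INDEX users_email_btree ON users (email);

-- A B-tree supports <, <=, =, >=, > (and prefix LIKE with the right
-- operator class), but not case-insensitive pattern matching:
SELECT * FROM users WHERE email ILIKE '%@example.com';  -- sequential scan

-- A trigram GIN index does support ILIKE / LIKE with leading wildcards:
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX users_email_trgm ON users USING gin (email gin_trgm_ops);

-- A hash index supports only equality:
CREATE INDEX users_id_hash ON users USING hash (id);
SELECT * FROM users WHERE id = 42;   -- can use the hash index
SELECT * FROM users WHERE id > 42;   -- cannot; > is not a hash operator
```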

381
00:18:45,860 --> 00:18:49,440
So I think there's a few cases
that are quite simple for why

382
00:18:49,440 --> 00:18:52,200
it can't use the index and then
there's a bunch that are a bit

383
00:18:52,200 --> 00:18:54,860
more complicated along the lines
you were talking about which

384
00:18:54,860 --> 00:18:59,640
is what I've just classified as
Postgres doesn't think it will

385
00:18:59,640 --> 00:19:00,400
be faster.

386
00:19:00,600 --> 00:19:04,120
So yeah, we've said expensive and
cheap a few times, but those

387
00:19:04,120 --> 00:19:08,100
costs, while they're in an arbitrary
unit, the idea is for them

388
00:19:08,100 --> 00:19:13,940
to estimate how fast the query
will be, or how slow it will be.

389
00:19:14,340 --> 00:19:16,200
The higher the cost, the slower
it would be.

390
00:19:16,200 --> 00:19:17,220
That's the idea.

391
00:19:18,040 --> 00:19:21,600
It's not the only measure that
Postgres could have tried to optimize

392
00:19:21,600 --> 00:19:21,880
for.

393
00:19:21,880 --> 00:19:23,720
It could have tried to optimize
for IO.

394
00:19:24,520 --> 00:19:25,820
And that's correlated.

395
00:19:25,920 --> 00:19:27,340
But it's not the same thing.

396
00:19:27,440 --> 00:19:29,340
It optimizes for speed.

397
00:19:30,060 --> 00:19:35,160
Which is, yeah, interesting and
kind of leads us to what could

398
00:19:35,160 --> 00:19:35,740
have gone wrong.

399
00:19:35,740 --> 00:19:39,020
So I actually think we didn't cover
a whole case, though, like

400
00:19:39,020 --> 00:19:43,020
a whole simple case of it doesn't
think it would be faster, and

401
00:19:43,020 --> 00:19:43,820
It's correct.

402
00:19:45,040 --> 00:19:47,040
That's a case that catches people
out a bunch.

403
00:19:47,040 --> 00:19:50,920
Like you're trying to force it
to use an index, but it's doing

404
00:19:50,920 --> 00:19:52,260
the faster route already.

405
00:19:52,540 --> 00:19:58,680
Nikolay: For example, if you read
95% of your table, just a sequential

406
00:19:58,680 --> 00:20:03,160
scan might be much faster on the
fly filtering out those 5% that

407
00:20:03,160 --> 00:20:08,420
are not needed, than walking
the tree, the B-tree, right?

408
00:20:08,420 --> 00:20:09,480
It will probably be...

409
00:20:09,600 --> 00:20:14,560
Even if you balance random page
cost and index scan cost,

410
00:20:14,940 --> 00:20:19,840
it still might be better to use
a sequential scan in reality.

411
00:20:19,960 --> 00:20:20,460
So,

412
00:20:20,580 --> 00:20:21,540
Michael: Yeah, for sure.

413
00:20:21,540 --> 00:20:25,240
And because of how sequential scans
work, because of how efficient

414
00:20:25,240 --> 00:20:27,980
they are, and the fact they don't
need to read data from multiple

415
00:20:27,980 --> 00:20:28,480
places.

416
00:20:29,280 --> 00:20:34,120
And if your data fits really tightly
on like very few pages it

417
00:20:34,120 --> 00:20:38,040
could be a lot faster and not even
not even just at percentages

418
00:20:38,200 --> 00:20:44,380
as high as 95%, to be honest. Like
for small tables, for some with

419
00:20:44,380 --> 00:20:49,340
like not very wide rows, I've seen
it be as low as 30% or so,

420
00:20:49,340 --> 00:20:51,660
that's still faster as a sequential
scan.

421
00:20:51,660 --> 00:20:55,180
So I suspect you've come up with
a contrived example of it being

422
00:20:55,180 --> 00:20:55,940
even lower.
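One way to see this selectivity effect in practice is to compare the planner's costs for both plans; `enable_seqscan` is a real planner setting that can be toggled in a session for debugging (not something to leave on in production):

```sql
-- Assuming the hypothetical events table: a predicate matching most rows.
EXPLAIN SELECT * FROM events WHERE created_at >= '2000-01-01';
-- Typically chooses: Seq Scan on events

-- Temporarily discourage sequential scans to see the competing plan's cost:
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN SELECT * FROM events WHERE created_at >= '2000-01-01';
-- The index scan's estimated cost is usually much higher here, because it
-- implies many random page fetches plus visits back to the heap.
ROLLBACK;
```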

423
00:20:56,580 --> 00:20:59,680
Nikolay: Right, so in this case,
I have various, like simple,

424
00:20:59,780 --> 00:21:01,400
very basic questions.

425
00:21:01,860 --> 00:21:05,600
And I think we should, like, in
many cases, we just should start

426
00:21:05,600 --> 00:21:09,740
from this question when we consider
the performance of a query.

427
00:21:09,960 --> 00:21:17,440
We should think how much underlying
data do we have, and next,

428
00:21:17,780 --> 00:21:20,720
very high level, how many rows
are returned?

429
00:21:21,220 --> 00:21:24,960
Then it can be like, we can unfold
this and say, okay, if it's

430
00:21:24,960 --> 00:21:26,620
just one row, was it an aggregate?

431
00:21:27,040 --> 00:21:31,600
Do we need to analyze a lot of
rows for real, or do we just return...

432
00:21:32,380 --> 00:21:36,900
Last week I had a case, people
complained about very slow, like

433
00:21:36,900 --> 00:21:41,020
very bad behavior and they used
explain with buffers, they saw

434
00:21:41,020 --> 00:21:44,440
like some gigabytes of data and
so on.

435
00:21:45,060 --> 00:21:49,740
And they knew, ORM or GraphQL was
involved, I don't remember,

436
00:21:49,740 --> 00:21:53,300
but they knew that they need only
like not a lot of rows.

437
00:21:54,180 --> 00:21:58,280
And then we just saw that, like,
oh, it's bad, it's bad, it's

438
00:21:58,280 --> 00:21:58,500
bad.

439
00:21:58,500 --> 00:22:01,820
And the question, how many rows
are we actually returning?

440
00:22:01,820 --> 00:22:04,300
And they also like, we created
all the indexes.

441
00:22:04,640 --> 00:22:08,600
The indexes are here, but somehow
it's still sequential scans.

442
00:22:09,120 --> 00:22:09,960
But how many rows?

443
00:22:09,960 --> 00:22:13,260
And we thought like, oh, actually
it was like 10,000 rows or

444
00:22:13,260 --> 00:22:15,360
so, maybe 100,000 or 25,000 rows.

445
00:22:15,360 --> 00:22:17,380
I don't know, a lot of rows returned.

446
00:22:17,640 --> 00:22:20,260
Do you really need it or did you just
forget the limit?

447
00:22:20,820 --> 00:22:23,560
So the limit is applied on the client side
maybe, right?

448
00:22:23,560 --> 00:22:24,360
It's terrible.

449
00:22:25,160 --> 00:22:30,540
And then I said immediately, like,
I'm a big fan of using... again,

450
00:22:30,720 --> 00:22:35,000
maybe off-topic, but I'm a big
fan of using, in larger projects,

451
00:22:35,000 --> 00:22:39,300
not in tiny projects where you care
about everything as your pet.

452
00:22:39,900 --> 00:22:43,260
So queries for me, like you know
this concept, pets versus cattle,

453
00:22:43,260 --> 00:22:43,760
right?

454
00:22:44,540 --> 00:22:45,040
Michael: Yeah.

455
00:22:45,200 --> 00:22:48,120
Nikolay: Yeah, so for virtual machines
or for real machines,

456
00:22:48,120 --> 00:22:50,880
for a big fleet of infrastructure.

457
00:22:52,060 --> 00:22:56,600
In the case of workload, if the workload
is complex, I'm a big fan

458
00:22:56,600 --> 00:23:00,300
of dealing with queries like with
cattle as well.

459
00:23:00,300 --> 00:23:03,560
So when I saw this, it was a new
client.

460
00:23:03,620 --> 00:23:08,160
When I saw this, that they had
this problem, returning too many

461
00:23:08,160 --> 00:23:11,660
rows unexpectedly, I immediately
said, stop here.

462
00:23:11,680 --> 00:23:13,940
Like, it's just 1 example of the
problem.

463
00:23:13,940 --> 00:23:17,760
And let's return to a high-level,
top-down analysis using pg_stat_statements

464
00:23:17,800 --> 00:23:23,560
because it has rows,
rows metric, and let's see the

465
00:23:23,560 --> 00:23:27,240
whole picture, how many other queries
behave similarly and

466
00:23:27,240 --> 00:23:28,440
return too many rows.
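The top-down check Nikolay describes can start with something like this, using the stock columns of the `pg_stat_statements` view:

```sql
-- Which normalized queries return the most rows per call?
SELECT queryid,
       calls,
       rows,
       round(rows::numeric / NULLIF(calls, 0), 1) AS rows_per_call,
       left(query, 60) AS query_snippet
FROM pg_stat_statements
ORDER BY rows_per_call DESC NULLS LAST
LIMIT 20;
```

Queries near the top that nobody expects to return thousands of rows are good candidates for a forgotten LIMIT or a missing filter.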

467
00:23:28,580 --> 00:23:33,160
But this is exactly when it happens,
an index is not used.

468
00:23:33,420 --> 00:23:37,000
Because you request for too many
rows here, right?

469
00:23:38,040 --> 00:23:40,260
And selectivity is not good.

470
00:23:40,260 --> 00:23:41,400
It's very weak.

471
00:23:42,500 --> 00:23:45,880
Michael: Yeah, and Postgres is
going to do exactly what you asked

472
00:23:45,880 --> 00:23:46,400
it to do.

473
00:23:46,400 --> 00:23:49,540
If you want all of the rows, it's
not going to say, let me give

474
00:23:49,540 --> 00:23:51,180
you the first 25 and see if that's
enough.

475
00:23:51,180 --> 00:23:52,540
It's going to say, nope, here you
go.

476
00:23:52,540 --> 00:23:53,270
Here's all of them.

477
00:23:53,270 --> 00:23:55,140
Nikolay: Unless
it's a special case, which

478
00:23:55,140 --> 00:23:57,440
is slightly uncertain, right?

479
00:23:57,440 --> 00:23:58,480
It's something new for us.

480
00:23:58,480 --> 00:23:59,120
Michael: Right, good point.

481
00:24:00,060 --> 00:24:04,580
I haven't actually considered the
new index types.

482
00:24:05,380 --> 00:24:05,880
Nikolay: Yeah.

483
00:24:06,140 --> 00:24:06,640
Anyway.

484
00:24:06,760 --> 00:24:09,780
No more certainty in the SQL world.

485
00:24:10,840 --> 00:24:11,340
Michael: Yeah.

486
00:24:11,940 --> 00:24:13,800
But yeah, there's another case
as well.

487
00:24:14,020 --> 00:24:18,260
A big one is if you're selecting
a high proportion of the table,

488
00:24:18,820 --> 00:24:21,680
chances are it's actually faster
to do a sequential scan.

489
00:24:22,160 --> 00:24:26,860
Possibly a different index might
serve your query better.

490
00:24:27,040 --> 00:24:28,780
I haven't seen that one as often.

491
00:24:29,060 --> 00:24:34,700
But another case I've seen really
often is small tables.

492
00:24:35,020 --> 00:24:38,360
So even if you're only selecting
one row in a table that's like

493
00:24:38,360 --> 00:24:41,680
fewer than 100, which might not
be that common in production,

494
00:24:41,680 --> 00:24:43,940
I've seen quite a few like reference
tables though that are not

495
00:24:43,940 --> 00:24:49,080
very many rows, or more commonly
developer databases on people's

496
00:24:49,080 --> 00:24:51,540
local machines where they just
have a tiny bit of data for like

497
00:24:51,540 --> 00:24:52,440
a new feature.

498
00:24:52,740 --> 00:24:56,420
Even if you have the perfect index,
Postgres won't choose to

499
00:24:56,420 --> 00:24:59,160
use it when you don't have much
data just because it's so... 

500
00:24:59,160 --> 00:25:02,160
Because all the data is on one page
and it can just simply look

501
00:25:02,160 --> 00:25:04,780
it up very, very quickly, very,
very easily.

502
00:25:04,940 --> 00:25:07,920
So small tables are the other exception
that I see more often

503
00:25:07,920 --> 00:25:09,180
because of dev boxes.
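A quick way to reproduce the small-table behavior locally, with a hypothetical reference table:

```sql
-- Everything fits on a single 8 kB heap page.
CREATE TABLE statuses (id int PRIMARY KEY, name text);
INSERT INTO statuses SELECT g, 'status ' || g FROM generate_series(1, 50) g;
ANALYZE statuses;

EXPLAIN SELECT * FROM statuses WHERE id = 7;
-- Usually: Seq Scan on statuses ... Filter: (id = 7)
-- Reading the one page directly is cheaper than descending the index;
-- with production-sized data the same query flips to an Index Scan.
```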

504
00:25:09,720 --> 00:25:15,920
Nikolay: You know exactly how impatient
I can be here, trying

505
00:25:15,920 --> 00:25:16,960
to interrupt you, right?

506
00:25:16,960 --> 00:25:18,380
This is my favorite topic.

507
00:25:19,300 --> 00:25:21,680
Probably I will not add anything
here.

508
00:25:22,080 --> 00:25:26,260
And listeners who follow us for
long already know what I would

509
00:25:26,260 --> 00:25:27,720
say here, right?

510
00:25:27,720 --> 00:25:33,140
Because you should always try to
deal with full-size databases.

511
00:25:34,540 --> 00:25:36,380
Michael: Yeah, or at least like
a

512
00:25:36,380 --> 00:25:36,880
Nikolay: bigger,

513
00:25:36,900 --> 00:25:38,080
Michael: large datasets.

514
00:25:38,260 --> 00:25:40,340
If you're going to have to, yeah.

515
00:25:40,440 --> 00:25:41,920
But it trips people up.

516
00:25:42,500 --> 00:25:46,580
Nikolay: Unless you're a big fan
of the hypothetical approach, HypoPG

517
00:25:46,780 --> 00:25:48,140
and hypothetical partitioning.

518
00:25:48,580 --> 00:25:51,920
So it also would work, but in a
limited way.

519
00:25:52,700 --> 00:25:53,920
Michael: Would it though?

520
00:25:53,920 --> 00:25:56,480
Because if you don't have any data,
I don't think...

521
00:25:56,480 --> 00:25:57,380
Nikolay: Ah, no, sorry.

522
00:25:57,800 --> 00:25:59,720
It was only discussed, not implemented.

523
00:25:59,760 --> 00:26:04,020
And there was another project,
which like, let's export statistics

524
00:26:04,160 --> 00:26:08,400
from production, import it to a lower
environment, and pretend

525
00:26:08,400 --> 00:26:09,640
we have a lot of data.

526
00:26:09,640 --> 00:26:11,100
Michael: That's a cool idea.

527
00:26:11,940 --> 00:26:13,020
Nikolay: Yes, it's a cool idea.

528
00:26:13,200 --> 00:26:14,040
Maybe already...

529
00:26:15,920 --> 00:26:20,360
HypoPG has been developed, so
definitely partitioning was covered,

530
00:26:22,360 --> 00:26:25,280
but about statistics maybe not,
but there was another project

531
00:26:26,000 --> 00:26:30,700
developed in Japan, I guess, and
it's not super popular, but

532
00:26:30,700 --> 00:26:35,340
the idea is cool, I think it's really
good, but I expect some limitations

533
00:26:35,420 --> 00:26:40,020
of this idea. Obviously, if you can
afford testing on full-size

534
00:26:40,080 --> 00:26:44,340
databases, this is the best way.
Yeah, you just need to make it

535
00:26:44,340 --> 00:26:48,020
cheap, and then here I stop. Yeah.
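For reference, the extension is HypoPG, and its basic flow looks roughly like this (against the hypothetical `events` table):

```sql
CREATE EXTENSION IF NOT EXISTS hypopg;

-- Register a hypothetical index; no data is written.
SELECT * FROM hypopg_create_index('CREATE INDEX ON events (created_at)');

-- Plain EXPLAIN (without ANALYZE) now considers the hypothetical index,
-- so you can check whether the planner would choose it before building it.
EXPLAIN SELECT * FROM events WHERE created_at = '2024-01-01 00:00:00';

SELECT hypopg_reset();  -- discard all hypothetical indexes
```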

536
00:26:50,940 --> 00:26:54,060
Michael: But yeah, so this is
the more interesting case.

537
00:26:54,060 --> 00:26:58,000
So right, like once you're past
those two of it being, you know,

538
00:26:58,120 --> 00:27:01,080
actually faster, if it's wrong,
if Postgres is not estimating

539
00:27:01,100 --> 00:27:03,340
correctly. That's when we get into
the more interesting parts

540
00:27:03,340 --> 00:27:08,040
of, like, you mentioned stale statistics
or not even having statistics

541
00:27:08,940 --> 00:27:10,860
that are relevant or needed.

542
00:27:11,180 --> 00:27:14,640
So analyze is your friend here
on the tables in question.

543
00:27:15,040 --> 00:27:16,740
We've also got a couple of other
tools.

544
00:27:18,620 --> 00:27:22,360
We can increase the sample that
Postgres will do, if you've got

545
00:27:22,360 --> 00:27:23,500
an oddly distributed-

546
00:27:23,600 --> 00:27:25,120
Nikolay: Default statistics target.

547
00:27:25,400 --> 00:27:26,140
Yeah, some people-

548
00:27:26,140 --> 00:27:26,880
Michael: Not even-

549
00:27:27,740 --> 00:27:28,760
Nikolay: For one column.

550
00:27:29,540 --> 00:27:30,060
Michael: Yeah, exactly.

551
00:27:30,060 --> 00:27:32,500
You can do it globally, but per
column...

552
00:27:32,500 --> 00:27:35,720
Nikolay: But 100, default 100 is
also quite a lot.

553
00:27:35,900 --> 00:27:40,040
It might not be enough if you have
high insert rate and constantly

554
00:27:40,040 --> 00:27:46,080
working on the edge of new data
and the distribution of values

555
00:27:46,080 --> 00:27:49,940
in new data is very different compared
to the archived data.

556
00:27:50,380 --> 00:27:53,500
So there are things there, definitely.
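The per-column knob being discussed, sketched against the hypothetical `events` table:

```sql
-- Raise the sample size for one skewed column instead of globally
-- (default_statistics_target defaults to 100).
ALTER TABLE events ALTER COLUMN created_at SET STATISTICS 1000;
ANALYZE events;  -- re-sample with the new per-column target

-- Inspect what the planner now knows about the column:
SELECT n_distinct, most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'events' AND attname = 'created_at';
```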

557
00:27:54,000 --> 00:27:56,460
Michael: That's one of the defaults
I'm happiest with, actually.

558
00:27:56,760 --> 00:28:00,300
I think it's a pretty good sweet
spot because increasing it,

559
00:28:00,300 --> 00:28:03,340
if you increase it globally, you
increase the time for analyze

560
00:28:03,340 --> 00:28:04,200
to run globally.

561
00:28:04,200 --> 00:28:07,700
And I think that has knock-on effects
for things like your downtime

562
00:28:07,780 --> 00:28:10,640
for doing certain types of major
upgrades and things.

563
00:28:10,640 --> 00:28:11,320
So I

564
00:28:11,320 --> 00:28:11,690
Nikolay: can see...

565
00:28:11,690 --> 00:28:12,057
Major upgrades

566
00:28:12,057 --> 00:28:12,560
Michael: should be

567
00:28:12,560 --> 00:28:13,400
Nikolay: zero downtime.

568
00:28:14,340 --> 00:28:18,280
And if you've jumped from 100 to
1,000, yes, it will probably

569
00:28:18,720 --> 00:28:23,760
increase analyze time maybe two or
three times, but not 10 times.

570
00:28:24,720 --> 00:28:25,740
It's not linear.

571
00:28:27,720 --> 00:28:31,420
Michael: Sure, but I haven't
seen it cause huge problems

572
00:28:31,440 --> 00:28:31,940
globally.

573
00:28:32,440 --> 00:28:38,160
But yeah, if you've got a column
that's, like, not a uniform distribution,

574
00:28:38,520 --> 00:28:41,240
well, a skewed distribution,
increasing it for that column

575
00:28:41,240 --> 00:28:42,040
can really help.

576
00:28:42,040 --> 00:28:45,460
And then the final one I've got on
my list was the multi-columns.

577
00:28:45,720 --> 00:28:46,220
Yeah.

578
00:28:47,840 --> 00:28:53,440
Nikolay: Which can only be for columns
inside one table, yeah. It cannot

579
00:28:53,440 --> 00:28:58,520
be, unfortunately, for two tables, two
different columns in two tables.

580
00:28:58,520 --> 00:29:01,560
It would be interesting to have
maybe as well. Sometimes one table

581
00:29:01,560 --> 00:29:03,740
fully depends on another, right?
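The multi-column feature in question is extended statistics (`CREATE STATISTICS`), which is indeed limited to columns of a single table; a sketch with a hypothetical `addresses` table:

```sql
-- Without this, the planner multiplies the per-column selectivities
-- as if city and zip were independent, and can badly underestimate
-- rows for: WHERE city = '...' AND zip = '...'
CREATE STATISTICS addresses_city_zip (dependencies, ndistinct)
    ON city, zip FROM addresses;
ANALYZE addresses;  -- populate the extended statistics
```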

582
00:29:04,740 --> 00:29:05,140
Michael: Yeah, I

583
00:29:05,140 --> 00:29:05,820
Nikolay: don't know.

584
00:29:06,180 --> 00:29:06,360
We

585
00:29:06,360 --> 00:29:08,720
Michael: do have a whole episode
on hints, but I think this is

586
00:29:08,720 --> 00:29:11,040
where the hints discussion really
comes into play.

587
00:29:11,040 --> 00:29:16,960
It's like, what are your options
when we're out of tools within

588
00:29:16,960 --> 00:29:21,260
Postgres to give the planner as
much information as we can about

589
00:29:21,260 --> 00:29:23,740
the distribution and the stats.

590
00:29:24,340 --> 00:29:24,840
Nikolay: Right.

591
00:29:25,080 --> 00:29:26,015
Bugs also happen.

592
00:29:26,015 --> 00:29:27,262
So bugs might happen.

593
00:29:27,262 --> 00:29:32,020
I mean, planner bugs or some not
developed things, not yet developed

594
00:29:32,020 --> 00:29:35,440
things, some easy stuff which is
not yet there, Postgres planner

595
00:29:35,440 --> 00:29:36,400
is not super powerful.

596
00:29:36,400 --> 00:29:39,620
It's quite powerful, but compared
to SQL Server, for example,

597
00:29:39,960 --> 00:29:45,160
the code base is much smaller and
the number of

598
00:29:45,160 --> 00:29:49,360
engineering hours invested into SQL
Server planner is much bigger

599
00:29:49,360 --> 00:29:50,700
than into Postgres.

600
00:29:50,800 --> 00:29:54,640
So of course it's evolving, but
still some things might be not

601
00:29:54,640 --> 00:29:57,900
yet developed and sometimes people
see quite simple things.

602
00:29:58,660 --> 00:30:05,520
To me, sometimes you think, oh,
it's so obvious why it's not

603
00:30:05,520 --> 00:30:06,020
here.

604
00:30:07,200 --> 00:30:10,960
And as an example, it's very unrelated,
but it still illustrates the point.

605
00:30:11,200 --> 00:30:15,580
Last week, transaction_timeout
was committed by Alexander Korotkov.

606
00:30:16,060 --> 00:30:17,280
Many thanks to him.

607
00:30:18,300 --> 00:30:20,260
And transaction_timeout was my
idea.

608
00:30:20,280 --> 00:30:24,620
And when I came to this
idea, I was thinking, am I stupid?

609
00:30:24,620 --> 00:30:26,480
Like, am I missing something?

610
00:30:26,480 --> 00:30:29,340
Like, the whole world still hasn't raised
this.

611
00:30:29,340 --> 00:30:31,120
I searched, I don't see discussion.

612
00:30:31,560 --> 00:30:35,040
Why is transaction_timeout still not
present in PostgreSQL, after like

613
00:30:35,140 --> 00:30:37,540
so many dozens of years of development.

614
00:30:38,500 --> 00:30:42,520
And then a few folks validated
it's a good idea actually, and

615
00:30:42,520 --> 00:30:44,280
then it got support.

616
00:30:44,380 --> 00:30:48,600
So if you spend time with Postgres
enough, with the Query Planner, you

617
00:30:48,600 --> 00:30:53,080
might indeed see cases which are
not yet well-developed.

618
00:30:54,000 --> 00:30:55,520
Or just simply bugs.

619
00:30:55,520 --> 00:30:57,640
Sometimes just bugs happen as well.

620
00:30:57,840 --> 00:30:58,340
Michael: Yeah.

621
00:30:58,440 --> 00:31:02,320
I've spoken to several customers
that have long histories with

622
00:31:02,540 --> 00:31:08,100
DB2, Oracle, and they're surprised
by some things, but they also

623
00:31:08,100 --> 00:31:11,820
talk about some issues with, like,
especially on the Oracle side,

624
00:31:11,920 --> 00:31:15,660
about kind of patchwork solutions.

625
00:31:16,080 --> 00:31:19,340
So sometimes more code in these
optimizers is not necessarily

626
00:31:19,400 --> 00:31:23,100
better because it's like patched
fix on patch fix on patch fix

627
00:31:23,100 --> 00:31:26,920
and it leads to quite like nasty
weird behaviors.

628
00:31:27,540 --> 00:31:31,500
So I do admire the elegance of
the simplicity, but equally, I

629
00:31:31,500 --> 00:31:34,040
think you're right. People I know
that know these things

630
00:31:34,040 --> 00:31:38,600
well, or have used SQL Server or
DB2 in the past, definitely

631
00:31:38,600 --> 00:31:40,300
say nice things about those planners.

632
00:31:41,040 --> 00:31:44,240
Nikolay: Well, my opinion is also
based not on my experience.

633
00:31:44,340 --> 00:31:47,800
My last experience outside Postgres
was very long ago.

634
00:31:47,800 --> 00:31:51,020
I mean, real experience when you
do develop something for long.

635
00:31:51,020 --> 00:31:52,580
It was more than 15 years ago.

636
00:31:52,580 --> 00:31:57,840
So I'm just translating other folks'
opinions, but they seem

637
00:31:57,980 --> 00:31:59,280
reasonable, these opinions.

638
00:31:59,340 --> 00:32:03,260
And they are experienced guys who
actually came to Postgres and

639
00:32:03,260 --> 00:32:07,500
say, you know, like, let's just
admit the planner can be much,

640
00:32:07,500 --> 00:32:08,240
much better.

641
00:32:08,240 --> 00:32:11,140
We need to continue developing
things, improving things.

642
00:32:11,520 --> 00:32:15,660
And in this case, let's advertise
a recent blog post from David

643
00:32:15,660 --> 00:32:16,160
Rowley.

644
00:32:17,380 --> 00:32:24,400
So I've noticed many items, it
was his contribution.

645
00:32:25,520 --> 00:32:27,880
Michael: Yeah, but also many from
other people too.

646
00:32:28,080 --> 00:32:32,620
Nikolay: Yeah, so like the post
is what's new in the planner

647
00:32:33,220 --> 00:32:34,260
in PostgreSQL 16?

648
00:32:35,340 --> 00:32:37,440
Let's unfold it and so on.

649
00:32:37,440 --> 00:32:38,200
It's great.

650
00:32:39,340 --> 00:32:42,540
It's also in my to-do list to inspect
deeper and maybe do some

651
00:32:42,540 --> 00:32:43,040
tests.

652
00:32:43,040 --> 00:32:46,060
So it's a good post.

653
00:32:46,320 --> 00:32:49,840
Michael: And the thing I love most
about it was inspiring people

654
00:32:49,840 --> 00:32:50,520
to upgrade.

655
00:32:50,740 --> 00:32:53,040
Postgres 16 came out a few months
ago now.

656
00:32:53,040 --> 00:32:56,700
We've had at least one,
I think two now, minor patches.

657
00:32:57,560 --> 00:32:58,880
Nikolay: February, I have no idea.

658
00:32:59,340 --> 00:33:03,720
Michael: I already know at least
1 company that has upgraded

659
00:33:03,740 --> 00:33:06,820
as a direct result of that blog
post coming out, just to see

660
00:33:06,820 --> 00:33:08,900
if some of their queries improved
as a result.

661
00:33:09,020 --> 00:33:09,520
Cool.

662
00:33:09,640 --> 00:33:10,140
Nikolay: Cool.

663
00:33:10,600 --> 00:33:11,100
Michael: Good.

664
00:33:11,120 --> 00:33:12,760
So yeah, thank you to David.

665
00:33:12,780 --> 00:33:15,400
Nikolay: This is important to share,
by the way, because other

666
00:33:15,400 --> 00:33:19,040
people think like, maybe it's still
too early and so on.

667
00:33:19,140 --> 00:33:22,840
And you know, like I was recently
approached with just the question,

668
00:33:22,840 --> 00:33:25,180
do you know anything bad about
Postgres 16?

669
00:33:26,140 --> 00:33:29,280
I said no, actually, it's not,
it's not Postgres 9.3.

670
00:33:29,540 --> 00:33:31,600
No, 9.3 was terrible.

671
00:33:31,620 --> 00:33:32,240
I remember.

672
00:33:32,760 --> 00:33:35,040
Michael: I think I've got a new
answer for you that I think is

673
00:33:35,040 --> 00:33:35,940
more on brand.

674
00:33:36,220 --> 00:33:39,920
I think you should say yes, but
they're also true in 15, 14,

675
00:33:39,920 --> 00:33:41,700
13, 12, 11 and 10.

676
00:33:42,440 --> 00:33:42,940
Nikolay: Okay.

677
00:33:44,220 --> 00:33:44,720
Yeah.

678
00:33:45,060 --> 00:33:46,340
Michael: Anyway, I know what you
mean.

679
00:33:46,400 --> 00:33:50,220
Did you mean like you don't have
anything bad in 16 that's new

680
00:33:50,220 --> 00:33:50,900
in 16?

681
00:33:51,060 --> 00:33:54,440
Nikolay: Nobody bumped into some
problems like complaining that

682
00:33:54,440 --> 00:33:55,120
it's bad.

683
00:33:55,120 --> 00:33:59,280
There are a few companies upgraded,
I mean, maybe not huge installations,

684
00:33:59,440 --> 00:34:00,260
but still.

685
00:34:01,720 --> 00:34:04,740
It has a good reputation so far.

686
00:34:05,640 --> 00:34:09,640
And of course, the risks are still
there, and tomorrow we might

687
00:34:09,640 --> 00:34:12,940
see some problems, but these risks
already go down, down, and

688
00:34:12,940 --> 00:34:14,240
down over time, right?

689
00:34:14,540 --> 00:34:17,680
And I mean, it's important to share,
so we upgraded, everything

690
00:34:17,680 --> 00:34:18,980
is good, in our case.

691
00:34:19,020 --> 00:34:22,540
It doesn't mean in any case it
will be good, but it builds up

692
00:34:22,540 --> 00:34:27,220
the level of confidence over time,
and more people will be upgrading.

693
00:34:29,480 --> 00:34:30,260
Nice one.

694
00:34:31,160 --> 00:34:32,220
Okay, good.

695
00:34:32,220 --> 00:34:32,320
Did

696
00:34:32,320 --> 00:34:33,980
Michael: you have anything else
you wanted to add?

697
00:34:34,280 --> 00:34:36,820
Nikolay: I always have a lot of
things, but I think we're out

698
00:34:36,820 --> 00:34:37,520
of time.

699
00:34:37,700 --> 00:34:39,640
Let's not consume a lot of time.

700
00:34:40,520 --> 00:34:45,280
Of course, it's good if you run
more or walk your dog more, listening

701
00:34:45,280 --> 00:34:48,740
to us longer, but still, there
are other things to do.

702
00:34:48,740 --> 00:34:52,040
So thank you so much and thank
you for support.

703
00:34:52,060 --> 00:34:56,140
We still like, I didn't follow
really well, but we see comments

704
00:34:56,140 --> 00:34:57,940
on YouTube, for example, it's good.

705
00:34:57,980 --> 00:35:02,200
Please share your experience as
well and other people also see

706
00:35:02,200 --> 00:35:05,860
it and we can highlight some comments
not only about Red Bull,

707
00:35:05,860 --> 00:35:06,360
right?

708
00:35:07,480 --> 00:35:08,200
Or beer.

709
00:35:09,520 --> 00:35:10,280
Michael: Free advert.

710
00:35:10,520 --> 00:35:11,260
I'm teasing.

711
00:35:12,380 --> 00:35:15,520
Yeah, I think over a thousand people
have listened to the last

712
00:35:15,520 --> 00:35:17,220
few episodes, which is pretty cool.

713
00:35:17,560 --> 00:35:17,900
Yeah.

714
00:35:17,900 --> 00:35:19,020
So thank you all.

715
00:35:19,400 --> 00:35:19,900
Nikolay: Good.

716
00:35:20,640 --> 00:35:21,140
Okay.

717
00:35:23,220 --> 00:35:23,820
Bye bye.