1
00:00:00,060 --> 00:00:01,980
Nikolay: Hello, hello, this is
Postgres.FM.

2
00:00:01,980 --> 00:00:06,340
My name is Nikolay, founder of
Postgres.AI and as usual my co-host

3
00:00:06,340 --> 00:00:07,560
is Michael, pgMustard.

4
00:00:07,840 --> 00:00:08,580
Hi Michael.

5
00:00:09,140 --> 00:00:10,020
Michael: Hello Nikolay.

6
00:00:10,440 --> 00:00:11,420
Nikolay: How are you doing?

7
00:00:12,340 --> 00:00:13,720
How is your service doing?

8
00:00:14,640 --> 00:00:16,320
I saw a newsletter.

9
00:00:17,140 --> 00:00:19,220
Michael: Oh yes, all good.

10
00:00:19,220 --> 00:00:20,100
Ticking along.

11
00:00:20,500 --> 00:00:21,000
Nikolay: Improvements.

12
00:00:21,220 --> 00:00:22,060
Michael: How about you?

13
00:00:22,260 --> 00:00:25,060
Nikolay: Yeah, also like bug fixes, you
know.

14
00:00:25,520 --> 00:00:26,020
Michael: Yeah.

15
00:00:26,200 --> 00:00:27,680
Nikolay: And improvements as well.

16
00:00:27,720 --> 00:00:31,140
We are preparing some big update
probably next month.

17
00:00:31,720 --> 00:00:35,600
So I wanted to, actually you wanted
to discuss this.

18
00:00:35,600 --> 00:00:36,600
It's your topic.

19
00:00:37,080 --> 00:00:39,600
Michael: Yeah, very much my topic,
it sounds like.

20
00:00:39,900 --> 00:00:44,100
Nikolay: Yeah, and I think it's
a useful topic, but I also want

21
00:00:44,100 --> 00:00:50,320
to be honest and, as usual, I want
to be honest and discuss how

22
00:00:50,320 --> 00:00:57,040
often can we use that feature of
Postgres and what's wrong.

23
00:00:57,340 --> 00:01:01,580
Not wrong, maybe, but let's just
go like extended statistics

24
00:01:01,720 --> 00:01:02,580
is the topic.

25
00:01:03,240 --> 00:01:07,620
So it's what you can achieve running
the SQL command CREATE STATISTICS,

26
00:01:08,180 --> 00:01:09,140
giving it a name.

27
00:01:09,140 --> 00:01:11,560
It's very similar to index creation,
right?

28
00:01:13,360 --> 00:01:14,140
Similar, right?

29
00:01:14,140 --> 00:01:15,180
You create something

30
00:01:15,240 --> 00:01:16,180
Michael: with, it's

31
00:01:16,180 --> 00:01:18,380
Nikolay: named for 1 table only.

32
00:01:18,380 --> 00:01:20,520
You cannot do it for multiple tables.

33
00:01:21,040 --> 00:01:25,240
You choose columns, and there are
several parameters you can

34
00:01:25,760 --> 00:01:26,260
specify.
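
[Editor's note: a minimal sketch of the command being described, with hypothetical table and column names — nothing here comes from the episode itself.]

```sql
-- Create a named extended-statistics object on two columns of one table.
-- The kinds in parentheses are optional; by default all applicable ones are built.
CREATE STATISTICS stats_a_b (dependencies, ndistinct, mcv)
    ON a, b
    FROM t1;

-- Like an index, it is dropped by name:
DROP STATISTICS stats_a_b;
```

Note that this only defines the object; the data is collected on the next ANALYZE.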

35
00:01:28,040 --> 00:01:31,240
And it's very similar to index
creation, but instead of building

36
00:01:31,240 --> 00:01:36,800
the whole structure additionally
to speed up search of values,

37
00:01:37,800 --> 00:01:43,520
of entries, we just tell Postgres
that it needs to maintain more

38
00:01:43,520 --> 00:01:46,740
statistics than usual, because
by default it maintains statistics

39
00:01:46,760 --> 00:01:48,300
for every Column individually.

40
00:01:49,280 --> 00:01:49,780
Right?

41
00:01:50,500 --> 00:01:55,780
And also, not only every Column
individually, but also multi-column

42
00:01:55,940 --> 00:01:59,560
statistics, it does maintain and
also expression, not multi-column,

43
00:01:59,640 --> 00:02:00,560
expression statistics.

44
00:02:00,820 --> 00:02:04,580
It maintains them for expression indexes,
like Function indexes.

45
00:02:04,920 --> 00:02:10,020
For example, a very simple common
case is when we have a Column

46
00:02:10,340 --> 00:02:13,700
named email and we use text, not
citext.

47
00:02:13,700 --> 00:02:15,040
Nobody uses citext.

48
00:02:15,040 --> 00:02:16,560
This Extension is not used at all.

49
00:02:16,560 --> 00:02:17,900
I don't know why it exists.

50
00:02:18,240 --> 00:02:19,760
It has existed for ages.

51
00:02:19,760 --> 00:02:22,060
citext is an Extension.

52
00:02:22,120 --> 00:02:25,260
It's actually a contrib module, so it's always available in any

53
00:02:25,260 --> 00:02:26,220
Postgres setup.

54
00:02:26,320 --> 00:02:28,220
And it's case-insensitive text.

55
00:02:28,780 --> 00:02:32,860
Because by default, if you create an Index on the Column named email, a

56
00:02:35,460 --> 00:02:42,180
unique Index, it won't guarantee uniqueness in a case-insensitive

57
00:02:42,980 --> 00:02:43,480
manner.

58
00:02:44,680 --> 00:02:48,300
So uppercase email, lowercase email, they will be considered

59
00:02:48,340 --> 00:02:49,260
different values.

60
00:02:49,740 --> 00:02:55,060
This is not what we want, because in reality emails, they are

61
00:02:55,060 --> 00:02:55,940
case-insensitive.

62
00:02:56,820 --> 00:02:57,940
Case doesn't matter.

63
00:02:58,580 --> 00:03:02,980
So you need to use citext contrib module, but nobody does it.

64
00:03:03,080 --> 00:03:04,880
I never see it in reality.

65
00:03:04,920 --> 00:03:06,100
Well, it's so exotic.

66
00:03:06,500 --> 00:03:08,740
I used it last time 10 years ago.

67
00:03:08,940 --> 00:03:14,560
So we usually end up having an Index on lower(email).
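
[Editor's note: a sketch of the pattern described here, with an assumed users table; names are illustrative only.]

```sql
-- Case-insensitive uniqueness via an expression index:
CREATE UNIQUE INDEX users_email_lower_key ON users (lower(email));

-- The citext alternative mentioned above (contrib module):
CREATE EXTENSION IF NOT EXISTS citext;
ALTER TABLE users ALTER COLUMN email TYPE citext;
-- now a plain UNIQUE constraint on email is case-insensitive
```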

68
00:03:15,260 --> 00:03:19,240
And what I wanted to say is that by default implicitly Postgres

69
00:03:19,240 --> 00:03:22,940
will maintain statistics for this value, lower email.

70
00:03:23,460 --> 00:03:23,960
Right?

71
00:03:24,060 --> 00:03:24,400
Yeah.

72
00:03:24,400 --> 00:03:25,140
So expression.

73
00:03:25,680 --> 00:03:26,400
It's automatic.

74
00:03:27,160 --> 00:03:31,960
But CREATE STATISTICS allows you to maintain additional statistics

75
00:03:31,960 --> 00:03:38,040
like additional expressions or multi-column statistics without

76
00:03:38,040 --> 00:03:39,060
creation of Index?
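
[Editor's note: what Nikolay describes might look like this — table and column names are assumed; expression-only statistics objects are supported since Postgres 14.]

```sql
-- Statistics on an expression, without the cost of maintaining an index on it:
CREATE STATISTICS users_email_lower_stats ON (lower(email)) FROM users;
ANALYZE users;  -- statistics objects are only populated by ANALYZE
```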

77
00:03:40,200 --> 00:03:45,200
Michael: Yes, although I've only ever thought until reading a

78
00:03:45,200 --> 00:03:49,460
lot about it earlier today I'd only ever really thought of the

79
00:03:50,080 --> 00:03:51,960
multi-column use cases.

80
00:03:52,540 --> 00:03:58,080
The individual column case is interesting

81
00:03:58,080 --> 00:04:02,620
if you want to be able to use the statistics on an expression

82
00:04:02,620 --> 00:04:05,920
but don't want the overhead of maintaining an Index on it.

83
00:04:05,920 --> 00:04:06,420
Nikolay: Which

84
00:04:06,760 --> 00:04:09,680
Michael: is interesting but I've not seen that case before.

85
00:04:09,960 --> 00:04:14,060
Nikolay: So for example we have some expression on a Column,

86
00:04:14,060 --> 00:04:17,660
a single Column, we want statistics but we don't have it, we

87
00:04:17,660 --> 00:04:18,900
don't want an Index.

88
00:04:19,940 --> 00:04:21,540
Can you imagine such case?

89
00:04:21,980 --> 00:04:25,240
Michael: I haven't worked out 1 yet, but it made it into

90
00:04:25,240 --> 00:04:25,860
the documentation.

91
00:04:25,960 --> 00:04:28,780
So I imagine somebody came up with that case.

92
00:04:28,780 --> 00:04:31,800
Nikolay: Yeah, I want an example.

93
00:04:32,540 --> 00:04:35,220
Michael: But I do think it's worth us going back to basics

94
00:04:35,220 --> 00:04:38,720
a little bit, like why do we want or need these statistics?

95
00:04:38,720 --> 00:04:39,900
I think it's really important.

96
00:04:40,080 --> 00:04:41,880
And I was going to just say...

97
00:04:42,340 --> 00:04:47,960
Nikolay: Actually this is what you should see working with individual

98
00:04:47,960 --> 00:04:53,620
query plans in pgMustard, you obviously have cases and as we

99
00:04:53,620 --> 00:04:57,980
discussed before recording, you mentioned that you have some

100
00:04:58,620 --> 00:05:06,100
logic in pgMustard developed to diagnose cases when something

101
00:05:06,100 --> 00:05:11,300
is off and probably you need to
consider statistic creation.

102
00:05:11,980 --> 00:05:13,920
Michael: Well yeah, this is the
tricky thing.

103
00:05:13,940 --> 00:05:15,880
It's not super intelligent though.

104
00:05:16,340 --> 00:05:19,860
We look out for bad row estimates
and they can be a real

105
00:05:20,020 --> 00:05:25,880
problem for query plans because
the beauty of the Postgres optimizer

106
00:05:25,920 --> 00:05:29,860
is that it's choosing between various
plans and it's doing that

107
00:05:29,860 --> 00:05:30,480
work for you.

108
00:05:30,480 --> 00:05:34,360
You don't have to specify how it
should get the data, you just

109
00:05:34,360 --> 00:05:37,600
ask your query and it works out,
hopefully the fastest way to

110
00:05:37,600 --> 00:05:41,920
do that, but it's making a bunch
of decisions about join order,

111
00:05:41,940 --> 00:05:45,900
and maybe that's a bad example
for this topic, but a scan type,

112
00:05:46,020 --> 00:05:52,060
different join algorithms, all
of these decisions are based largely

113
00:05:52,160 --> 00:05:56,980
on the number of rows being returned
at each stage and therefore

114
00:05:56,980 --> 00:06:02,280
the selectivity of various things
and calculating those selectivities

115
00:06:02,520 --> 00:06:06,800
is tricky, like it's a hard problem
and it's the reason why Postgres

116
00:06:06,900 --> 00:06:09,000
maintains statistics in the first
place.

117
00:06:09,440 --> 00:06:11,860
It does that on a per column basis.

118
00:06:11,880 --> 00:06:16,200
The reason it doesn't do it on
a combination of column bases

119
00:06:16,200 --> 00:06:20,320
by default is because just the
number of those explodes so quickly

120
00:06:20,320 --> 00:06:24,660
like as soon as you've got a few
columns in 1 table, the number

121
00:06:24,660 --> 00:06:29,040
of combinations, like is it
n choose 2, they just

122
00:06:29,040 --> 00:06:29,540
explodes

123
00:06:30,020 --> 00:06:30,340
Nikolay: right

124
00:06:30,340 --> 00:06:34,380
Michael: and So it would be super
expensive to do it for everything.

125
00:06:35,240 --> 00:06:40,440
But in certain extreme cases, which
are actually quite common

126
00:06:40,440 --> 00:06:44,720
in real world applications, some
columns are heavily correlated

127
00:06:44,800 --> 00:06:48,840
to other columns and popular ones
and important ones often get

128
00:06:48,840 --> 00:06:49,780
queried together.

129
00:06:50,140 --> 00:06:56,300
So an example I see really often
in like software service applications

130
00:06:56,720 --> 00:07:01,060
are things like organization ID
and team ID or something like

131
00:07:01,060 --> 00:07:04,160
that you know a team ID is always
going to be associated with

132
00:07:04,160 --> 00:07:06,440
an organization ID always always
always

133
00:07:06,900 --> 00:07:09,280
Nikolay: or country and language
another example

134
00:07:09,280 --> 00:07:11,760
Michael: country and language is
often a common example, as

135
00:07:11,760 --> 00:07:16,340
like car and manufacturer and things
like that. But

136
00:07:16,340 --> 00:07:20,640
the 1 reason I mentioned org ID
and team ID is because often

137
00:07:20,920 --> 00:07:24,340
the data is then partitioned based
on that or the primary key

138
00:07:24,340 --> 00:07:26,120
is some combination of those things.

139
00:07:26,120 --> 00:07:29,200
So they're really often always
being queried together.
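
[Editor's note: a hedged sketch of the org ID / team ID example — table and column names are assumed.]

```sql
-- Tell the planner that team_id is functionally dependent on org_id:
CREATE STATISTICS tickets_org_team (dependencies)
    ON org_id, team_id
    FROM tickets;
ANALYZE tickets;
```

With this in place, a WHERE clause filtering on both columns is no longer estimated as the product of two independent selectivities.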

140
00:07:30,060 --> 00:07:33,540
And Postgres's default, which is
a really sensible default, is

141
00:07:33,540 --> 00:07:37,100
to assume that different conditions
are independent of 1 another.

142
00:07:37,660 --> 00:07:42,480
And in a where condition, that
means it's really going to underestimate

143
00:07:42,660 --> 00:07:45,060
how many rows are going to come
back, which can lead to bad things

144
00:07:45,060 --> 00:07:46,260
like nested loops being

145
00:07:46,260 --> 00:07:46,760
Nikolay: used.

146
00:07:47,280 --> 00:07:50,060
It just multiplies estimates.

147
00:07:51,160 --> 00:07:55,280
Michael: But it also has the opposite
problem in group by if

148
00:07:55,280 --> 00:07:58,640
you group by multiple things and
they're actually very related

149
00:07:58,660 --> 00:08:02,480
you're going to get a massive
overestimate instead of an underestimate,

150
00:08:02,640 --> 00:08:05,500
so it can have both it can have
issues in

151
00:08:05,500 --> 00:08:06,260
Nikolay: both directions.
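
[Editor's note: the ndistinct kind targets exactly this GROUP BY case; names below are assumed.]

```sql
-- Help GROUP BY estimates on correlated columns:
CREATE STATISTICS users_geo_nd (ndistinct) ON country, language FROM users;
ANALYZE users;

-- Inspect what was collected:
SELECT statistics_name, kinds, n_distinct
FROM pg_stats_ext
WHERE statistics_name = 'users_geo_nd';
```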

152
00:08:07,280 --> 00:08:13,040
Michael: Yeah so once you know you're like as the maker of that

153
00:08:13,040 --> 00:08:16,360
software, you probably have a good understanding of which of your

154
00:08:16,360 --> 00:08:17,780
columns are going to be very...

155
00:08:18,340 --> 00:08:22,060
I think the technical language is functionally dependent on others.

156
00:08:22,900 --> 00:08:23,300
Nikolay: How?

157
00:08:23,300 --> 00:08:27,540
Michael: Well, like org ID and team ID, you know that.

158
00:08:28,660 --> 00:08:30,980
Nikolay: I don't like this answer, You know that.

159
00:08:31,780 --> 00:08:39,520
I want the answer which could be written for humans and maybe

160
00:08:39,520 --> 00:08:40,580
not only humans.

161
00:08:41,480 --> 00:08:47,520
Some answer, like some step-by-step how-to recipe or algorithm or

162
00:08:47,520 --> 00:08:48,480
something, right?

163
00:08:49,200 --> 00:08:51,720
Look here, check this, and then decide.

164
00:08:52,300 --> 00:08:56,820
This is what I think is missing here because as with indexes,

165
00:08:57,200 --> 00:09:02,860
CREATE STATISTICS is something that is not working by default.

166
00:09:03,740 --> 00:09:05,940
It requires an effort from the engineer.

167
00:09:07,360 --> 00:09:12,980
And I think, as I told you before recording, I never used it

168
00:09:12,980 --> 00:09:17,320
for real in production to solve any problems, like 0 cases I

169
00:09:17,320 --> 00:09:17,820
have.

170
00:09:17,980 --> 00:09:22,320
It appeared, it was added to Postgres 10, 7 years ago.

171
00:09:22,340 --> 00:09:26,300
During these 7 years, several times I considered it.

172
00:09:26,640 --> 00:09:30,140
And you know, like when something, like we see, okay, the row

173
00:09:30,140 --> 00:09:34,180
estimate is completely off, like a thousand times, a million times,

174
00:09:34,300 --> 00:09:35,420
completely off.

175
00:09:35,980 --> 00:09:36,580
What to do?

176
00:09:36,580 --> 00:09:37,440
What to do?

177
00:09:37,660 --> 00:09:41,820
Of course, analyze, analyze, recollect the stats, see if it helps.

178
00:09:41,820 --> 00:09:43,280
If it helps, no, it doesn't help.

179
00:09:43,280 --> 00:09:44,080
What to do?
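
[Editor's note: the diagnostic loop described here, sketched with a hypothetical query and table.]

```sql
-- 1. Look for a big gap between estimated and actual rows:
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM t1 WHERE a = 1 AND b = 2;
-- compare "rows=" (the estimate) with "actual rows=" in the output

-- 2. Recollect the stats and re-check the plan:
ANALYZE t1;
```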

180
00:09:45,080 --> 00:09:50,140
Then, old school approach, okay, like probably 100 buckets is

181
00:09:50,140 --> 00:09:50,820
not enough.

182
00:09:51,820 --> 00:09:55,120
default_statistics_target is a global setting, but you can control

183
00:09:55,120 --> 00:09:59,320
it on an individual column level, yeah, and you say, okay, I

184
00:09:59,320 --> 00:10:01,200
want 1,000 buckets here.
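
[Editor's note: the per-column override mentioned here, with assumed table and column names.]

```sql
-- Raise the statistics target for one column instead of globally:
ALTER TABLE t1 ALTER COLUMN a SET STATISTICS 1000;
ANALYZE t1;  -- the new target takes effect on the next ANALYZE
```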

185
00:10:02,320 --> 00:10:06,880
And actually, I remember Tomáš Vondra was a guest on Postgres

186
00:10:06,880 --> 00:10:12,860
TV, we discussed this, and Tomáš's opinion was that a hundred buckets is

187
00:10:12,860 --> 00:10:18,280
quite a lot, so it's not a good idea to always fix it by increasing

188
00:10:18,280 --> 00:10:21,520
this number. But I see it sometimes.

189
00:10:21,540 --> 00:10:24,400
Michael: yeah I think that might also sometimes help because

190
00:10:24,400 --> 00:10:27,780
it not only increases the number of buckets but also increases

191
00:10:27,940 --> 00:10:31,500
the length of the MCV the most common values

192
00:10:31,940 --> 00:10:35,680
Nikolay: But if we don't use standard statistics...

193
00:10:36,500 --> 00:10:39,620
Michael: Yeah, with regular statistics target on

194
00:10:39,620 --> 00:10:43,640
a column you also get a bump in the number of MCVs tracked.

195
00:10:43,980 --> 00:10:46,100
Nikolay: Ah, okay, this I understand, yeah, yeah, yeah.

196
00:10:46,720 --> 00:10:50,580
Michael: Which is, I think, might also be sometimes helpful for...

197
00:10:50,740 --> 00:10:51,980
It might sometimes solve the problem.

198
00:10:51,980 --> 00:10:56,400
So I can understand why throwing more sampling at the problem

199
00:10:56,740 --> 00:10:58,080
could be helpful there.

200
00:10:58,080 --> 00:11:02,520
Nikolay: And I'm okay to spend more disk space and memory for

201
00:11:02,520 --> 00:11:03,020
this.

202
00:11:03,940 --> 00:11:08,040
The problem usually if you raise
default_statistics_target to

203
00:11:08,040 --> 00:11:11,080
1000, for example, is that analysis
takes longer.

204
00:11:12,840 --> 00:11:15,520
Which is acceptable in many cases.

205
00:11:15,600 --> 00:11:21,440
I wanted to mention very big news
which occurred last week, like

206
00:11:21,440 --> 00:11:23,500
you know it, you name it.

207
00:11:24,520 --> 00:11:30,580
Michael: Well, the commit to Postgres
18 that unfortunately won't

208
00:11:30,580 --> 00:11:33,340
include what we're talking about
today, extended statistics,

209
00:11:33,420 --> 00:11:36,040
but it's still huge and really
positive news.

210
00:11:36,280 --> 00:11:39,140
Although I've seen that there are
some performance issues.

211
00:11:39,140 --> 00:11:44,160
Nikolay: So in Postgres 18 it is
expected that during the upgrade

212
00:11:44,720 --> 00:11:45,220
Michael: yes

213
00:11:45,420 --> 00:11:51,480
Nikolay: We all know that analyze
is on the user's shoulders, basically,

214
00:11:51,700 --> 00:11:53,980
on DBA or DBRE shoulders.

215
00:11:54,280 --> 00:11:56,540
pg_upgrade doesn't run analyze.

216
00:11:57,720 --> 00:12:03,580
If it's an in-place, brief upgrade,
very fast upgrade, taking only

217
00:12:03,580 --> 00:12:04,460
a couple of minutes.

218
00:12:04,460 --> 00:12:09,060
It's pg_upgrade --link or -k.

219
00:12:09,340 --> 00:12:13,380
So hard links are used, it's very
fast, and it also dumps, restores

220
00:12:13,440 --> 00:12:19,540
the schema, because new version
has different features, so this

221
00:12:19,540 --> 00:12:22,260
schema needs to be dumped explicitly
and restored.

222
00:12:22,660 --> 00:12:23,600
This is automated.

223
00:12:23,940 --> 00:12:27,940
But then it says, okay, now it's
time to run analyze, and there

224
00:12:27,940 --> 00:12:30,540
is analyze in stages, which I
don't like.

225
00:12:30,860 --> 00:12:32,540
We talked about it a few times.

226
00:12:32,980 --> 00:12:36,220
But it's basically, it's outside
of the scope of pg_upgrade

227
00:12:36,220 --> 00:12:37,160
to run analyze.

228
00:12:37,580 --> 00:12:42,900
We also, I criticized managed Postgres
providers like RDS, Cloud

229
00:12:42,900 --> 00:12:49,080
SQL, Azure, because they don't automate
it either and leave it on the

230
00:12:49,080 --> 00:12:50,200
shoulders of DBREs.

231
00:12:50,740 --> 00:12:54,120
And people keep forgetting it,
and after upgrade, they don't

232
00:12:54,120 --> 00:12:55,940
have statistics at all.

233
00:12:56,040 --> 00:13:01,860
So, big news is in Postgres 18,
it was just committed and hopefully

234
00:13:01,940 --> 00:13:02,880
won't be reverted.

235
00:13:04,120 --> 00:13:06,100
Statistics will be dumped and restored.

236
00:13:06,780 --> 00:13:10,080
Michael: Yes, and that's even better
than, in my opinion, that's

237
00:13:10,080 --> 00:13:12,600
even better than analyze being
automated.

238
00:13:13,040 --> 00:13:14,400
Nikolay: Of course, it's better.

239
00:13:14,440 --> 00:13:18,280
This is the answer I've got when
I raised this on Twitter on

240
00:13:18,280 --> 00:13:24,120
X many months ago saying that's
not all right that managed service

241
00:13:24,120 --> 00:13:25,440
providers don't do it.

242
00:13:25,440 --> 00:13:30,820
Somebody said, like, let's wait, and
it was the right idea. So if

243
00:13:30,820 --> 00:13:35,280
now statistics are dumped and restored,
we have old statistics

244
00:13:35,800 --> 00:13:36,740
and it's good.

245
00:13:37,440 --> 00:13:42,540
Michael: Yeah, and my understanding
is this is a pg_dump feature,

246
00:13:42,620 --> 00:13:45,540
therefore it should work...

247
00:13:46,020 --> 00:13:51,600
Let's say it does end up in 18,
it should work for the next upgrade

248
00:13:51,680 --> 00:13:54,100
you do, no matter what version
you're coming from.

249
00:13:54,240 --> 00:13:55,640
Nikolay: Yeah, I also thought about
it.

250
00:13:55,640 --> 00:13:56,300
It's interesting.

251
00:13:56,940 --> 00:14:02,200
I'm not 100% sure, but of course,
definitely you can run, you

252
00:14:02,200 --> 00:14:10,820
can use new pg_dump against some
old Postgres server, and hopefully

253
00:14:10,840 --> 00:14:11,820
this feature will work.

254
00:14:11,820 --> 00:14:16,620
It means that we will be able to
upgrade old servers to new without

255
00:14:16,620 --> 00:14:19,020
need to run ANALYZE explicitly.

256
00:14:19,900 --> 00:14:21,420
Yeah, and

257
00:14:21,420 --> 00:14:22,540
Michael: it means less downtime.

258
00:14:23,680 --> 00:14:26,640
I consider that time you're running
ANALYZE to still be there; you

259
00:14:26,640 --> 00:14:30,180
could effectively be down, even
if you try to restore connections,

260
00:14:30,360 --> 00:14:33,520
just because queries could be much,
much slower than they should

261
00:14:33,520 --> 00:14:33,740
be.

262
00:14:33,740 --> 00:14:39,520
So, yeah, I think for me this could
drastically reduce effective

263
00:14:39,680 --> 00:14:42,380
downtime for a lot of projects
during upgrades.

264
00:14:42,520 --> 00:14:48,100
Nikolay: Yeah, I agree, because
in my opinion, ANALYZE in stages

265
00:14:49,620 --> 00:14:51,660
it's a wrong feature, completely
wrong.

266
00:14:51,660 --> 00:14:56,320
And I prefer keeping an ANALYZE
inside maintenance window.

267
00:14:56,320 --> 00:14:59,840
And of course, if you have
default_statistics_target elevated

268
00:15:00,480 --> 00:15:02,720
to some high numbers, thousands.

269
00:15:03,280 --> 00:15:07,540
In this case, it takes longer,
especially if, like some

270
00:15:07,540 --> 00:15:11,540
people, you have a lot of Database objects,
a lot.

271
00:15:12,040 --> 00:15:16,680
I just recently had the pleasure
to deal a little bit with a

272
00:15:16,680 --> 00:15:21,660
case with 200,000 tables and more
than 1 million indexes and

273
00:15:21,660 --> 00:15:25,420
if you decided to raise
default_statistics_target there, ANALYZE

274
00:15:25,440 --> 00:15:31,740
will take ages, even if it's
parallelized. So keeping

275
00:15:31,740 --> 00:15:32,880
statistics is good.

276
00:15:33,300 --> 00:15:36,560
Michael: Let's get back to the topic
though, you might still want

277
00:15:36,560 --> 00:15:38,560
to run ANALYZE after a major version
upgrade.

278
00:15:38,560 --> 00:15:42,480
There are a couple of caveats,
even in the 18 commit, as to things

279
00:15:42,480 --> 00:15:44,060
that won't get pulled across.

280
00:15:44,060 --> 00:15:45,940
And one of them is extended statistics.

281
00:15:45,940 --> 00:15:49,540
So if you are using extended statistics,
and they're

282
00:15:49,540 --> 00:15:52,480
important for your query performance,
you will still need to

283
00:15:52,480 --> 00:15:56,940
run ANALYZE to actually get back
up and running.

284
00:15:56,940 --> 00:15:58,780
Nikolay: I always like running
ANALYZE.

285
00:15:58,780 --> 00:16:01,400
For example, you know, if you have
a partitioned table, autovacuum

286
00:16:01,400 --> 00:16:04,580
doesn't maintain statistics on
the parent table.

287
00:16:04,680 --> 00:16:09,180
Somehow if you run ANALYZE explicitly
you have your statistics

288
00:16:09,400 --> 00:16:12,480
so I just like running ANALYZE.

289
00:16:13,140 --> 00:16:16,440
Michael: I think there's even an
improvement, maybe it was in

290
00:16:16,440 --> 00:16:19,840
17 that you can run ANALYZE only
on the parents so you don't even

291
00:16:19,840 --> 00:16:22,820
have to analyze all of the partitions,
which is quite cool.
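
[Editor's note: the improvement Michael mentions — he thinks it was version 17 — lets ANALYZE target just the partitioned parent; table name assumed.]

```sql
-- Collect statistics for the parent only, skipping the individual partitions:
ANALYZE ONLY measurements;
```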

292
00:16:23,200 --> 00:16:26,380
Nikolay: Yeah, so partition statistics
on partitioned tables probably

293
00:16:26,380 --> 00:16:31,600
it's another topic in
itself. But back to extended statistics.

294
00:16:31,620 --> 00:16:36,700
So my point is since this is a
feature which requires you to

295
00:16:36,700 --> 00:16:41,020
run some things explicitly and
make decisions, there should be

296
00:16:41,020 --> 00:16:41,520
recipes.

297
00:16:41,680 --> 00:16:43,200
For indexes, we have a recipe.

298
00:16:43,380 --> 00:16:47,540
We have sequential scan, we see
some filter or order by, okay,

299
00:16:47,540 --> 00:16:49,400
we know this index should help.

300
00:16:49,400 --> 00:16:52,860
We verify it, ideally on full-size
clones.

301
00:16:53,360 --> 00:16:55,820
We see it helps, we deploy it to
production.

302
00:16:56,400 --> 00:17:00,160
People who don't have fast and
cheap clones, they check it right

303
00:17:00,160 --> 00:17:01,960
on production and see it works.

304
00:17:01,960 --> 00:17:02,460
Okay.

305
00:17:03,040 --> 00:17:04,060
So that's it.

306
00:17:04,120 --> 00:17:08,480
As for extended
statistics, which was added

307
00:17:08,480 --> 00:17:11,420
as we discussed in Postgres 10,
7 years ago.

308
00:17:11,720 --> 00:17:17,360
During these 7 years, several times
I said, okay, Maybe we need

309
00:17:18,420 --> 00:17:19,940
to create statistics here.

310
00:17:19,940 --> 00:17:20,660
Let's try.

311
00:17:21,600 --> 00:17:24,180
I remember 0 cases when this was
the solution.

312
00:17:24,860 --> 00:17:29,180
We always ended up having a different
solution, like creation of an

313
00:17:29,180 --> 00:17:34,660
index or maybe redesigning schema
sometimes to the question of

314
00:17:34,660 --> 00:17:37,000
previous discussion with Frank,
right?

315
00:17:37,360 --> 00:17:42,340
I mean, redesigning schema, denormalization
and so on.

316
00:17:42,900 --> 00:17:45,120
So I don't remember a conclusion:

317
00:17:45,120 --> 00:17:48,680
Okay, in this case, CREATE STATISTICS
is our solution to the

318
00:17:48,680 --> 00:17:56,540
problem, and that saves me.
Honestly, it builds some weird

319
00:17:56,540 --> 00:17:58,680
feeling I'm missing something.

320
00:17:58,780 --> 00:18:00,060
Everyone is using something.

321
00:18:00,060 --> 00:18:00,940
I'm not using

322
00:18:03,000 --> 00:18:05,440
Michael: Well, interestingly,
1 of the main reasons

323
00:18:05,440 --> 00:18:08,440
I wanted to bring this up is I
think extended statistics might

324
00:18:08,440 --> 00:18:11,480
be underused globally I think a
lot of people don't know that

325
00:18:11,480 --> 00:18:11,820
it exists.

326
00:18:11,820 --> 00:18:14,340
Nikolay: Because there are no recipes,
that's my point.

327
00:18:14,340 --> 00:18:18,940
We need recipes, very concrete
recipes for how to decide it's

328
00:18:18,940 --> 00:18:20,460
worth using it right here.

329
00:18:20,740 --> 00:18:21,520
Michael: Yeah, interesting.

330
00:18:22,540 --> 00:18:24,520
But I do have something as a chance.

331
00:18:26,460 --> 00:18:28,460
Nikolay: Give me a schema-only
dump.

332
00:18:29,120 --> 00:18:36,260
I will give some LLM, a good 1, this
dump and ask it to find, just

333
00:18:36,260 --> 00:18:41,760
based on column names, to find
some ideas what can have correlation,

334
00:18:41,820 --> 00:18:44,440
then we build statistics and hope
it will help someday.

335
00:18:44,490 --> 00:18:46,140
But it's a weak solution.

336
00:18:46,920 --> 00:18:50,320
Michael: I also think it's flawed,
because I think if it's not

337
00:18:50,320 --> 00:18:53,520
actually causing query performance
issues, then you're paying

338
00:18:53,520 --> 00:18:55,740
that penalty for Analyze being
slower.

339
00:18:55,800 --> 00:18:56,740
Well, again...

340
00:18:56,820 --> 00:18:57,860
For no benefit.

341
00:18:58,080 --> 00:18:59,520
And actually, that's a problem.

342
00:19:00,040 --> 00:19:01,340
Nikolay: I didn't finish my thought.

343
00:19:02,080 --> 00:19:06,540
I'm okay to pay some penalty to
Analyze more, keep more statistics.

344
00:19:06,660 --> 00:19:07,700
It's not a problem.

345
00:19:07,880 --> 00:19:12,040
Unless we have hundreds of thousands
or millions of objects.

346
00:19:12,800 --> 00:19:14,800
In this case, I will be very careful.

347
00:19:16,100 --> 00:19:19,440
But if it's only like a few thousand
of objects I mean indexes

348
00:19:19,440 --> 00:19:25,640
and tables, it takes 1 minute
or 3 minutes to run Analyze

349
00:19:25,640 --> 00:19:30,640
on this database it can be many
many many terabytes 1 or 3 minutes

350
00:19:30,640 --> 00:19:33,920
I don't care. And in terms of storage
it's nothing, and memory,

351
00:19:33,920 --> 00:19:37,200
it's not a lot. Maybe I'm wrong,
but I think so.

352
00:19:37,660 --> 00:19:39,920
Michael: I think you're right and
I think that's important for

353
00:19:39,920 --> 00:19:42,720
people to realize that it's not
dependent on your data size because

354
00:19:42,720 --> 00:19:45,840
we're sampling at this point like
because it's a sampling thing

355
00:19:46,400 --> 00:19:49,740
it doesn't scale linearly with
how much data you have.

356
00:19:49,740 --> 00:19:53,800
Nikolay: Also important disclaimer,
I have a mindset targeted at

357
00:19:54,180 --> 00:19:59,540
startups heavily because I'm building
my fourth startup and our

358
00:19:59,540 --> 00:20:04,560
clients, consulting clients and
clients for products mostly are

359
00:20:04,560 --> 00:20:08,860
startups and that means rapid development,
rapid, really rapid,

360
00:20:08,860 --> 00:20:11,440
sometimes a couple of deployments
per day or more.

361
00:20:11,840 --> 00:20:15,140
It means that maybe the statistics
we decided to keep, maybe

362
00:20:15,140 --> 00:20:18,940
it's not needed today, but maybe
it will be needed tomorrow because

363
00:20:18,940 --> 00:20:23,440
we make some new features and so
on, right? And maybe it's

364
00:20:23,440 --> 00:20:26,680
better to have it because we know
there is correlation in between

365
00:20:26,680 --> 00:20:29,780
these columns so let's have extended
statistics over all these

366
00:20:29,780 --> 00:20:34,500
columns. But we lack recipes, this
is the point. So the lack of recipes

367
00:20:34,640 --> 00:20:36,660
leads to underuse of this feature.

368
00:20:36,660 --> 00:20:37,160
I

369
00:20:37,800 --> 00:20:40,380
Michael: I think they might also.
I think historically you've also

370
00:20:40,380 --> 00:20:44,640
mentioned having a bias towards
OLTP systems in production and

371
00:20:44,640 --> 00:20:50,140
I think there's also a chance that
because often you're hyper-optimizing

372
00:20:50,740 --> 00:20:55,360
queries, and that's a good thing,
going for index-only scans and

373
00:20:55,640 --> 00:20:59,900
you know, very limited data
retrieval, a very small number of

374
00:20:59,900 --> 00:21:05,100
rows, very precise lookups,
you're probably not hitting

375
00:21:05,500 --> 00:21:09,020
these bad plans in a lot of those
cases; you're skipping

376
00:21:09,020 --> 00:21:10,760
straight to a really good

377
00:21:10,760 --> 00:21:11,580
Nikolay: That's interesting...

378
00:21:11,580 --> 00:21:14,180
Michael: That's really hard for
the planner to get wrong, because

379
00:21:14,180 --> 00:21:18,040
you're giving it the ideal index
to use for the important access

380
00:21:18,040 --> 00:21:22,080
patterns and it's possible that
this comes up a little bit more

381
00:21:22,080 --> 00:21:26,640
when people are having to serve
more user-defined queries, maybe

382
00:21:26,640 --> 00:21:30,260
analytical ones that have any combination
of parameters.

383
00:21:30,260 --> 00:21:31,520
Nikolay: Right, like OLAP queries
and so

384
00:21:31,520 --> 00:21:32,020
Michael: on.

385
00:21:32,520 --> 00:21:37,020
Yeah, exactly and I think in those
cases maybe sometimes just

386
00:21:37,020 --> 00:21:39,520
not knowing what's going to come
you can't create indexes for

387
00:21:39,520 --> 00:21:42,040
every case you do and then this
is slow

388
00:21:42,040 --> 00:21:46,000
Nikolay: down writes unlike statistics
so this is a super big

389
00:21:46,000 --> 00:21:46,500
difference.

390
00:21:47,780 --> 00:21:52,280
Michael: Yes so I think there's
a chance that bias towards OLTP

391
00:21:52,420 --> 00:21:57,780
and also being able to often add
the indexes you need for a smaller

392
00:21:57,780 --> 00:22:01,420
number of access patterns or that
like a lot of queries per second

393
00:22:01,720 --> 00:22:04,860
might be that this just isn't as
important in those cases.

394
00:22:06,000 --> 00:22:11,080
But in terms of recipes, I think
the big 1 is bad row estimates

395
00:22:11,120 --> 00:22:13,480
on important and slow queries.

396
00:22:13,700 --> 00:22:16,760
So if a Query is important and
a lot slower than you need it

397
00:22:16,760 --> 00:22:20,080
to be, and there's a really bad
row estimate in an important

398
00:22:20,080 --> 00:22:21,020
part of the plan.

399
00:22:21,020 --> 00:22:23,940
So like, not in a part of the plan
that is fast, but in a part

400
00:22:23,940 --> 00:22:25,300
of the plan that is slow.

401
00:22:25,600 --> 00:22:31,780
The really famous case is a nested
loop that thinks that the

402
00:22:31,780 --> 00:22:36,380
planner estimates it's gonna return
very, very few rows per loop,

403
00:22:36,380 --> 00:22:38,080
so maybe like 0 or 1.

404
00:22:38,260 --> 00:22:41,920
So therefore, it's thinking, well,
I only need to do this a relatively

405
00:22:41,980 --> 00:22:44,960
small number of times, therefore,
it's quickest and easiest to

406
00:22:44,960 --> 00:22:47,460
get started with a nested loop
and I'll be done really quickly.

407
00:22:47,680 --> 00:22:52,400
But in reality it will return maybe
a thousand rows or more per

408
00:22:52,400 --> 00:22:52,900
iteration.

409
00:22:53,940 --> 00:22:56,760
Actually in those cases a hash
join or merge join would have

410
00:22:56,760 --> 00:22:58,760
been so much faster.

411
00:22:59,060 --> 00:23:02,820
So those cases I think can benefit.
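
To make that concrete, here is a minimal sketch of the kind of command being discussed; the table and column names are hypothetical:

```sql
-- Hypothetical correlated columns: city largely determines zip_code,
-- so the planner's independence assumption underestimates matching rows.
CREATE STATISTICS addresses_city_zip (dependencies)
    ON city, zip_code FROM addresses;

-- Extended statistics are only collected by ANALYZE:
ANALYZE addresses;

-- Then re-check the row estimate that was driving the join choice:
EXPLAIN SELECT * FROM addresses
WHERE city = 'Amsterdam' AND zip_code = '1016';
```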

412
00:23:03,500 --> 00:23:07,940
But actually, there's a blog
post I was thinking of, where someone

413
00:23:07,940 --> 00:23:10,480
blogged about a time where extended
statistics helped them a

414
00:23:10,480 --> 00:23:14,960
lot. I put the plan through our
tool and noticed that, yeah, a bad

415
00:23:14,960 --> 00:23:19,180
row estimate was as highly scored
as it could be, but so was

416
00:23:19,240 --> 00:23:20,880
an index efficiency issue.

417
00:23:21,040 --> 00:23:24,640
And I think, looking back, they
could have added a better index

418
00:23:24,640 --> 00:23:28,160
and solved their performance issue
just as well.

419
00:23:28,580 --> 00:23:31,560
So they solved it with CREATE
STATISTICS, but they could have

420
00:23:31,560 --> 00:23:34,120
solved it with a better index, and
probably the query plan they

421
00:23:34,120 --> 00:23:37,380
got in the end could still be improved
further by a better index.

422
00:23:37,940 --> 00:23:42,340
Nikolay: So I have an idea for
myself, and whoever is listening is

423
00:23:42,340 --> 00:23:43,640
welcome to join.

424
00:23:44,680 --> 00:23:47,700
Next month I will have a challenge
for myself.

425
00:23:48,940 --> 00:23:54,100
So every time I deal with some
query optimization, I will put

426
00:23:54,960 --> 00:23:59,120
consideration of extended statistics
in first place.

427
00:24:00,060 --> 00:24:02,660
I will think, can it be helpful
here?

428
00:24:03,120 --> 00:24:03,940
And so on.

429
00:24:04,120 --> 00:24:09,440
Just prioritize it and think about
it more.

430
00:24:10,320 --> 00:24:15,820
And after 1 month of this, 30 days
or a month; February is a shorter

431
00:24:15,820 --> 00:24:16,400
month, right?

432
00:24:16,400 --> 00:24:18,300
So 30 days.

433
00:24:18,840 --> 00:24:25,160
After it, I will make my conclusion,
should I keep this as 1

434
00:24:25,160 --> 00:24:29,160
of the primary tools for query optimization
or is it just a secondary

435
00:24:29,160 --> 00:24:29,660
tool?

436
00:24:30,800 --> 00:24:31,960
How does it sound?

437
00:24:32,200 --> 00:24:34,020
Michael: I would love to hear that,
and from anybody else that

438
00:24:34,020 --> 00:24:34,640
does try it.

439
00:24:34,640 --> 00:24:40,660
1 thing I'd encourage doing is
also benchmarking the performance

440
00:24:40,680 --> 00:24:42,040
hit on ANALYZE of that table.

441
00:24:42,040 --> 00:24:45,640
Like if you do add it to a table,
how long does analyze take

442
00:24:45,640 --> 00:24:48,080
before and how long does analyze
take after and I think it would

443
00:24:48,080 --> 00:24:52,960
depend on which; there are a
few types of extended statistics

444
00:24:53,000 --> 00:24:56,680
that you can add and you can add
all of them for

445
00:24:58,700 --> 00:25:00,040
Nikolay: Expressions? Or you mean...

446
00:25:00,100 --> 00:25:03,240
Michael: There are like 3...
There are like 3 parameters

447
00:25:03,240 --> 00:25:03,900
you can give it.

448
00:25:03,900 --> 00:25:06,440
So, nDistinct, dependencies, and
mcv.

449
00:25:07,360 --> 00:25:12,260
And I think mcv will be by far
the most expensive in terms of

450
00:25:12,260 --> 00:25:13,160
ANALYZE performance.
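
As a sketch, the three kinds can be created separately or together; the statistics and table names here are hypothetical, and omitting the kind list builds every applicable kind:

```sql
-- The three kinds of extended statistics on a pair of columns:
CREATE STATISTICS orders_nd (ndistinct)      ON region, status FROM orders;
CREATE STATISTICS orders_deps (dependencies) ON region, status FROM orders;
CREATE STATISTICS orders_mcv (mcv)           ON region, status FROM orders;

-- To measure the ANALYZE cost before and after, time it in psql:
-- \timing on
ANALYZE orders;
```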

451
00:25:13,740 --> 00:25:16,300
Oh, by the way, another thing that
the docs mentioned can be

452
00:25:16,300 --> 00:25:19,920
penalized is query planning time
and maybe query planning buffers

453
00:25:19,920 --> 00:25:20,580
for that matter.
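
A sketch of how to observe both, assuming a hypothetical orders table (EXPLAIN can report buffers used during planning, without ANALYZE, since Postgres 13):

```sql
-- SUMMARY prints Planning Time even without ANALYZE;
-- BUFFERS here reports buffers touched during planning alone.
EXPLAIN (SUMMARY, BUFFERS)
SELECT * FROM orders WHERE region = 'EU' AND status = 'shipped';
-- Look for a trailing section like:
--   Planning:
--     Buffers: shared hit=...
--   Planning Time: ... ms
```

Running it more than once gives a fairer number, since the first plan also pays cache-warming costs.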

454
00:25:20,580 --> 00:25:24,440
Nikolay: If you have a lot of extended
statistics created by

455
00:25:24,440 --> 00:25:27,140
us, planning can take longer?

456
00:25:27,660 --> 00:25:29,280
Michael: Yeah, well it makes sense,
right?

457
00:25:29,340 --> 00:25:31,940
It's considering more information
up front.

458
00:25:31,980 --> 00:25:33,340
Nikolay: Right, makes sense, yeah.

459
00:25:33,340 --> 00:25:35,740
Michael: I don't know how, but
I'd be interested in the benchmarks

460
00:25:35,740 --> 00:25:38,260
of how much longer, how much more,
like.

461
00:25:38,400 --> 00:25:42,180
Nikolay: Yeah, but in this case
2 things.

462
00:25:42,180 --> 00:25:47,900
First is, well, both benchmarks
sound to me like single-session

463
00:25:47,900 --> 00:25:48,400
benchmarks.

464
00:25:48,400 --> 00:25:53,400
So it's not like we run many sessions;
we can take 1 session

465
00:25:53,400 --> 00:25:57,000
and just check speed, maybe multiple
times for good statistics,

466
00:25:57,160 --> 00:25:57,660
right?

467
00:25:59,760 --> 00:26:03,780
So the first thing sounds to me like
checking ANALYZE speed.

468
00:26:04,200 --> 00:26:08,380
And I think we need to remember
that if we talk about a single

469
00:26:08,860 --> 00:26:09,780
table, it's OK.

470
00:26:09,780 --> 00:26:10,900
I mean, no problem.

471
00:26:10,900 --> 00:26:15,820
But if we talk about many tables,
it's worth remembering that

472
00:26:15,920 --> 00:26:16,980
it can be parallelized.

473
00:26:17,200 --> 00:26:21,260
And the recommendation is to use
vacuumdb with the option

474
00:26:21,260 --> 00:26:26,140
--analyze, specifying the number
of jobs, the number of processes,

475
00:26:27,260 --> 00:26:28,640
backends basically, right?

476
00:26:28,840 --> 00:26:31,420
Which will execute this.

477
00:26:32,260 --> 00:26:35,860
And if we are alone on the server,
we can take, for example,

478
00:26:35,860 --> 00:26:39,560
the number of vCPUs we have on
the virtual machine.

479
00:26:41,100 --> 00:26:42,220
This is 1 thing.

480
00:26:42,440 --> 00:26:44,940
But if it's a single table, it
cannot be parallelized.

481
00:26:45,300 --> 00:26:47,300
It will be a single thread at any
rate.
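
The parallel approach described above can be sketched at the shell level; the database name and job count are hypothetical:

```bash
# Re-analyze all tables in mydb, 8 tables at a time.
# --jobs parallelizes across tables, not within one table,
# so a single huge table is still handled by one backend.
vacuumdb --analyze --jobs=8 mydb
```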

482
00:26:48,240 --> 00:26:48,740
Michael: No.

483
00:26:49,280 --> 00:26:52,000
And there's actually 1 more thing
that I think is worth trying just

484
00:26:52,000 --> 00:26:57,720
before we wrap up is if it is a
huge penalty but looks valuable

485
00:26:57,720 --> 00:26:59,900
to query performance, which I'm
not sure you're going to come

486
00:26:59,900 --> 00:27:00,980
across, but we'll see.

487
00:27:01,620 --> 00:27:05,460
I've wondered about reducing statistics
target for those columns

488
00:27:05,980 --> 00:27:08,920
so you could reduce it and then
add multi-column statistics.

489
00:27:09,280 --> 00:27:13,160
It's just an interesting idea
for getting back some of that ANALYZE

490
00:27:13,260 --> 00:27:13,760
time.

491
00:27:13,940 --> 00:27:18,080
Nikolay: I know a very good SQL
optimization expert who recommended

492
00:27:18,080 --> 00:27:21,800
reducing default_statistics_target,
and I'm going to communicate

493
00:27:21,860 --> 00:27:23,540
with him very soon.

494
00:27:23,540 --> 00:27:24,720
Yeah, great.

495
00:27:24,720 --> 00:27:31,560
But this battle in my mind was
settled like 10 years ago already.

496
00:27:32,260 --> 00:27:38,040
Like reducing default_statistics_target
or the per-column parameter

497
00:27:38,760 --> 00:27:43,340
was considered not a good idea in
my mind, compared to increasing

498
00:27:43,380 --> 00:27:44,140
it somehow.

499
00:27:44,760 --> 00:27:49,120
I don't know, like this is from
various cases, but I'm going

500
00:27:49,120 --> 00:27:52,480
to use this as well as a part of
my challenge just to think about

501
00:27:52,480 --> 00:27:55,300
this as well, because I will talk
to him soon.

502
00:27:55,440 --> 00:27:55,680
Michael: So.

503
00:27:55,680 --> 00:27:57,180
I've got an idea for you.

504
00:27:57,180 --> 00:27:59,080
Nikolay: Let me comment on the
second.

505
00:28:00,060 --> 00:28:03,300
I already start forgetting what
you said about...

506
00:28:03,740 --> 00:28:04,860
What was the second 1?

507
00:28:04,860 --> 00:28:07,040
First was... you said
benchmark.

508
00:28:07,040 --> 00:28:07,620
I don't...

509
00:28:08,260 --> 00:28:09,240
I'm very careful.

510
00:28:09,240 --> 00:28:09,740
Analyze?

511
00:28:10,520 --> 00:28:11,020
Michael: Yeah.

512
00:28:11,580 --> 00:28:14,740
And also query planning time or
query planning buffers.

513
00:28:15,020 --> 00:28:16,320
Nikolay: Query Planning time, this
is good.

514
00:28:16,320 --> 00:28:18,060
Yeah, buffers as well, that's great.

515
00:28:18,760 --> 00:28:22,000
So increasing default_statistics_target
globally definitely should

516
00:28:22,000 --> 00:28:23,420
affect this as well, right?

517
00:28:24,900 --> 00:28:25,400
Globally,

518
00:28:26,500 --> 00:28:27,000
Michael: yeah.

519
00:28:27,940 --> 00:28:31,960
Nikolay: Here, if somebody is going
to do it.

520
00:28:31,960 --> 00:28:37,120
Don't make the mistake my team and
I made a few months ago when

521
00:28:37,120 --> 00:28:38,580
we published an article.

522
00:28:39,380 --> 00:28:43,020
So when you just connect to Postgres
and you do something, planning

523
00:28:43,020 --> 00:28:48,740
time can be much bigger because
of lack of relcache, and it's being

524
00:28:48,740 --> 00:28:49,660
loaded, right?

525
00:28:50,500 --> 00:28:52,940
Second time you run this, it's
very different.

526
00:28:53,300 --> 00:28:57,040
So planning time, first planning
time and second planning time,

527
00:28:57,040 --> 00:28:58,120
they are very different.

528
00:28:58,140 --> 00:29:01,520
If you have connection pooler,
most people do.

529
00:29:01,640 --> 00:29:05,340
It means that, like, it's
not a good idea to make conclusions

530
00:29:06,020 --> 00:29:09,880
from observations of the first
round. Yeah, yeah, so, so...

531
00:29:09,880 --> 00:29:10,660
Michael: Right point.

532
00:29:10,680 --> 00:29:15,900
Nikolay: Yeah, so if even if statistics
Extended statistics affects

533
00:29:15,980 --> 00:29:16,640
planning time.

534
00:29:16,640 --> 00:29:20,740
I think should be so Question is
is it only for the first round

535
00:29:20,740 --> 00:29:22,260
or the second round as well?

536
00:29:22,800 --> 00:29:23,560
Who knows, right?

537
00:29:23,560 --> 00:29:26,780
Because if it's only the first round, it
means, like, relcache...

538
00:29:27,160 --> 00:29:29,240
Well, it's not relcache here,
right?

539
00:29:30,060 --> 00:29:32,300
Well, it's an interesting point,
just to check.

540
00:29:33,160 --> 00:29:36,260
Michael: Yeah, based on the notes
in the documentation, I got

541
00:29:36,260 --> 00:29:37,940
the impression it would be for
every

542
00:29:38,140 --> 00:29:38,740
Nikolay: planned query.

543
00:29:38,740 --> 00:29:40,940
Because it's statistics, it's not
relcache.

544
00:29:41,540 --> 00:29:44,280
So anyway, I would check multiple
times.

545
00:29:44,600 --> 00:29:47,500
Michael: 1 more idea for you, especially
when talking to that person

546
00:29:47,500 --> 00:29:50,380
you mentioned: I wonder if you could
come up with a recipe for

547
00:29:50,380 --> 00:29:53,160
which columns to reduce it on. I
reckon there's a whole bunch

548
00:29:53,160 --> 00:29:58,320
of really boring columns, like unique
IDs, you know, primary key

549
00:29:58,320 --> 00:30:01,840
column type things. Do we really
need high statistics targets

550
00:30:01,840 --> 00:30:03,760
on those or could we get away with
much much lower?

551
00:30:03,760 --> 00:30:06,600
Do you see what I mean? Like ones that we know are unique,

552
00:30:06,760 --> 00:30:09,100
if we're keeping statistics on the most common values?

553
00:30:09,120 --> 00:30:12,160
Nikolay: Well, I agree with you, but what's the point to optimize

554
00:30:12,160 --> 00:30:12,900
in this area?

555
00:30:12,900 --> 00:30:14,980
What are our savings?

556
00:30:16,100 --> 00:30:18,720
Michael: Mainly reducing the time ANALYZE takes.

557
00:30:21,900 --> 00:30:26,320
Nikolay: As I said, I'm not really bothered

558
00:30:26,320 --> 00:30:27,440
by a longer time.

559
00:30:27,440 --> 00:30:30,920
Okay, it's 3 minutes against 1. So what?

560
00:30:31,020 --> 00:30:33,720
Michael: That is quite a big deal, isn't it, in terms of downtime?

561
00:30:34,540 --> 00:30:35,420
Nikolay: I don't know.

562
00:30:35,660 --> 00:30:38,440
We have a recipe for 0 downtime.

563
00:30:39,280 --> 00:30:42,880
We can afford a couple of additional minutes being spent, and

564
00:30:42,880 --> 00:30:48,160
nobody notices it, because it's a target cluster running on logical

565
00:30:48,160 --> 00:30:48,660
replication.

566
00:30:49,660 --> 00:30:52,540
We can do very long analyze there.

567
00:30:52,780 --> 00:30:55,120
So even 1 hour, nobody notices.

568
00:30:55,120 --> 00:30:57,340
It doesn't affect downtime at all.

569
00:31:06,500 --> 00:31:07,480
So I don't know.

570
00:31:07,960 --> 00:31:08,760
I don't know.

571
00:31:08,940 --> 00:31:11,140
Like I can imagine this will save something.

572
00:31:11,740 --> 00:31:15,360
We can find many places like that, right?

573
00:31:15,880 --> 00:31:21,040
For example, in some cases we can start tuning autovacuum at

574
00:31:21,040 --> 00:31:21,880
table level.

575
00:31:22,540 --> 00:31:26,920
And so many times I saw teams, various teams, go down this path.

576
00:31:27,280 --> 00:31:31,680
And then we pulled them back, saying, let's stick to defaults.

577
00:31:32,020 --> 00:31:36,760
Because table-level settings often cause more harm than good.

578
00:31:37,320 --> 00:31:41,780
Because they add complexity, a lot of complexity. It's hard to

579
00:31:41,780 --> 00:31:42,280
maintain.

580
00:31:42,400 --> 00:31:45,540
Then you go microservices, you have a lot of clusters, and this

581
00:31:45,540 --> 00:31:47,980
table-level stuff... well, it's so hard to maintain.

582
00:31:48,280 --> 00:31:52,100
And you talk about column-level settings.

583
00:31:54,060 --> 00:31:57,780
Michael: All I meant is, if you can find a really simple rule,

584
00:31:58,940 --> 00:32:03,420
yeah, exactly: if it's a unique column, reduce the statistics

585
00:32:03,420 --> 00:32:04,400
target to 1.
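
That rule can be sketched as follows; the table and column names are hypothetical, and whether such a low target is safe is exactly what would need validating:

```sql
-- Shrink the per-column statistics target for a unique ID column,
-- where an MCV list and histogram add little value:
ALTER TABLE orders ALTER COLUMN id SET STATISTICS 1;

-- SET STATISTICS -1 reverts the column to default_statistics_target.
ANALYZE orders;  -- the change takes effect at the next ANALYZE
```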

586
00:32:04,400 --> 00:32:08,700
Nikolay: This is good if it goes to hackers and into the core itself, an implementation

587
00:32:09,100 --> 00:32:10,840
which doesn't require effort from the user.

588
00:32:10,840 --> 00:32:12,320
This I agree with.

589
00:32:12,880 --> 00:32:13,720
But if it's...

590
00:32:14,540 --> 00:32:20,360
Again, my main concern, my main comment about extended statistics

591
00:32:20,380 --> 00:32:25,700
is that it requires effort, it's not clear, there are no clear recipes

592
00:32:25,840 --> 00:32:30,520
for how to apply it, so it ends up not being actively used, unfortunately.

593
00:32:31,130 --> 00:32:34,620
Maybe in the next 30 days we will have some recipes, right?

594
00:32:35,580 --> 00:32:37,120
Michael: Sounds... I'm looking forward to it.

595
00:32:37,120 --> 00:32:37,800
Nikolay: Sounds good.

596
00:32:37,800 --> 00:32:38,160
Okay.

597
00:32:38,160 --> 00:32:39,160
Thank you so much.

598
00:32:39,160 --> 00:32:40,080
Michael: Thanks so much Nikolay.