1
00:00:00,060 --> 00:00:02,580
Michael: Hello and welcome to Postgres.FM, a weekly show about

2
00:00:02,580 --> 00:00:03,460
all things PostgreSQL.

3
00:00:03,620 --> 00:00:06,820
I am Michael, founder of pgMustard, and this is Nik, founder

4
00:00:06,820 --> 00:00:07,860
of Postgres.AI.

5
00:00:08,200 --> 00:00:09,660
Hello Nik, how's it going?

6
00:00:09,680 --> 00:00:11,620
Nikolay: Hello Michael, everything's all right.

7
00:00:11,760 --> 00:00:13,440
Let's discuss logs.

8
00:00:14,060 --> 00:00:17,580
Everyone loves to deal with logs, right?

9
00:00:18,080 --> 00:00:19,100
Michael: What do you mean?

10
00:00:20,020 --> 00:00:24,500
Nikolay: Well, I mean, if you have a huge server, a lot of logs,

11
00:00:27,040 --> 00:00:28,760
so good to deal with them.

12
00:00:29,160 --> 00:00:32,180
Michael: Well, I guess we're blessed in Postgres with a few ways

13
00:00:32,180 --> 00:00:34,280
that we don't always have to check logs, right?

14
00:00:34,280 --> 00:00:37,460
There are some system views that we can look at various information.

15
00:00:38,520 --> 00:00:40,020
Nikolay: For example corruption, right?

16
00:00:40,020 --> 00:00:41,260
There's no such view.

17
00:00:42,440 --> 00:00:44,980
Michael: Yeah, yeah there was a but there's quite a few things

18
00:00:44,980 --> 00:00:46,300
that there's no such view.

19
00:00:46,300 --> 00:00:49,260
Well there's quite a lot of information that's put in the logs

20
00:00:49,440 --> 00:00:51,180
and isn't available elsewhere.

21
00:00:51,280 --> 00:00:52,460
So it is important.

22
00:00:52,940 --> 00:00:56,400
And when I say is put in the logs, actually quite often it isn't

23
00:00:56,400 --> 00:00:58,760
even put in the logs unless you change some settings.

24
00:00:58,860 --> 00:01:02,480
So I thought it was, you had this good idea to discuss all of

25
00:01:02,480 --> 00:01:06,900
the things that most people should change in terms of what to

26
00:01:06,900 --> 00:01:11,040
log so that we can then check the logs for issues and when we're

27
00:01:11,040 --> 00:01:13,580
doing performance analysis and that kind of thing.

28
00:01:14,340 --> 00:01:18,340
Nikolay: Yeah, So I think system views, ideally we should for

29
00:01:18,340 --> 00:01:21,720
everything we should have both system views and detailed logging.

30
00:01:22,660 --> 00:01:23,320
Michael: Oh both?

31
00:01:24,320 --> 00:01:27,700
Nikolay: Yeah, because like think about special statements, we

32
00:01:27,700 --> 00:01:32,220
have normalized queries only, it's not possible to have there

33
00:01:32,800 --> 00:01:38,340
every query with all parameters because cardinality will be insane.

34
00:01:39,340 --> 00:01:42,180
Michael: But equally, I think it's quite often not possible to

35
00:01:42,180 --> 00:01:43,980
have every query logged to the logs, right?

36
00:01:43,980 --> 00:01:45,860
Like there's also limitations there.

37
00:01:45,860 --> 00:01:46,360
Nikolay: Exactly.

38
00:01:46,380 --> 00:01:48,660
So pros and cons of both approaches.

39
00:01:49,400 --> 00:01:51,340
Cumulative metrics, they are good.

40
00:01:51,860 --> 00:01:56,480
You can visualize, you can see spikes, everything, but they lack

41
00:01:56,480 --> 00:01:56,980
details.

42
00:01:58,080 --> 00:02:00,480
And same for example, temporary files, right?

43
00:02:00,480 --> 00:02:04,700
We can see at high level, we can see from the pg_stat_database

44
00:02:04,700 --> 00:02:07,260
that, okay, we generate a lot of temporary files.

45
00:02:07,260 --> 00:02:09,660
Queries generate a lot of temporary files, some queries.

46
00:02:10,080 --> 00:02:14,240
We can go down to pg_stat_statements and see which query groups

47
00:02:14,240 --> 00:02:18,460
or normalized queries produce those temporary files.

48
00:02:18,660 --> 00:02:20,000
It's already super helpful.

49
00:02:20,380 --> 00:02:25,680
But obviously, the same normalized query, if you take 1 set of

50
00:02:25,680 --> 00:02:29,060
parameters, it can produce gigabytes of temporary files.

51
00:02:29,060 --> 00:02:32,560
Another set of parameters, it can produce 0 temporary files.

52
00:02:32,600 --> 00:02:35,380
Depending on parameters, it can be very different.

53
00:02:35,380 --> 00:02:38,360
So we need examples, and this is exactly when we need to look

54
00:02:38,360 --> 00:02:39,140
at the logs.

55
00:02:39,520 --> 00:02:45,040
But at the same time, if we log everything, observer effect is

56
00:02:45,040 --> 00:02:45,860
not a joke.

57
00:02:46,240 --> 00:02:53,240
And I must confess, several big clients suffered from observer

58
00:02:53,240 --> 00:02:55,360
effect when I decided to log too much.

59
00:02:55,680 --> 00:02:59,940
And it caused downtime and it was not pleasant experience.

60
00:03:00,720 --> 00:03:04,080
So we always need to think about observer effect.

61
00:03:04,120 --> 00:03:07,600
Those who say log everything, log all queries, they just don't

62
00:03:07,600 --> 00:03:11,420
deal with really heavy loaded systems, unfortunately.

63
00:03:12,340 --> 00:03:12,740
Yeah.

64
00:03:12,740 --> 00:03:15,990
Just try to do it in heavy loaded systems, which has like 100,

65
00:03:15,990 --> 00:03:16,920
000 TPS.

66
00:03:17,780 --> 00:03:18,780
And that's it.

67
00:03:19,220 --> 00:03:23,000
You will put down everything very quickly, you will learn it's

68
00:03:23,000 --> 00:03:24,400
not the way to go.

69
00:03:24,400 --> 00:03:29,240
So we need to limit the amount of logging, but we cannot avoid

70
00:03:29,340 --> 00:03:31,420
logging because we need examples.

71
00:03:32,160 --> 00:03:35,900
So my approach is to combine both things.

72
00:03:36,660 --> 00:03:38,100
Michael: Yeah, it's a balance, isn't it?

73
00:03:38,100 --> 00:03:41,720
We need to record enough information to be able to diagnose issues

74
00:03:41,720 --> 00:03:45,560
when they occur, but not so much that there's excessive overhead.

75
00:03:45,860 --> 00:03:47,900
So it's a balancing act.

76
00:03:48,540 --> 00:03:48,840
Nikolay: Right.

77
00:03:48,840 --> 00:03:52,120
And you need to think, like it's better to think in advance what

78
00:03:52,120 --> 00:03:55,340
to log, because if some incident happened, you somehow mitigated

79
00:03:55,440 --> 00:03:55,600
it.

80
00:03:55,600 --> 00:03:58,980
You want to do a root cause analysis, but logs lack this information.

81
00:04:00,060 --> 00:04:04,160
So you can only guess and you can guess quite well if you have

82
00:04:04,160 --> 00:04:08,480
a lot of experience, but still it's better to have logging

83
00:04:08,480 --> 00:04:10,160
tuned properly in advance.

84
00:04:10,440 --> 00:04:16,900
And I see Postgres in the latest few years, like improving defaults,

85
00:04:17,180 --> 00:04:20,060
but still defaults are not enough usually.

86
00:04:20,060 --> 00:04:22,440
We tune some defaults for our clients always.

87
00:04:23,200 --> 00:04:27,540
I mean, we tune some log related settings to have more logging.

88
00:04:29,020 --> 00:04:32,700
Michael: Yeah, in my research for this I found 3 different parameters

89
00:04:32,740 --> 00:04:37,860
that were all either introduced or had the default changed from

90
00:04:38,000 --> 00:04:42,240
off to some version of on all in Postgres 15.

91
00:04:42,620 --> 00:04:46,820
So I was kind of expecting a gradual improvement in this over

92
00:04:46,820 --> 00:04:49,540
time but it seems almost, I don't know if it's a coincidence

93
00:04:49,540 --> 00:04:53,520
or not, but 1 major version where quite a few things are improved

94
00:04:53,520 --> 00:04:57,940
and I haven't seen a whole lot of movement in other areas unfortunately.

95
00:04:58,840 --> 00:05:02,060
Nikolay: Yeah you're probably right, Defaults changed in 15,

96
00:05:02,080 --> 00:05:06,420
you mean log_checkpoints and autovacuum related to 10 minutes,

97
00:05:06,420 --> 00:05:08,860
which is enormous still, like it's huge value.

98
00:05:09,240 --> 00:05:10,820
And what was the third?

99
00:05:11,580 --> 00:05:14,200
Michael: And there's a new log startup progress interval.

100
00:05:14,200 --> 00:05:16,080
Nikolay: But it's new, it's not a default change.

101
00:05:16,080 --> 00:05:20,660
But decision was made to not to
introduce it being turned off

102
00:05:20,660 --> 00:05:23,140
by default decision, which is a
great decision.

103
00:05:23,140 --> 00:05:23,740
I agree.

104
00:05:24,340 --> 00:05:27,480
And it's quite like 10 seconds,
right?

105
00:05:27,500 --> 00:05:28,520
So every 10.

106
00:05:29,480 --> 00:05:32,920
Yeah, let's discuss everything,
everything we have, right?

107
00:05:32,980 --> 00:05:36,600
Let's let's, Let's start from something,
move to different topic.

108
00:05:36,600 --> 00:05:41,380
But I agree, so 15 was revolutionary
in terms of bringing good

109
00:05:41,380 --> 00:05:42,540
defaults in logging.

110
00:05:44,320 --> 00:05:45,640
Michael: In some areas, yeah.

111
00:05:45,820 --> 00:05:49,540
Nikolay: Yeah, And also worth mentioning
Postgres log, it's always

112
00:05:49,540 --> 00:05:50,960
a single log.

113
00:05:51,480 --> 00:05:57,720
So you can imagine if we had ability
to separate streams of logging,

114
00:05:57,720 --> 00:06:01,620
for example, slow queries could
go in 1 place, maybe to faster

115
00:06:01,620 --> 00:06:03,080
disks, or maybe to...

116
00:06:03,080 --> 00:06:07,760
There is a like this idea, very
old idea, that some logging should

117
00:06:07,760 --> 00:06:14,280
go not to local disk but straight
to some storage over network

118
00:06:14,280 --> 00:06:19,400
UDP and that's it like send it
So we could log a lot of more

119
00:06:19,400 --> 00:06:19,900
stuff.

120
00:06:20,340 --> 00:06:24,280
Although when you ask guys who
support, I don't know, like Elastic

121
00:06:24,280 --> 00:06:27,840
or something, which is receiving
a lot of logging from different

122
00:06:27,840 --> 00:06:31,620
systems, they also would like to
limit, you know?

123
00:06:32,800 --> 00:06:34,200
Or Splunk, for example.

124
00:06:34,200 --> 00:06:37,680
Splunk is an old commercial enterprise
system for logs.

125
00:06:38,340 --> 00:06:42,340
They can deal with huge volumes,
but always still cautious about

126
00:06:42,340 --> 00:06:44,440
how much you are going to send
to them.

127
00:06:44,760 --> 00:06:45,260
Michael: Yeah.

128
00:06:45,280 --> 00:06:46,660
It's a good point, though.

129
00:06:46,860 --> 00:06:48,380
Other databases do have...

130
00:06:48,380 --> 00:06:51,940
I'm pretty sure MySQL has a separate
slow query log, because

131
00:06:51,940 --> 00:06:57,100
I've seen people familiar with
MySQL asking about where's Postgres'

132
00:06:57,440 --> 00:06:58,380
slow query log?

133
00:06:58,380 --> 00:07:01,840
Nikolay: Yeah, it's not a big change,
it could be like, I think

134
00:07:01,840 --> 00:07:05,860
it could be, I mean it's easy to
implement, but I imagine how

135
00:07:05,860 --> 00:07:10,760
much of Infrastructure changes
needed to be done in all the systems

136
00:07:10,760 --> 00:07:13,940
if slow query log would be separated
What do you think

137
00:07:13,940 --> 00:07:15,860
Michael: what what I'm missing
something though?

138
00:07:15,860 --> 00:07:18,340
What's the benefit of having it
separated out?

139
00:07:18,340 --> 00:07:21,360
Nikolay: You can put it to different
disks, faster disks for

140
00:07:21,360 --> 00:07:24,640
example, analysis wise also it's
easier

141
00:07:25,440 --> 00:07:28,100
Michael: Because the volume is
likely to be so much higher than

142
00:07:28,100 --> 00:07:28,760
the other

143
00:07:29,100 --> 00:07:32,460
Nikolay: Yeah, maybe Some rotation
policies could be different.

144
00:07:32,800 --> 00:07:38,480
And analysis wise, the nature of
like, I see a bunch of logs,

145
00:07:38,480 --> 00:07:40,820
which are related to query optimization.

146
00:07:41,680 --> 00:07:45,020
And I also see a bunch of log messages,
which are related to

147
00:07:45,020 --> 00:07:50,140
like startup, autovacuum, checkpointing,
like more like not

148
00:07:50,140 --> 00:07:54,100
about query processing, but everything
else, right?

149
00:07:54,620 --> 00:07:58,600
And separation of them is, to me,
it sounds reasonable.

150
00:07:59,340 --> 00:08:03,000
So I Imagine it would be possible
if like, what do you think?

151
00:08:03,000 --> 00:08:07,980
Imagine there is a new setting
which says everything related

152
00:08:08,040 --> 00:08:13,940
to query processing, including
log_min_duration_statement based,

153
00:08:15,860 --> 00:08:18,540
Everything that can contain a query
text, actually.

154
00:08:18,900 --> 00:08:19,400
Everything.

155
00:08:19,540 --> 00:08:23,720
Like, for example, slow queries
exceeding log_min_duration_statement.

156
00:08:24,180 --> 00:08:25,880
Queries generated temporary files.

157
00:08:26,120 --> 00:08:28,860
Queries which are slow because
they are waiting for other queries

158
00:08:28,860 --> 00:08:32,060
and logged after deadlock_timeout
1 second by default, right?

159
00:08:32,360 --> 00:08:38,080
Imagine all of them go in 1 place,
everything else stays in the

160
00:08:38,080 --> 00:08:38,680
old place.

161
00:08:38,680 --> 00:08:39,560
But it's optional.

162
00:08:39,880 --> 00:08:42,700
You can decide to choose

163
00:08:42,700 --> 00:08:43,480
Michael: it or not.

164
00:08:44,160 --> 00:08:47,440
Yeah, I haven't thought long and
hard about it, but I do quite

165
00:08:47,440 --> 00:08:50,180
like the simplicity of everything
being in 1 place.

166
00:08:50,180 --> 00:08:52,700
But it can also be helpful, right?

167
00:08:52,940 --> 00:08:55,960
I understand how if you're not
worried about slow queries and

168
00:08:55,960 --> 00:08:58,180
you're looking for other things,
they can be really noisy and

169
00:08:58,180 --> 00:08:58,940
get in the way.

170
00:08:58,940 --> 00:09:01,520
And in a heavy-loading environment,
it would just be impossible

171
00:09:01,520 --> 00:09:04,740
without tools to then get anything
useful out of that.

172
00:09:05,200 --> 00:09:11,040
But like an autovacuum could be
the cause of some slow, like

173
00:09:11,040 --> 00:09:13,640
some issue that you're facing with
query performance.

174
00:09:13,700 --> 00:09:15,600
Or checkpoint could be the cause.

175
00:09:15,600 --> 00:09:19,160
And maybe in a less heavily loaded
environment where you're reviewing

176
00:09:19,160 --> 00:09:24,500
it manually seeing that in in order
might help you debugging

177
00:09:24,680 --> 00:09:26,340
like what's actually happening
here

178
00:09:26,400 --> 00:09:30,260
Nikolay: yeah makes sense makes
sense and also some errors Should

179
00:09:30,260 --> 00:09:34,240
we like, what will happen with
queries which are erroring out?

180
00:09:34,240 --> 00:09:34,740
Yeah.

181
00:09:35,240 --> 00:09:41,140
But, but still like, it could be
a choice, right?

182
00:09:41,460 --> 00:09:46,780
Because people who are interested
in errors, corruption related

183
00:09:46,780 --> 00:09:49,540
errors, I hope we will discuss
them as well.

184
00:09:50,220 --> 00:09:53,960
And maybe a checkpointer behavior
and so on, and less interested

185
00:09:54,000 --> 00:09:56,620
in query processing itself.

186
00:09:57,440 --> 00:10:01,620
They often say, you know, like
your huge query is being logged

187
00:10:02,480 --> 00:10:08,900
just because they are exceeding
1 second, they make our log really

188
00:10:09,640 --> 00:10:10,620
hard to read.

189
00:10:11,180 --> 00:10:11,680
Yeah.

190
00:10:13,140 --> 00:10:13,380
Well,

191
00:10:13,380 --> 00:10:14,980
Michael: hard to read manually,
right?

192
00:10:15,360 --> 00:10:17,380
With tools you can filter to the
bits.

193
00:10:17,380 --> 00:10:22,080
Nikolay: Or with DuckDB you can
have CSV and talk to it through

194
00:10:22,080 --> 00:10:24,900
DuckDB queries, which I think is
a great idea.

195
00:10:24,960 --> 00:10:26,400
Quite a new and great idea.

196
00:10:26,880 --> 00:10:29,880
Michael: But I like this almost
as a way of splitting our conversation.

197
00:10:30,060 --> 00:10:35,420
You mentioned the performance related
stuff and everything else.

198
00:10:36,040 --> 00:10:38,420
Where do you want to start with
1 of those?

199
00:10:39,060 --> 00:10:42,760
Nikolay: Yeah, we already talked
about log_checkpoints.

200
00:10:43,140 --> 00:10:44,940
Let's maybe talk about this.

201
00:10:45,040 --> 00:10:49,080
So in 15 it was changed to on and that's it, that's great.

202
00:10:49,080 --> 00:10:50,140
I mean, it's easy.

203
00:10:50,400 --> 00:10:51,940
Like I think everyone needs it.

204
00:10:51,940 --> 00:10:55,300
And finally in 15, how long ago it was?

205
00:10:55,440 --> 00:10:56,340
3 years ago.

206
00:10:56,920 --> 00:10:57,600
Michael: Yeah, nearly.

207
00:10:57,740 --> 00:11:01,100
Nikolay: It was realized that everyone should have it, everyone.

208
00:11:01,760 --> 00:11:05,020
Michael: So our advice to people would be, if you're on a version

209
00:11:05,020 --> 00:11:07,320
older than that, turn it on.

210
00:11:07,360 --> 00:11:08,440
Nikolay: 100% so.

211
00:11:08,440 --> 00:11:09,440
Michael: If it's not already.

212
00:11:09,440 --> 00:11:13,660
Nikolay: Yeah, and another thing happened in 2015, it's related

213
00:11:13,660 --> 00:11:14,940
to autovacuum logging.

214
00:11:15,020 --> 00:11:18,780
So log_autovacuum_min_duration was always, I think before that

215
00:11:18,780 --> 00:11:19,840
it was minus 1.

216
00:11:19,840 --> 00:11:23,740
So it means the vacuum is not logged at all.

217
00:11:24,220 --> 00:11:25,640
And it was changed to 10 minutes.

218
00:11:25,640 --> 00:11:26,980
In my opinion, it's not enough.

219
00:11:26,980 --> 00:11:29,280
10 minutes is way too long.

220
00:11:29,540 --> 00:11:31,240
It should be 10 seconds.

221
00:11:32,380 --> 00:11:38,180
In some places we set it to 0 to see everything.

222
00:11:38,640 --> 00:11:43,280
Of course, I can imagine, especially if you have many workers,

223
00:11:43,280 --> 00:11:47,680
you also tuned the autovacuum_naptime to very low value so

224
00:11:47,680 --> 00:11:50,400
they sleep less between work.

225
00:11:50,740 --> 00:11:53,000
And this, or workers sleep less, right?

226
00:11:53,000 --> 00:11:57,940
In this case, I imagine amount of logging from a vacuum workers

227
00:11:58,280 --> 00:11:59,240
might be substantial.

228
00:11:59,860 --> 00:12:04,640
But at least like 10 seconds or a few seconds threshold, it's

229
00:12:04,640 --> 00:12:05,140
good.

230
00:12:05,500 --> 00:12:11,540
Because if you have a lot of tables, 10 minutes, you will miss

231
00:12:11,540 --> 00:12:13,640
a lot of details, interesting details.

232
00:12:13,740 --> 00:12:14,960
Michael: And it's not that frequent.

233
00:12:14,960 --> 00:12:17,960
The thing if we go back to that, you know, recording enough information

234
00:12:17,960 --> 00:12:21,820
to diagnose issues without excessive overhead, the overhead of

235
00:12:21,820 --> 00:12:25,440
logging all of your autovacuums, unless you have, as you say,

236
00:12:26,120 --> 00:12:29,980
really, like, yeah, very low nap time and maybe lots and lots

237
00:12:29,980 --> 00:12:34,840
of tape, like, maybe you've got thousands of tapes, thousands

238
00:12:34,840 --> 00:12:38,600
of schemas, like if they've multiplied out, I think I heard somebody

239
00:12:38,600 --> 00:12:41,420
that was talking about having millions of objects in their Database.

240
00:12:41,780 --> 00:12:44,180
Nikolay: All right, do you remember if you have, for example,

241
00:12:44,180 --> 00:12:47,780
naptime 1 second, You have 10 workers, for example, and you

242
00:12:47,780 --> 00:12:52,500
have longer running Transaction holding extreme horizon, so the

243
00:12:52,500 --> 00:12:54,360
vacuum cannot Delete some dead Tuples.

244
00:12:54,960 --> 00:12:58,340
How often those workers will visit this Table, which needs processing?

245
00:12:58,460 --> 00:13:00,640
Maybe they will do it in loop, right?

246
00:13:00,720 --> 00:13:01,580
Michael: Yeah, good point.

247
00:13:01,580 --> 00:13:02,480
I don't know.

248
00:13:02,800 --> 00:13:07,120
Nikolay: And if Table is very small, then quickly process it,

249
00:13:07,120 --> 00:13:09,920
sleep 1 second, again process it, sleep 1 second, so it will

250
00:13:09,920 --> 00:13:11,260
be a lot of spam, right?

251
00:13:12,660 --> 00:13:16,040
Unless there is additional logic
in autovacuum, I'm not 100%

252
00:13:16,420 --> 00:13:16,920
remembering.

253
00:13:17,460 --> 00:13:19,420
It would be good to double check.

254
00:13:19,900 --> 00:13:23,480
Michael: Worth checking, but also
to once per second, when we're

255
00:13:23,480 --> 00:13:25,860
talking about logging, when we're
talking about potentially,

256
00:13:26,180 --> 00:13:28,060
it's also still not that bad.

257
00:13:28,520 --> 00:13:31,100
Nikolay: But 10 workers, Thousands
tables.

258
00:13:32,300 --> 00:13:35,500
I can imagine something like this.

259
00:13:35,500 --> 00:13:36,600
But also, nobody...

260
00:13:36,940 --> 00:13:38,940
Like, default nap time is 1 minute.

261
00:13:39,440 --> 00:13:41,460
And default number of workers is
3.

262
00:13:42,160 --> 00:13:42,660
Yeah.

263
00:13:42,980 --> 00:13:43,880
So I...

264
00:13:43,900 --> 00:13:45,480
Anyway, like, I definitely...

265
00:13:45,480 --> 00:13:50,920
We always recommend to our clients
to see more of the vacuum

266
00:13:50,920 --> 00:13:52,180
work in logs.

267
00:13:52,920 --> 00:13:57,860
It's really helpful, especially
during some incidents, after

268
00:13:57,860 --> 00:14:02,020
some incidents, we have this useful
information to understand

269
00:14:02,320 --> 00:14:07,880
what happened in terms of data
that cannot be deleted.

270
00:14:08,380 --> 00:14:10,740
It also reports xmin horizon, by
the way.

271
00:14:10,760 --> 00:14:15,040
It reports it like, it reports
the delta between current xmin

272
00:14:15,060 --> 00:14:18,660
and xmin horizon, so it's super
useful.

273
00:14:20,280 --> 00:14:20,640
Yeah.

274
00:14:20,640 --> 00:14:24,280
Michael: Nice, so just to be clear,
anything, if you're on older

275
00:14:24,280 --> 00:14:27,160
than 15, make sure you have this
set to something so that you're

276
00:14:27,160 --> 00:14:29,020
actually logging autovacuum at
all.

277
00:14:29,160 --> 00:14:33,820
And even if you're on 15 or above,
consider lowering it substantially

278
00:14:34,860 --> 00:14:35,440
from the default.

279
00:14:35,440 --> 00:14:39,300
Nikolay: Yeah, 1 more thing here
is that if we don't have activity

280
00:14:39,600 --> 00:14:46,520
at all, in this case, it will be
producing some log entries which

281
00:14:46,520 --> 00:14:47,980
are probably not super useful.

282
00:14:49,860 --> 00:14:54,020
Michael: If we don't have any autovacuum
activity or any any

283
00:14:54,020 --> 00:14:56,380
kind of updates, delete.

284
00:14:56,380 --> 00:14:59,440
Nikolay: Yeah, or maybe SELECT
only during some long time.

285
00:15:00,060 --> 00:15:04,740
Michael: But that was the argument
for not having checkpoint

286
00:15:04,800 --> 00:15:07,940
logging on, if there's a silent
database.

287
00:15:08,860 --> 00:15:10,580
Nikolay: There was a battle about
it, yeah.

288
00:15:11,120 --> 00:15:15,080
Michael: Yeah, but we're talking
about very, very few log entries

289
00:15:15,440 --> 00:15:17,300
in the grand scheme of things still?

290
00:15:17,980 --> 00:15:22,000
Nikolay: I just mean if you, for
example, if you build some service,

291
00:15:22,000 --> 00:15:26,040
if you have lots of databases or
you build a platform for other

292
00:15:26,040 --> 00:15:26,540
people.

293
00:15:26,820 --> 00:15:27,760
Michael: Sure, sure.

294
00:15:27,880 --> 00:15:30,900
Nikolay: And you can imagine like
many of them will have databases

295
00:15:31,040 --> 00:15:37,500
which are not super active, but
logging has costs.

296
00:15:38,500 --> 00:15:44,540
If you have, say you have like
10,000 databases and 9,000 of

297
00:15:44,540 --> 00:15:48,980
them are not super active and you
will be paying for this disk

298
00:15:49,000 --> 00:15:50,640
space for logs, right?

299
00:15:50,740 --> 00:15:55,220
This will make you think differently
and try to save.

300
00:15:55,280 --> 00:15:58,700
But this is exactly like, this
is different situation compared

301
00:15:58,700 --> 00:16:03,160
to, okay, I have 1 database, it's
hot startup, AI startup, it's

302
00:16:03,160 --> 00:16:03,660
growing.

303
00:16:04,620 --> 00:16:09,300
It's a single like monolith or
few monolith or shard, it doesn't

304
00:16:09,300 --> 00:16:09,800
matter.

305
00:16:10,320 --> 00:16:13,080
And we need to understand our workload
really well.

306
00:16:13,080 --> 00:16:15,420
In this case, we need to log everything,
right?

307
00:16:15,960 --> 00:16:17,080
Michael: Yeah, I get what you mean.

308
00:16:17,080 --> 00:16:20,660
But going back to the, let's say
the platform that has 10, 000

309
00:16:20,660 --> 00:16:26,940
of which 9, 000 are quiet, in that
case why not rotate the, like,

310
00:16:27,800 --> 00:16:29,440
just delete logs more often?

311
00:16:29,860 --> 00:16:34,060
Nikolay: Good point, So yeah, if
you, for example, understand

312
00:16:34,060 --> 00:16:40,360
that you're going to log something
that's probably in not loaded

313
00:16:40,380 --> 00:16:44,820
system is not super useful, but
you know the maximum amount of

314
00:16:44,820 --> 00:16:49,140
that already, how many lines it
will be, and you know it will

315
00:16:49,140 --> 00:16:52,260
be rotated so there is just constant
price for each node that

316
00:16:52,260 --> 00:16:53,440
you're going to accept.

317
00:16:54,860 --> 00:16:57,100
And logs also can be compressed
really well.

318
00:16:59,140 --> 00:17:01,500
Yeah I agree, I agree with this
point as well.

319
00:17:02,960 --> 00:17:05,800
I just wanted us not to forget
about different perspectives people

320
00:17:05,800 --> 00:17:06,540
might have.

321
00:17:06,820 --> 00:17:07,700
Michael: Yeah, great.

322
00:17:08,420 --> 00:17:13,380
Nikolay: 1 big pet versus huge
cattle you need to take care of.

323
00:17:14,480 --> 00:17:15,600
So yeah.

324
00:17:16,800 --> 00:17:17,280
Good.

325
00:17:17,280 --> 00:17:18,260
What about autovacuum?

326
00:17:18,260 --> 00:17:19,120
That's it, right?

327
00:17:19,120 --> 00:17:24,880
So we need to know details, especially
about larger tables, especially

328
00:17:24,880 --> 00:17:26,540
when autovacuum is blocked.

329
00:17:26,740 --> 00:17:29,640
And if you're, for example, a backend
developer, I think it's

330
00:17:29,640 --> 00:17:33,480
interesting for you as well, because
it may sound autovacuum

331
00:17:33,540 --> 00:17:36,520
is just some, okay, it's a garbage
collector in Postgres, right?

332
00:17:36,660 --> 00:17:42,240
But if you think, like, if you
have chances to deal with performance

333
00:17:42,540 --> 00:17:46,560
of system you develop, garbage
collection part is very important

334
00:17:46,560 --> 00:17:50,100
because every time you deleted
or updated something or had INSERT

335
00:17:50,740 --> 00:17:53,040
with ROLLBACK, you produce dead
tuples.

336
00:17:53,360 --> 00:17:58,320
And basically, if you deleted something,
COMMIT, return to your

337
00:17:58,320 --> 00:18:01,500
application code something, but
it's only half of work done.

338
00:18:01,960 --> 00:18:05,820
autovacuum will do the remaining
half of work, so to speak.

339
00:18:06,820 --> 00:18:10,460
And you definitely want to know
when it happened, in which volumes.

340
00:18:11,360 --> 00:18:15,060
At some point you will need probably
to process a lot of rows,

341
00:18:15,060 --> 00:18:17,280
like delete millions of rows.

342
00:18:17,280 --> 00:18:18,860
We had an episode about this.

343
00:18:19,240 --> 00:18:23,860
And people keep asking, I saw comments
yesterday, comments like,

344
00:18:24,340 --> 00:18:26,260
okay, deletes, what about updates?

345
00:18:26,480 --> 00:18:30,680
Well, similar, similar updates
also, like they produce new tuple,

346
00:18:30,680 --> 00:18:37,440
they mark old tuple as dead, but
the process of what exactly

347
00:18:38,400 --> 00:18:39,900
will clean it.

348
00:18:40,520 --> 00:18:42,660
And we need to understand its behavior.

349
00:18:42,980 --> 00:18:49,120
And if we want to move really fast,
We need to take care of cleaning,

350
00:18:49,740 --> 00:18:52,760
otherwise we will see a lot of
huge gaps in storage.

351
00:18:53,480 --> 00:18:58,180
Space won't be reclaimed, it won't
be reused very quickly, and

352
00:18:58,180 --> 00:18:59,480
this is called bloat, right?

353
00:18:59,480 --> 00:19:00,720
So we don't want this.

354
00:19:00,720 --> 00:19:03,640
So we need to understand garbage
collection and Postgres, which

355
00:19:03,640 --> 00:19:04,840
is called autovacuum.

356
00:19:05,260 --> 00:19:08,400
Yeah, this is like, that's why
you want autovacuum logging,

357
00:19:08,400 --> 00:19:12,880
even if it's, even if you are not
DBA, but just backend developer.

358
00:19:13,440 --> 00:19:14,600
This was my point.

359
00:19:15,040 --> 00:19:15,540
Michael: Yeah.

360
00:19:15,700 --> 00:19:18,940
And, And it's just like you mentioned
at the beginning, there

361
00:19:18,940 --> 00:19:22,660
are system views where we can look
at when was the last time

362
00:19:22,660 --> 00:19:25,220
autovacuum ran on each table,
which is useful.

363
00:19:26,000 --> 00:19:29,640
But you can't see the couple of
times before that, or...

364
00:19:29,640 --> 00:19:30,700
Nikolay: You don't see details.

365
00:19:31,340 --> 00:19:31,740
How many...

366
00:19:31,740 --> 00:19:32,500
Yes, exactly.

367
00:19:33,400 --> 00:19:35,720
Was it in the horizon a problem,
actually?

368
00:19:35,820 --> 00:19:36,140
Yeah.

369
00:19:36,140 --> 00:19:39,820
I think, by the way, is there some
work in progress to extend

370
00:19:39,820 --> 00:19:40,320
it?

371
00:19:41,260 --> 00:19:42,180
Michael: I don't know.

372
00:19:43,200 --> 00:19:45,020
Nikolay: Because it's worth extending
I think.

373
00:19:45,020 --> 00:19:47,300
pg_stat_all_tables or pg_stat_user_tables.

374
00:19:48,080 --> 00:19:50,880
It's useful, but it's just a counter
and last timestamp.

375
00:19:51,060 --> 00:19:51,760
Not enough.

376
00:19:52,200 --> 00:19:53,760
2 counters, 2 last timestamps.

377
00:19:54,960 --> 00:19:57,280
Or manual and automatic vacuuming.

378
00:19:57,900 --> 00:19:58,760
Okay, good.

379
00:19:58,980 --> 00:20:02,240
In the same release, just to close
with Postgres 15, it was this

380
00:20:02,240 --> 00:20:03,520
log startup process interval.

381
00:20:03,520 --> 00:20:06,140
I don't think it's for backend
engineers.

382
00:20:06,140 --> 00:20:11,340
It's more for DBAs because if they
tune checkpointer behavior

383
00:20:12,040 --> 00:20:15,780
and distance between checkpoints
is huge, significant.

384
00:20:15,900 --> 00:20:19,000
In this case, startup can, might
take long time and it's great

385
00:20:19,000 --> 00:20:24,480
that now we have this setting and
as you said Default is quite

386
00:20:24,480 --> 00:20:24,840
good.

387
00:20:24,840 --> 00:20:29,620
So we can not to think about it
at all Yeah, what's next?

388
00:20:30,800 --> 00:20:32,680
Michael: What about log_statement?

389
00:20:32,680 --> 00:20:34,860
We talked about that in a recent
episode as well.

390
00:20:35,020 --> 00:20:37,860
The default there is none, right?

391
00:20:38,260 --> 00:20:39,000
Nikolay: Yeah, exactly.

392
00:20:39,000 --> 00:20:44,380
And I like to recommend people
to set DDL, to put DDL there just

393
00:20:44,380 --> 00:20:48,140
to trace all schema changes.

394
00:20:48,160 --> 00:20:50,040
When it was done, who did it.

395
00:20:50,140 --> 00:20:55,280
Of course, actor may overwrite
it.

396
00:20:55,900 --> 00:21:01,560
Unless, yeah, so, but it's good
to have some default there.

397
00:21:01,560 --> 00:21:06,400
And I think there's a there's an
ability actually to forbid overriding,

398
00:21:06,420 --> 00:21:07,500
right, some settings.

399
00:21:08,260 --> 00:21:10,940
Michael: So I think there's 2,
there's almost 2 categories here,

400
00:21:10,940 --> 00:21:11,120
right?

401
00:21:11,120 --> 00:21:13,820
You're talking about almost the
security element of it.

402
00:21:13,820 --> 00:21:18,740
But I think even if we, even if
we assume it's general case,

403
00:21:18,740 --> 00:21:22,340
it's still extremely useful to
see when DDL changes were happening

404
00:21:22,340 --> 00:21:24,020
in the context of everything else.

405
00:21:24,020 --> 00:21:27,040
Nikolay: The timestamp, exact,
exact query.

406
00:21:27,280 --> 00:21:31,080
For example, it may save you time
in future if you, for example,

407
00:21:31,080 --> 00:21:34,780
start dealing with logical replication,
doing some experiments,

408
00:21:34,960 --> 00:21:36,920
and it's blocked by DDL.

409
00:21:37,720 --> 00:21:41,540
But you can trace when exactly
it happened and which DDL it was,

410
00:21:41,640 --> 00:21:42,280
which is great.

411
00:21:42,280 --> 00:21:45,140
Michael: Or even look back historically
over the past 30 days

412
00:21:45,140 --> 00:21:48,520
and be like, which days do people
run DDL on which ones don't

413
00:21:48,520 --> 00:21:52,440
they you know that you could plan
an upgrade around when it's

414
00:21:52,440 --> 00:21:54,440
unlikely to have DDL happening.

415
00:21:54,520 --> 00:21:59,320
Nikolay: It's super helpful in
various incident analysis when

416
00:21:59,320 --> 00:22:02,180
you have deployment and something
went wrong.

417
00:22:02,700 --> 00:22:04,000
We discussed it many times.

418
00:22:04,000 --> 00:22:08,680
DDL may be a reason if it cooked
not properly, it may be a reason

419
00:22:09,140 --> 00:22:11,400
for downtime or some issues.

420
00:22:11,980 --> 00:22:15,980
And if you have log_statement equal
DDL, you can trace it.

421
00:22:16,440 --> 00:22:16,940
Yes.

422
00:22:17,220 --> 00:22:17,720
Yeah.

423
00:22:18,060 --> 00:22:22,180
At least after the incident happened.

424
00:22:23,100 --> 00:22:25,400
So, definitely DDL is a good idea.

425
00:22:26,600 --> 00:22:28,340
Michael: And the cost is very low,
right?

426
00:22:28,340 --> 00:22:30,480
How much DDL are you running on
your system?

427
00:22:31,360 --> 00:22:34,020
Nikolay: It's not thousand per
second, definitely.

428
00:22:34,240 --> 00:22:34,740
Michael: No.

429
00:22:34,820 --> 00:22:35,560
I hope.

430
00:22:36,760 --> 00:22:36,900
You

431
00:22:36,900 --> 00:22:42,040
Nikolay: know, I see systems which
have DDL as a part of their

432
00:22:42,660 --> 00:22:44,840
normal workload coming from users.

433
00:22:45,060 --> 00:22:45,760
It happens.

434
00:22:47,100 --> 00:22:51,100
Well, for example, if it's a multi-tenant
system, which creates

435
00:22:51,100 --> 00:22:54,220
tables for each customer and if
they extend functionality, they

436
00:22:54,220 --> 00:22:55,320
also create something.

437
00:22:56,000 --> 00:22:59,100
Also temporary tables, which I
don't like at all.

438
00:22:59,100 --> 00:23:03,120
Like I avoid temporary tables at
all costs, but people use them

439
00:23:03,120 --> 00:23:06,680
and temporary table creation is
also DDL.

440
00:23:07,200 --> 00:23:08,300
Michael: Yeah, good point.

441
00:23:08,940 --> 00:23:14,380
Nikolay: So I can imagine that
there are cases when it can happen,

442
00:23:14,380 --> 00:23:18,720
like we have a storm of DDL coming
to logs, but for me it's kind

443
00:23:18,720 --> 00:23:21,880
of edge case rather than normal.

444
00:23:23,760 --> 00:23:24,520
Michael: So, great.

445
00:23:25,160 --> 00:23:25,660
Yeah.

446
00:23:26,040 --> 00:23:30,800
On the security side, I wondered
about connections and disconnections

447
00:23:31,020 --> 00:23:32,380
are both off by default.

448
00:23:32,380 --> 00:23:35,500
Nikolay: I don't think it's what
this this is sometimes I see

449
00:23:35,500 --> 00:23:37,200
people turn it on.

450
00:23:37,280 --> 00:23:39,080
It depends on the particular system.

451
00:23:39,760 --> 00:23:41,120
In many cases, it's off.

452
00:23:41,120 --> 00:23:42,600
It's all by off by default.

453
00:23:42,660 --> 00:23:46,020
And this information is usually
most interesting.

454
00:23:46,020 --> 00:23:51,440
It looks most interesting to DBAs,
DBREs, not Backend developers.

455
00:23:52,200 --> 00:23:56,820
Let's move on closer to workloads
and query optimization maybe.

456
00:23:57,280 --> 00:24:00,140
Michael: Yeah, and I think we did
have an episode either about

457
00:24:00,140 --> 00:24:05,320
pgaudit or about, did we do it
specifically about pgaudit or

458
00:24:05,320 --> 00:24:06,680
about security or something like
that?

459
00:24:06,680 --> 00:24:09,320
So I'll link up that episode because
I think that covers probably

460
00:24:09,320 --> 00:24:13,140
the more interesting parts of the
DBA side of things.

461
00:24:13,380 --> 00:24:14,120
Nikolay: Sounds good.

462
00:24:14,240 --> 00:24:18,080
Yeah, maybe Let's talk about log_lock_waits before we focus

463
00:24:18,080 --> 00:24:19,460
on workload analysis.

464
00:24:20,140 --> 00:24:24,360
So it's also part of workload analysis,
just specific part related

465
00:24:24,380 --> 00:24:25,280
to logs.

466
00:24:25,840 --> 00:24:28,400
And we also discussed it recently,
I remember.

467
00:24:28,940 --> 00:24:32,280
And this is definitely what we
recommend turning on because it's

468
00:24:32,280 --> 00:24:33,220
off by default.

469
00:24:33,960 --> 00:24:35,400
Yeah, because it's helpful.

470
00:24:35,820 --> 00:24:37,740
Also during incidents, that's helpful.

471
00:24:37,920 --> 00:24:43,380
If, if you're developing your app,
didn't think about multi-session

472
00:24:43,940 --> 00:24:48,800
concurrency control properly and
blocking situations, you might

473
00:24:48,800 --> 00:24:52,980
have incidents, spike of active
sessions, or maybe even downtime

474
00:24:53,460 --> 00:24:55,780
caused by heavy lock contention.

475
00:24:57,180 --> 00:25:01,220
And having logs related to this
is really helpful.

476
00:25:02,060 --> 00:25:06,500
Really helpful, because without
it, you can only diagnose something

477
00:25:06,500 --> 00:25:07,160
in runtime.

478
00:25:08,240 --> 00:25:13,380
Okay, you see there's a spike of
exclusive locks.

479
00:25:14,340 --> 00:25:19,820
So, With log_lock_waits, you will
see that some session tried

480
00:25:19,820 --> 00:25:23,140
to run some query, but it was blocked
by another session.

481
00:25:23,840 --> 00:25:30,720
And it was waiting, our query,
was waiting longer than

482
00:25:30,720 --> 00:25:32,940
deadlock_timeout, which is by default 1
second.

483
00:25:33,540 --> 00:25:37,240
So if you, for example, try to
update 1 row, but there is no

484
00:25:37,240 --> 00:25:40,840
transaction which is not committed
yet, it may be already updated

485
00:25:40,840 --> 00:25:44,060
this row, but it's not committed,
and we need to wait until commit

486
00:25:44,060 --> 00:25:45,460
or Rollback for the transaction.

487
00:25:45,920 --> 00:25:50,260
And our update lasts, like, waiting
period lasts 1 second.

488
00:25:50,900 --> 00:25:56,040
In some systems it may be longer
or less longer if deadlock_timeout

489
00:25:56,040 --> 00:25:56,740
was adjusted.

490
00:25:57,700 --> 00:26:01,860
And this gets logged and you see,
okay, our session process ID

491
00:26:01,860 --> 00:26:07,400
with executing query this was waiting
for long and it was waiting

492
00:26:07,400 --> 00:26:08,680
for those process IDs.

493
00:26:08,680 --> 00:26:11,540
Unfortunately we don't see details
about those sessions.

494
00:26:11,680 --> 00:26:13,220
This is disadvantage.

495
00:26:13,500 --> 00:26:17,200
But it's already half of the coin
visible.

496
00:26:18,300 --> 00:26:21,380
Michael: Yeah well imagine trying
to debug without this, it just

497
00:26:21,380 --> 00:26:26,280
seems to me, reproducing ordering
for locking issues just seems

498
00:26:26,280 --> 00:26:29,400
to me to be so difficult without
this kind of information.

499
00:26:30,040 --> 00:26:32,860
Nikolay: Yeah, also it's helpful
if you have deadlock issues,

500
00:26:33,600 --> 00:26:37,360
and you try to understand your
system better, this kind of logging

501
00:26:37,360 --> 00:26:38,000
is helpful.

502
00:26:38,080 --> 00:26:41,880
I cannot say it will answer all
your questions, unfortunately.

503
00:26:42,040 --> 00:26:47,060
I had cases when deadlock troubleshooting
was extremely challenging,

504
00:26:47,720 --> 00:26:51,040
and we had logging, we had everything,
but it was still challenging.

505
00:26:51,040 --> 00:26:52,120
Sometimes It's really challenging.

506
00:26:52,120 --> 00:26:56,000
You need to go to application code
understand what exactly It's

507
00:26:56,000 --> 00:27:00,100
doing but it's already something
and it's better to have it always

508
00:27:00,720 --> 00:27:03,700
Michael: Yeah, and this to me seems
like a no-brainer.

509
00:27:03,800 --> 00:27:08,600
The trickier 1, balance-wise, seems
to be the time, whether to

510
00:27:08,600 --> 00:27:10,780
reduce it from 1 second or increase
it.

511
00:27:11,600 --> 00:27:15,060
And that seems quite loaded because
it's doing 2 things, right?

512
00:27:15,060 --> 00:27:21,000
It's determining at what time lock
waits get logged, but also

513
00:27:21,200 --> 00:27:25,020
at what time deadlocks get picked.

514
00:27:25,640 --> 00:27:27,300
Nikolay: Deadlock detection happens.

515
00:27:27,380 --> 00:27:29,240
Michael: Yeah, so it's doing 2
things.

516
00:27:29,240 --> 00:27:33,160
Nikolay: I can, I understand the
decision from engineering perspective?

517
00:27:33,160 --> 00:27:35,280
It was like, where should we log
it?

518
00:27:35,280 --> 00:27:37,780
Oh, this is where we're checking
it.

519
00:27:37,780 --> 00:27:40,580
It's good time to log it exactly
at that time.

520
00:27:40,580 --> 00:27:43,540
So I can imagine this decision,
why it happened.

521
00:27:43,780 --> 00:27:46,300
But I agree with you, there is
some lack of flexibility a little

522
00:27:46,300 --> 00:27:46,640
bit.

523
00:27:46,640 --> 00:27:49,400
But also it's protecting us from
huge volumes.

524
00:27:49,640 --> 00:27:52,540
Like let's talk a little bit about
observer effect.

525
00:27:53,680 --> 00:27:55,320
Here I don't see it actually.

526
00:27:55,320 --> 00:28:00,600
So at each session, like there
is this like 1 second delay, basically

527
00:28:00,720 --> 00:28:03,660
by default, 1 second delay before
it gets logged.

528
00:28:04,140 --> 00:28:06,720
If we have a lot of sessions okay
how many?

529
00:28:07,000 --> 00:28:07,500
100?

530
00:28:07,600 --> 00:28:08,740
100 active sessions.

531
00:28:09,280 --> 00:28:10,080
Michael: Max connections.

532
00:28:10,400 --> 00:28:13,500
Nikolay: Yeah we might have 100
of these messages per second.

533
00:28:13,500 --> 00:28:16,720
Well it's significant volume But
I think on this server with

534
00:28:16,720 --> 00:28:23,680
100 active sessions and I hope
with many more cores, I think

535
00:28:23,680 --> 00:28:26,820
disks are already quite powerful
so they can handle 100 entries

536
00:28:26,820 --> 00:28:29,280
per second for this type of logging.

537
00:28:30,060 --> 00:28:33,400
But it might contribute in some
cases, I think.

538
00:28:33,540 --> 00:28:36,740
Again, in some edge or corner cases,
it might contribute a lot.

539
00:28:37,120 --> 00:28:39,520
So it's worth understanding.

540
00:28:41,120 --> 00:28:45,200
But personally, I've never seen
the case when we decided to turn

541
00:28:45,200 --> 00:28:45,860
it off.

542
00:28:46,280 --> 00:28:49,760
And I think it should be on by
default, but it's off by default.

543
00:28:49,760 --> 00:28:52,540
So advice from us, turn it on.

544
00:28:53,000 --> 00:28:54,020
Michael: Yeah, good 1.

545
00:28:54,960 --> 00:28:55,440
Nikolay: What's next?

546
00:28:55,440 --> 00:28:57,320
Michael: Should we move on to performance
things?

547
00:28:57,380 --> 00:28:59,660
Nikolay: Yeah, maybe temporary
files first.

548
00:28:59,720 --> 00:29:00,480
Michael: Yeah, good 1.

549
00:29:00,480 --> 00:29:02,380
Kind of bridge the gap a little
bit.

550
00:29:02,640 --> 00:29:06,520
Nikolay: Yeah and by default it's
disabled minus 1.

551
00:29:07,580 --> 00:29:12,040
Michael: Yes so this is when in
fact actually I looked into it

552
00:29:12,040 --> 00:29:15,680
this this is logging when the temporary
file gets deleted interestingly

553
00:29:16,320 --> 00:29:21,880
But what's happening is when a
query is using some temporary

554
00:29:22,760 --> 00:29:26,680
storage, for example, if a sort
spills to disk or doesn't have

555
00:29:26,680 --> 00:29:33,360
enough working memory available
to do a hash join or any operation

556
00:29:34,700 --> 00:29:36,180
that uses temporary data.

557
00:29:36,260 --> 00:29:39,440
So I had this under performance
but I guess it is also just general

558
00:29:39,620 --> 00:29:46,740
system-wide tuning as well around
config parameters like work_mem

559
00:29:46,840 --> 00:29:49,580
and hash_mem multiplier more recently
as well.

560
00:29:49,760 --> 00:29:50,520
Nikolay: Yeah, yeah.

561
00:29:50,860 --> 00:29:54,680
Yeah, I can imagine many queries
will start producing a lot of

562
00:29:54,680 --> 00:29:58,580
logs if they need temporary files.

563
00:30:01,120 --> 00:30:05,380
It's so multi-sided question to
me, because if you produce, so

564
00:30:05,380 --> 00:30:11,020
if you try to save on log, logging
of temporary file generation

565
00:30:11,040 --> 00:30:15,800
cases, deletion cases, as you said,
you already have issues with

566
00:30:16,260 --> 00:30:17,460
dealing with disk.

567
00:30:18,340 --> 00:30:20,820
The query processing itself was
dealing with disk.

568
00:30:21,880 --> 00:30:25,120
It's better to capture that info
and try to optimize it as soon

569
00:30:25,120 --> 00:30:25,760
as possible.

570
00:30:26,940 --> 00:30:28,860
But will I put 0 there?

571
00:30:28,860 --> 00:30:33,620
0 means let's log all the cases
even if it's just 8 kilobytes.

572
00:30:34,900 --> 00:30:39,840
Maybe no, but what I would do,
what we usually recommend, I'm

573
00:30:39,840 --> 00:30:42,680
sharing 3 pieces of advice today.

574
00:30:42,980 --> 00:30:45,560
We say minus 1 is definitely not
good.

575
00:30:45,760 --> 00:30:50,380
And of course, in some cases, if
you don't see temporary file

576
00:30:50,380 --> 00:30:53,800
generated right now, looking at
pg_stat_activity or pg_stat_statements,

577
00:30:53,800 --> 00:30:58,680
because they also have temporary
file written and read a couple

578
00:30:58,680 --> 00:31:01,480
of columns, so you can see it there
as well.

579
00:31:01,480 --> 00:31:04,600
If you don't see a big problem
right now, maybe okay, maybe not,

580
00:31:04,600 --> 00:31:05,820
let's not log it.

581
00:31:05,940 --> 00:31:08,540
But why not to be prepared for
future, right?

582
00:31:08,560 --> 00:31:10,020
Michael: Yeah, I would say the
opposite.

583
00:31:10,020 --> 00:31:13,040
I'd say if you're not seeing them
right now, you could even afford

584
00:31:13,040 --> 00:31:16,100
to set it to 0 because you're just
not going to get much volume.

585
00:31:16,440 --> 00:31:20,880
Nikolay: What if tomorrow memory
is not enough and we start logging

586
00:31:20,900 --> 00:31:26,480
very tiny amount, tiny numbers
and it will be a lot of cases?

587
00:31:26,480 --> 00:31:30,260
Michael: Well, depends if you've
if you've tuned work_mem, even

588
00:31:30,260 --> 00:31:36,820
if work_mem is the default 4MB,
it would only be logging most

589
00:31:36,820 --> 00:31:40,240
likely queries that are using more
than 4MB of memory and sorts

590
00:31:40,240 --> 00:31:43,120
and things, so how many of those
per second are you realistically

591
00:31:43,260 --> 00:31:43,760
running?

592
00:31:44,060 --> 00:31:47,260
It seems unlikely to me that it's
going to be super high volume.

593
00:31:47,720 --> 00:31:50,160
Nikolay: First of all, work_mem
should be increased.

594
00:31:50,440 --> 00:31:50,940
Yes.

595
00:31:50,940 --> 00:31:51,440
Right.

596
00:31:51,440 --> 00:31:57,840
But if you keep it very low and
suddenly you, you changed like

597
00:31:57,840 --> 00:32:03,620
data volumes become bigger and
bigger and for like for hash operations

598
00:32:04,280 --> 00:32:07,660
ordering we need more and more
at some point we slightly exceed

599
00:32:07,660 --> 00:32:12,760
4 gigabytes 4 megabytes and many
of processes do it for example

600
00:32:13,100 --> 00:32:17,860
boom you you need to generate temporary
files and you you like

601
00:32:18,240 --> 00:32:23,120
log_temp_files equals 0 add
some text on it additional text,

602
00:32:23,120 --> 00:32:23,620
right?

603
00:32:24,840 --> 00:32:26,060
My approach is different.

604
00:32:26,180 --> 00:32:31,620
I If we don't see it right now,
but we grow This is scary moment

605
00:32:31,620 --> 00:32:32,080
for me.

606
00:32:32,080 --> 00:32:37,480
I would prefer seeing very heavily
loaded systems already having

607
00:32:37,480 --> 00:32:38,620
some temporary files.

608
00:32:39,400 --> 00:32:40,540
And we understand them.

609
00:32:41,120 --> 00:32:45,260
And we go from quite significant
value, like 10 megabytes, And

610
00:32:45,260 --> 00:32:46,720
we go down in stages.

611
00:32:46,720 --> 00:32:51,200
Restart is not needed, by the way,
for all settings we discussed

612
00:32:51,200 --> 00:32:51,700
today.

613
00:32:51,880 --> 00:32:57,160
So far, restart is not needed,
which is perfect for going in

614
00:32:57,160 --> 00:32:59,060
iterations gradually, right?

615
00:32:59,060 --> 00:33:03,960
So we gradually descend to 0, but
maybe not to 0, maybe 200 kilobytes

616
00:33:03,960 --> 00:33:04,660
or something.

617
00:33:05,220 --> 00:33:06,180
Michael: But what's the difference?

618
00:33:06,180 --> 00:33:07,240
Like it's on.

619
00:33:07,800 --> 00:33:10,520
Are you going to get temporary
files that are 100 kilobytes?

620
00:33:11,400 --> 00:33:14,380
Nikolay: Well, if we put 100 kilobytes
to log_temp_files, we

621
00:33:14,380 --> 00:33:16,300
will see only the cases which exceeded.

622
00:33:17,180 --> 00:33:20,660
Michael: Yeah, but what I mean
is if work_mem is 4MB and hash_mem_multiplier

623
00:33:21,100 --> 00:33:24,980
is 2 by default, we're only realistically
going to have temp

624
00:33:24,980 --> 00:33:29,660
files that are more roughly, very
roughly 4MB to 8MB.

625
00:33:29,660 --> 00:33:30,540
Nikolay: I understood you.

626
00:33:30,540 --> 00:33:34,080
Well, obviously we think about
it differently and I'm not sure

627
00:33:34,080 --> 00:33:34,920
who is right.

628
00:33:36,180 --> 00:33:39,680
You think when memory is not enough,
everything goes to temporary

629
00:33:39,680 --> 00:33:40,180
file.

630
00:33:40,200 --> 00:33:43,440
I think when memory is not enough,
only Delta goes to temporary

631
00:33:43,440 --> 00:33:44,680
file, which is a

632
00:33:44,680 --> 00:33:45,940
Michael: good, Good point.

633
00:33:45,940 --> 00:33:48,260
I think it's actually different
in 2 different cases.

634
00:33:48,260 --> 00:33:51,240
So, for example, like sorts, I
think it's all or nothing.

635
00:33:52,660 --> 00:34:01,120
But I think, for example, with
bitmap scans that have non exact

636
00:34:03,060 --> 00:34:05,320
Nikolay: Michael, we have homework
to do.

637
00:34:05,800 --> 00:34:06,300
Yeah.

638
00:34:08,740 --> 00:34:09,860
Michael: It's a good point, though.

639
00:34:09,860 --> 00:34:10,260
You're right.

640
00:34:10,260 --> 00:34:12,040
There probably are some operations.

641
00:34:12,040 --> 00:34:13,680
I think including bit scans

642
00:34:15,820 --> 00:34:19,320
Nikolay: In this Interesting question.

643
00:34:19,360 --> 00:34:21,300
Yeah, I Talk to some hackers.

644
00:34:21,300 --> 00:34:23,160
It might sound very simple actually.

645
00:34:25,240 --> 00:34:27,600
Michael: I'm pretty sure with sorts
it's all or nothing.

646
00:34:27,620 --> 00:34:30,400
But it's a good point that going
down in stages makes a load

647
00:34:30,400 --> 00:34:30,820
of sense.

648
00:34:30,820 --> 00:34:34,260
And I was just trying to say I
feel like you can skip it if you've

649
00:34:34,940 --> 00:34:35,620
currently got it.

650
00:34:35,620 --> 00:34:36,360
Nikolay: SERGEY VASILYEVICH Yeah.

651
00:34:36,540 --> 00:34:38,140
Why go down or nothing?

652
00:34:38,140 --> 00:34:39,340
I understand this logic, yeah.

653
00:34:39,340 --> 00:34:42,620
Michael: But for me, low is almost as good as 0.

654
00:34:42,620 --> 00:34:44,560
Like, there's no difference realistically.

655
00:34:44,640 --> 00:34:45,840
It's 100 kilobytes, great.

656
00:34:45,840 --> 00:34:50,460
If that makes things much safer in extremely heavily loaded systems

657
00:34:50,500 --> 00:34:55,140
with lots of potential max connections, great, low is still good.

658
00:34:55,400 --> 00:34:55,900
Nikolay: Yeah.

659
00:34:56,840 --> 00:35:00,780
But again, this is a useful parameter because if you have cases

660
00:35:01,240 --> 00:35:04,600
when, we discussed it in the beginning already, when you have

661
00:35:04,600 --> 00:35:07,760
cases you can see some normalized query but you need examples

662
00:35:07,820 --> 00:35:11,160
to understand how to reproduce it and then decide what to do

663
00:35:11,160 --> 00:35:11,760
with it.

664
00:35:11,760 --> 00:35:15,720
Raise work_mem globally, locally, this will be a question.

665
00:35:15,720 --> 00:35:18,600
Michael: 1 user, for like an analytics user, that kind of thing.

666
00:35:18,600 --> 00:35:19,940
Nikolay: Right, right, right, yeah.

667
00:35:20,740 --> 00:35:22,040
Yeah, so okay, good.

668
00:35:23,100 --> 00:35:28,060
And memory management might be quite challenging, like it's sometimes

669
00:35:28,080 --> 00:35:28,760
not trivial.

670
00:35:30,080 --> 00:35:35,780
Good, So let's talk about the main part of query analysis.

671
00:35:35,860 --> 00:35:39,140
It's pg_stat_statements is not enough, right?

672
00:35:39,240 --> 00:35:44,280
So you need to either have quite low log_min_duration_statement,

673
00:35:45,180 --> 00:35:50,740
at least 1 second, preferably lower, like half a second, maybe

674
00:35:50,740 --> 00:35:57,480
100 milliseconds, to capture more cases of slow queries and then

675
00:35:57,660 --> 00:35:58,680
optimize them.

676
00:35:59,640 --> 00:36:00,140
Right?

677
00:36:00,480 --> 00:36:01,280
Michael: Yeah, absolutely.

678
00:36:01,380 --> 00:36:03,740
And I like your point about them being examples.

679
00:36:04,000 --> 00:36:07,500
So it's not necessarily that the average of that query is going

680
00:36:07,500 --> 00:36:09,280
to be running in 100 milliseconds.

681
00:36:09,280 --> 00:36:10,620
We want to catch the outliers.

682
00:36:10,680 --> 00:36:14,540
Like, what are the parameters for that query that mean it does

683
00:36:14,540 --> 00:36:16,200
tend to run over 100 milliseconds?

684
00:36:16,280 --> 00:36:17,540
So we get those examples.

685
00:36:17,640 --> 00:36:18,380
It's perfect.

686
00:36:19,820 --> 00:36:19,920
Nikolay: And

687
00:36:19,920 --> 00:36:21,980
Michael: again, 1 that we can lower in stages?

688
00:36:23,200 --> 00:36:24,900
Nikolay: Yeah, definitely.

689
00:36:25,360 --> 00:36:30,940
So I will start if never, like if it's big unknown, like from

690
00:36:30,940 --> 00:36:32,540
5 seconds and go down.

691
00:36:33,400 --> 00:36:36,580
Doesn't make any sense to start with really large value because

692
00:36:36,580 --> 00:36:40,760
I assume we talk about OLTP systems where, for example, 10 seconds

693
00:36:40,760 --> 00:36:43,400
is already an absolutely slow query.

694
00:36:45,120 --> 00:36:51,220
I usually see 1 or half a second as a value being used these days.

695
00:36:52,220 --> 00:36:56,340
And if you already performed a lot of optimization efforts and

696
00:36:56,920 --> 00:37:00,560
you want to improve even further, you can go down.

697
00:37:00,660 --> 00:37:04,820
But maybe, maybe it's worth it instead of this.

698
00:37:05,420 --> 00:37:10,040
Oh, before we go there, there's actually an idea you can go down

699
00:37:10,040 --> 00:37:12,420
to 0 but switch to sampling.

700
00:37:13,340 --> 00:37:16,900
There is statement-level sampling and transaction-level sampling,

701
00:37:17,780 --> 00:37:20,660
which is supported since a few
years ago.

702
00:37:21,200 --> 00:37:23,400
Michael: Yeah, you can have 2 levels,
can't you?

703
00:37:23,400 --> 00:37:27,220
You can set log_min_duration_statement
to, let's say, 100 milliseconds,

704
00:37:27,720 --> 00:37:29,540
and that will log everything above
100.

705
00:37:29,540 --> 00:37:33,780
Then you can have like a second
threshold that is log_min_duration_sample

706
00:37:34,000 --> 00:37:37,920
that's like, you could say
10 milliseconds and then you

707
00:37:37,920 --> 00:37:41,820
separately set a log_statement_sample_rate
to, you could say

708
00:37:41,820 --> 00:37:46,700
like 0.1 and that would do 10%
or 0.01, it would do 1%.

709
00:37:46,720 --> 00:37:51,500
And that would log, let's say,
1% of the statements that were

710
00:37:51,500 --> 00:37:53,640
between 10 and 100 milliseconds.

711
00:37:55,800 --> 00:37:59,700
So, yeah, it would be a way of,
like, capturing some examples.

712
00:38:00,800 --> 00:38:05,640
Nikolay: Yeah, like 1% of everything,
and everything, like, 100%

713
00:38:05,940 --> 00:38:11,440
if it's above the old log_min_duration_statement threshold.

714
00:38:13,080 --> 00:38:17,140
And this was added in Postgres
13, I think it was adjusted later.

715
00:38:17,500 --> 00:38:21,240
14, so we can say it's already
almost all versions.

716
00:38:22,240 --> 00:38:23,760
Michael: All supported versions,
yeah.

717
00:38:23,760 --> 00:38:25,220
Nikolay: All supported versions,
yeah.

718
00:38:25,420 --> 00:38:27,560
Actually, 13 is the oldest already,
wow.

719
00:38:27,560 --> 00:38:29,160
Yeah, time flies.

720
00:38:30,820 --> 00:38:31,440
Good, good.

721
00:38:31,640 --> 00:38:37,660
So we talked about this, but there's
also idea instead of logging

722
00:38:37,660 --> 00:38:39,500
queries, let's log them with plans.

723
00:38:39,860 --> 00:38:45,420
And just jump straight to using
auto_explain and just not to

724
00:38:45,420 --> 00:38:47,360
use what we just discussed.

725
00:38:47,780 --> 00:38:52,560
Log_min_duration_statement and
log_min_duration_sample, blah, blah,

726
00:38:52,740 --> 00:38:53,240
blah.

727
00:38:53,240 --> 00:38:58,580
So just auto_explain brings plans,
which is great.

728
00:38:59,840 --> 00:39:00,060
And...

729
00:39:00,060 --> 00:39:03,940
Michael: Yeah, and it also does
log the statement, so you do

730
00:39:03,940 --> 00:39:05,100
still get the query.

731
00:39:05,540 --> 00:39:08,940
But you could, in theory, have
a third setting, you could have

732
00:39:08,940 --> 00:39:14,520
a third value, so you might want
to log the queries above 100

733
00:39:14,540 --> 00:39:18,140
milliseconds, and you have a separate
threshold for logging auto_explain

734
00:39:19,140 --> 00:39:21,860
plans at a different threshold
if you wanted to.

735
00:39:22,060 --> 00:39:25,520
Nikolay: It was you who surprised
me a lot a couple of years

736
00:39:25,520 --> 00:39:29,260
ago when you said auto_explain
support sampling for ages.

737
00:39:29,260 --> 00:39:29,605
Yeah.

738
00:39:29,605 --> 00:39:31,140
I haven't noticed somehow.

739
00:39:31,200 --> 00:39:32,220
Yeah, but it's great.

740
00:39:33,680 --> 00:39:38,740
You can log every plan, execution
with plan.

741
00:39:39,020 --> 00:39:43,260
You log actual execution with plan,
but only 1% samples of it,

742
00:39:43,260 --> 00:39:43,760
right?

743
00:39:44,720 --> 00:39:47,480
Michael: Yeah, so this is slightly
different though, I think,

744
00:39:47,520 --> 00:39:51,920
because with the sampling on the
log_min_duration_statement,

745
00:39:52,120 --> 00:39:56,720
I get the impression it's well,
I'm not sure maybe maybe this

746
00:39:56,720 --> 00:40:01,120
is a question I get impression
it's tracking them all but only

747
00:40:01,120 --> 00:40:04,200
logging a percentage of them.

748
00:40:04,540 --> 00:40:10,020
So you're then saving yourself
from an overhead around logging.

749
00:40:10,240 --> 00:40:10,740
Because

750
00:40:11,040 --> 00:40:11,540
Nikolay: it's...

751
00:40:12,600 --> 00:40:16,860
SLAVOJ Yes, calculate metrics and
then log to disk.

752
00:40:17,020 --> 00:40:17,680
Michael: SIMON Exactly.

753
00:40:17,680 --> 00:40:22,200
So there's the observer effect
of observing, of tracking timings,

754
00:40:22,200 --> 00:40:25,580
and then there's the second observer
effect of actually logging.

755
00:40:26,280 --> 00:40:30,900
Whereas with auto_explain, the
sampling, you don't even measure

756
00:40:31,000 --> 00:40:36,820
a certain proportion of the, so
you, it is, it's subtly different,

757
00:40:36,820 --> 00:40:37,540
I think.

758
00:40:37,720 --> 00:40:38,220
Nikolay: Yeah.

759
00:40:38,800 --> 00:40:40,020
What would you recommend?

760
00:40:40,800 --> 00:40:45,140
Like we can have so many options
here and worth saying we can,

761
00:40:45,140 --> 00:40:46,080
it's quite flexible.

762
00:40:46,160 --> 00:40:46,960
We tune it all.

763
00:40:46,960 --> 00:40:50,660
And then you think, okay, this
is a big complex system.

764
00:40:50,660 --> 00:40:51,960
We have so much logging.

765
00:40:52,300 --> 00:40:53,240
We log everything.

766
00:40:53,740 --> 00:40:56,640
Not everything, sometimes same,
sometimes not.

767
00:40:56,740 --> 00:40:58,680
But then you think, actually, you
know what?

768
00:40:58,780 --> 00:41:04,380
I still need to do such statements
because I want reliable, higher

769
00:41:04,380 --> 00:41:06,480
level metrics to understand.

770
00:41:07,820 --> 00:41:11,600
Because I identify 10 queries I
need to work on, but where to

771
00:41:11,600 --> 00:41:11,970
start?

772
00:41:11,970 --> 00:41:15,060
What is the most influential in
various terms?

773
00:41:15,060 --> 00:41:18,220
Like I need to produce a Statement
Stone to realize that.

774
00:41:18,420 --> 00:41:23,300
And I connect on by query ID and
query ID in auto_explain is supported

775
00:41:23,300 --> 00:41:25,580
only since version 16, I think.

776
00:41:25,980 --> 00:41:32,000
So this was a big missing piece
before 16.

777
00:41:32,080 --> 00:41:36,960
But once you're on 16 or 17 or
maybe already 18 beta 1, I don't

778
00:41:36,960 --> 00:41:41,860
know, then you connect by query
ID and you see, okay, based on

779
00:41:41,860 --> 00:41:46,280
log analysis, I see these plans
need my work, need attention

780
00:41:46,380 --> 00:41:50,680
or these queries, But I'm choosing
this one because I see it's

781
00:41:50,680 --> 00:41:53,980
like 25% of all time spent based
on pg_stat_statements.

782
00:41:54,520 --> 00:41:56,020
And this is a great way.

783
00:41:56,680 --> 00:42:02,820
So this brings us to the idea that
having logs is not enough.

784
00:42:04,900 --> 00:42:07,280
Michael: Yeah, well, they're kind
of serving slightly different

785
00:42:07,280 --> 00:42:08,480
purposes, aren't they?

786
00:42:08,480 --> 00:42:13,380
Because if a query is taking 25%
of total CPU, like if it's just

787
00:42:13,380 --> 00:42:17,060
taking 25% of our resources, It
would probably be relatively

788
00:42:17,300 --> 00:42:18,860
easy to reproduce that.

789
00:42:18,860 --> 00:42:24,720
If we grab the query, grab an example
that's, you know, relatively

790
00:42:24,960 --> 00:42:30,180
slow, and run EXPLAIN ANALYZE
with all the parameters on that,

791
00:42:30,180 --> 00:42:33,740
we'll probably get a slow query
plan.

792
00:42:34,700 --> 00:42:39,640
But where auto-explain comes into
its own is if queries are only

793
00:42:39,640 --> 00:42:44,160
every, like, if they're slow sometimes
but not always.

794
00:42:44,640 --> 00:42:45,700
If there's very specific.

795
00:42:47,640 --> 00:42:48,120
Exactly.

796
00:42:48,120 --> 00:42:50,780
It's cases like that that might
be difficult to diagnose.

797
00:42:51,260 --> 00:42:55,120
Maybe they happen at certain times
or with, you know, in cases

798
00:42:55,120 --> 00:42:57,480
that you're struggling to put your
finger on.

799
00:42:58,320 --> 00:43:02,060
Getting the plan at that moment
is extremely helpful.

800
00:43:02,680 --> 00:43:04,860
And it's not always possible.

801
00:43:07,580 --> 00:43:10,020
Nikolay: Do you think auto_explain
should be in a different log?

802
00:43:12,980 --> 00:43:13,940
Michael: I mean I still...

803
00:43:14,620 --> 00:43:15,300
Nikolay: Let me...

804
00:43:16,500 --> 00:43:20,300
I maybe already mentioned this
recipe in the past, I used it.

805
00:43:20,460 --> 00:43:23,980
So you have a tmpfs disk, RAM
disk.

806
00:43:24,840 --> 00:43:25,340
Yeah.

807
00:43:26,400 --> 00:43:29,240
Like gigabyte, we have memory,
so 1 gigabyte.

808
00:43:30,480 --> 00:43:31,820
And log goes there.

809
00:43:34,020 --> 00:43:38,660
And you quickly rotate it to ensure
it's not filled up.

810
00:43:39,160 --> 00:43:41,340
And send it somewhere to storage.

811
00:43:42,620 --> 00:43:46,700
And it would be great to have separate
log for such very bloaty

812
00:43:46,700 --> 00:43:48,340
things like auto_explain, right?

813
00:43:48,820 --> 00:43:51,580
Michael: Well, and that's the big
difference between auto_explain

814
00:43:51,580 --> 00:43:53,720
and just logging the query.

815
00:43:53,720 --> 00:43:58,700
Because even queries can get long,
but query plans can be ridiculously

816
00:43:59,100 --> 00:43:59,340
big.

817
00:43:59,340 --> 00:44:03,480
Like the largest one I think I've
seen was getting close to 100

818
00:44:03,480 --> 00:44:04,080
megabytes of

819
00:44:04,080 --> 00:44:04,540
Nikolay: text.

820
00:44:04,540 --> 00:44:05,140
Just one.

821
00:44:05,740 --> 00:44:06,980
Michael: Just one query plan.

822
00:44:06,980 --> 00:44:11,140
Okay, a lot of that was like, there
was a lot of like whitespace

823
00:44:11,260 --> 00:44:12,840
and a lot of repeated

824
00:44:12,900 --> 00:44:13,132
Nikolay: identifiers.

825
00:44:13,132 --> 00:44:16,000
It should be compressed, immediately
compressed and sent over

826
00:44:16,000 --> 00:44:16,440
network.

827
00:44:16,440 --> 00:44:18,220
But compressed by whom?

828
00:44:18,260 --> 00:44:19,400
Logging collector or?

829
00:44:19,400 --> 00:44:19,900
Yeah.

830
00:44:20,080 --> 00:44:21,300
That's an interesting question.

831
00:44:22,060 --> 00:44:24,820
Michael: So it is the big downside
in terms of volume.

832
00:44:25,080 --> 00:44:28,640
Nikolay: So we see so much power
we have right now for observability

833
00:44:28,820 --> 00:44:32,220
and query processing, query optimization,
but still it feels

834
00:44:32,220 --> 00:44:37,360
like in heavily loaded systems,
it's hard to have, like, it's

835
00:44:37,360 --> 00:44:41,100
not super hard, but there are opportunities
to improve.

836
00:44:41,140 --> 00:44:42,100
100% so.

837
00:44:42,700 --> 00:44:43,200
Michael: Yeah.

838
00:44:43,260 --> 00:44:47,320
And if you've got a bit of, if
you've not got issues right now

839
00:44:47,640 --> 00:44:51,240
it's great to set a bunch of these
and you can afford to set

840
00:44:51,340 --> 00:44:54,020
things like auto_explain up if
you've got overhead if you've

841
00:44:54,020 --> 00:44:58,740
got headroom you can set these
things up in advance and then

842
00:44:58,740 --> 00:45:03,280
change them if that if you start
to get into you know the 90%

843
00:45:03,380 --> 00:45:06,560
of CPU being used you can look
into things you can do to reduce

844
00:45:06,560 --> 00:45:07,060
those.

845
00:45:07,720 --> 00:45:12,040
But I really like a quote, I think
somebody was at an Oracle

846
00:45:12,040 --> 00:45:15,480
conference and somebody asked,
what's all the overhead of all

847
00:45:15,480 --> 00:45:18,400
of the performance analysis Oracle's
doing.

848
00:45:18,400 --> 00:45:18,560
Oh yeah, of

849
00:45:18,560 --> 00:45:19,500
Nikolay: course, Stanel Potter.

850
00:45:20,580 --> 00:45:21,300
Michael: Ah, nice.

851
00:45:22,200 --> 00:45:26,200
Nikolay: Yeah, he said something like, I can pay like 10% or

852
00:45:26,200 --> 00:45:31,720
something overhead, but save like 90% thanks to my optimization

853
00:45:31,780 --> 00:45:32,940
based on those information.

854
00:45:33,900 --> 00:45:36,180
Michael: Yes, I think it's along those lines.

855
00:45:36,180 --> 00:45:38,220
I think if you have this information.

856
00:45:38,800 --> 00:45:41,640
Nikolay: It's very exposition, you know, like, if you have this,

857
00:45:41,640 --> 00:45:45,520
but you're not going to use it, you're losing, right, additionally.

858
00:45:46,020 --> 00:45:51,340
But if you are definitely like, you are going to allocate your

859
00:45:51,340 --> 00:45:57,100
resources to optimize, ready to pay this penalty like or tax,

860
00:45:57,700 --> 00:46:01,500
but because I know I will optimize and I will be on a better

861
00:46:01,500 --> 00:46:02,860
spot after it.

862
00:46:03,260 --> 00:46:05,800
Michael: Yeah, exactly, it's not a liability, it's an asset at

863
00:46:05,800 --> 00:46:06,540
that point.

864
00:46:07,040 --> 00:46:12,160
Nikolay: And those 10% they will shrink, thanks to my optimization.

865
00:46:13,280 --> 00:46:18,620
Compared to initial latencies, they may become smaller, right?

866
00:46:20,060 --> 00:46:22,380
Michael: Maybe, if they're a percentage, I've got a feeling some

867
00:46:22,380 --> 00:46:23,420
of them are fixed.

868
00:46:24,140 --> 00:46:26,820
Nikolay: But if we are not sure that we will be able to optimize,

869
00:46:27,100 --> 00:46:31,560
I can understand that it can sound risky to pay this additional

870
00:46:31,560 --> 00:46:34,560
price without a reliable outcome.

871
00:46:35,140 --> 00:46:39,060
Who knows how successful we will be.

872
00:46:39,060 --> 00:46:43,620
But again, all of these things, except putting auto_explain to

873
00:46:43,620 --> 00:46:47,980
shared_preload_libraries, you can share them on fly.

874
00:46:47,980 --> 00:46:49,320
So it's so great.

875
00:46:49,600 --> 00:46:55,380
You can temporarily lower all thresholds during a week or 2 while

876
00:46:55,380 --> 00:46:58,500
you're optimizing, for example, and then get back to your normal

877
00:46:58,700 --> 00:47:02,000
and do it like every quarter, for example, like routine optimization.

878
00:47:04,300 --> 00:47:05,680
So cycles of optimization.

879
00:47:07,100 --> 00:47:11,340
Also if it's non-production or something, you can be more aggressive

880
00:47:11,480 --> 00:47:17,060
in going down with thresholds and having more, to see more, yeah.

881
00:47:18,340 --> 00:47:20,780
Michael: It feels like 1 last thing on auto_explain.

882
00:47:21,420 --> 00:47:25,380
The thing in the docs and quite a few experts will recommend

883
00:47:25,380 --> 00:47:27,940
is turning off the timing parameter.

884
00:47:28,080 --> 00:47:34,060
So you collect the EXPLAIN ANALYZE information but without per

885
00:47:34,060 --> 00:47:36,420
node operation, like per node timings.

886
00:47:37,800 --> 00:47:40,460
It does reduce the overhead, of course.

887
00:47:41,040 --> 00:47:46,560
But it also reduces the, like, fidelity of those plans and it

888
00:47:46,560 --> 00:47:47,320
is quite useful.

889
00:47:47,320 --> 00:47:49,200
Nikolay: Well, yeah, I don't agree with you.

890
00:47:49,540 --> 00:47:56,500
For me, for me, most valuable is structure and buffers, as usual.

891
00:47:57,560 --> 00:48:01,960
Michael: Of course, I get that they're more useful, But I think

892
00:48:01,980 --> 00:48:05,420
quite often in a lot of environments collecting the timing information

893
00:48:05,420 --> 00:48:06,700
is also worth it.

894
00:48:07,360 --> 00:48:08,320
Nikolay: Well, yeah.

895
00:48:10,080 --> 00:48:14,020
You had a post long ago about this, but we also had a whole episode

896
00:48:14,240 --> 00:48:14,860
about auto_explain.

897
00:48:14,860 --> 00:48:15,900
Do you remember?

898
00:48:16,800 --> 00:48:17,220
Michael: Yeah.

899
00:48:17,220 --> 00:48:19,080
I'll link up the relevant episodes.

900
00:48:19,140 --> 00:48:19,340
It's a

901
00:48:19,340 --> 00:48:19,980
Nikolay: big topic.

902
00:48:20,020 --> 00:48:23,820
AutoExpand has many settings to tune, so it's a big topic.

903
00:48:24,240 --> 00:48:26,620
Good, I think maybe that's it, right?

904
00:48:27,040 --> 00:48:27,840
Michael: Yes, definitely.

905
00:48:28,080 --> 00:48:29,160
Thanks so much, Nikolay.

906
00:48:29,280 --> 00:48:30,260
Catch you next week.

907
00:48:31,310 --> 00:48:32,060
Nikolay: Yeah, Bye-bye.