1
00:00:00,060 --> 00:00:02,360
Michael: Hello and welcome to
Postgres.FM, a weekly show about

2
00:00:02,360 --> 00:00:03,480
all things PostgreSQL.

3
00:00:03,480 --> 00:00:06,020
I am Michael, founder of pgMustard,
and as usual, I'm joined

4
00:00:06,020 --> 00:00:07,500
by Nikolay, founder of Postgres.AI.

5
00:00:07,680 --> 00:00:08,540
Hello, Nikolay.

6
00:00:09,020 --> 00:00:09,820
Nikolay: Hi, Michael.

7
00:00:10,320 --> 00:00:13,380
Michael: Today is no normal episode
though, because we are delighted

8
00:00:13,380 --> 00:00:16,420
to be joined by Melanie Plageman,
who is a Database Internals

9
00:00:16,440 --> 00:00:19,840
Engineer working at Microsoft and
major contributor and committer

10
00:00:19,960 --> 00:00:20,720
to PostgreSQL.

11
00:00:21,140 --> 00:00:23,080
It's a real honor to have you on
the show, Melanie.

12
00:00:23,080 --> 00:00:23,580
Welcome.

13
00:00:23,800 --> 00:00:24,320
Melanie: Thank you.

14
00:00:24,320 --> 00:00:26,020
I'm excited to be here.

15
00:00:26,360 --> 00:00:27,540
Michael: We're excited to have
you.

16
00:00:27,540 --> 00:00:31,080
There are so many things we could
have chosen as a topic today,

17
00:00:31,280 --> 00:00:34,100
but off the back of a couple of
talks you've given at Postgres

18
00:00:34,120 --> 00:00:37,260
conferences, we've opted to go
for the topic of getting started

19
00:00:37,260 --> 00:00:38,060
with benchmarking.

20
00:00:38,680 --> 00:00:40,760
Would you mind giving us a little
background about why this is

21
00:00:40,760 --> 00:00:43,160
a topic you're particularly interested
in?

22
00:00:43,260 --> 00:00:43,940
Melanie: Yeah, sure.

23
00:00:43,940 --> 00:00:48,120
So since I started working at Microsoft
a few years ago, a lot

24
00:00:48,120 --> 00:00:50,660
of the work that our team does
is performance related.

25
00:00:51,000 --> 00:00:55,400
So I would say as a Postgres engineer,
even for features that

26
00:00:55,400 --> 00:00:58,580
are not performance related, you
have to make sure that you understand

27
00:00:58,580 --> 00:00:59,780
their performance impact.

28
00:01:00,420 --> 00:01:05,380
But we tend to work on things like
AIO and other features where

29
00:01:05,380 --> 00:01:08,300
you wouldn't do them if they didn't
have a positive performance

30
00:01:08,300 --> 00:01:08,740
impact.

31
00:01:08,740 --> 00:01:11,820
So assessing that is a huge part
of my job.

32
00:01:11,820 --> 00:01:16,220
And it's an area where every little
detail matters.

33
00:01:16,640 --> 00:01:19,340
And the things that don't matter
are kind of counterintuitive.

34
00:01:20,240 --> 00:01:25,440
And it's just, it's kind of an
art and it's so hard to learn.

35
00:01:25,440 --> 00:01:29,840
And it's like a dark art, you know,
and it's, I think I've been

36
00:01:30,480 --> 00:01:33,680
really blessed in that I've gotten
to talk to and learn from

37
00:01:33,680 --> 00:01:37,820
people that know how to do it,
like Andres Freund and other people

38
00:01:37,820 --> 00:01:42,660
at Microsoft and in the community,
but not everyone gets to learn

39
00:01:42,660 --> 00:01:46,160
from people that have 15 years
or 20 years of experience.

40
00:01:46,280 --> 00:01:50,860
So I like talking about it and
trying to see if I can help people

41
00:01:50,860 --> 00:01:54,300
get started and demystify what
I can about it.

42
00:01:54,720 --> 00:01:56,100
Nikolay: Yeah, that's a good point.

43
00:01:56,520 --> 00:01:58,520
So you think it can be improved,
right?

44
00:01:58,520 --> 00:02:02,020
And people can understand benchmarks
better, right?

45
00:02:02,660 --> 00:02:06,600
What are the tools, do you think,
for that to find solutions

46
00:02:06,620 --> 00:02:09,380
better and understand bottlenecks
faster and so on?

47
00:02:09,640 --> 00:02:09,880
Yes.

48
00:02:09,880 --> 00:02:13,140
Misleading results we have all
the time, like we think we found

49
00:02:13,140 --> 00:02:15,880
something, but it's wrong, and
then, like one day later, you

50
00:02:15,880 --> 00:02:19,460
realize that it's completely opposite
sometimes, right?

51
00:02:19,460 --> 00:02:22,040
So, so you think it can be improved,
right?

52
00:02:22,720 --> 00:02:23,220
Melanie: Right.

53
00:02:23,680 --> 00:02:27,340
So I think one of the things that
users have going for them is

54
00:02:27,340 --> 00:02:31,680
that they have a baseline understanding
of how you should configure

55
00:02:31,700 --> 00:02:32,460
your database.

56
00:02:32,760 --> 00:02:37,660
And some of the mistakes that you
maybe, that I made when beginning

57
00:02:37,660 --> 00:02:40,300
with performance work, you might
not make.

58
00:02:40,320 --> 00:02:45,060
So I remember like the first set
of benchmarks that I did, I

59
00:02:45,060 --> 00:02:47,060
didn't configure shared buffers
at all.

60
00:02:47,540 --> 00:02:51,020
This was years ago, so I give myself
a little bit of an out.

61
00:02:51,020 --> 00:02:55,020
But basically, I was looking at
it and I was saying, this improvement,

62
00:02:55,120 --> 00:02:58,260
you know, that I, this is supposed
to be an improvement and it

63
00:02:58,260 --> 00:03:00,360
looks like it's actually hurting
performance.

64
00:03:00,360 --> 00:03:03,240
And I was showing Andres and he
was like, well, tell me how you

65
00:03:03,240 --> 00:03:05,920
configured it and what, you know,
what kind of, I didn't change

66
00:03:05,920 --> 00:03:06,600
any configuration.

67
00:03:06,640 --> 00:03:09,220
And he was like, okay, we'll just
throw all of this away.

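The configuration lesson above can be made concrete. A hedged sketch, assuming a self-managed Linux box and using illustrative values (not recommendations from the episode): before any benchmark run, at least move the key memory settings off their stock defaults.

```shell
# Illustrative values only -- size these to the machine under test.
psql -c "ALTER SYSTEM SET shared_buffers = '8GB';"
psql -c "ALTER SYSTEM SET work_mem = '64MB';"
psql -c "ALTER SYSTEM SET max_wal_size = '32GB';"
pg_ctl restart -D "$PGDATA"   # shared_buffers only takes effect after a restart
</imports>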
68
00:03:09,340 --> 00:03:14,320
And so some of the basic stuff
around benchmarking, I think that's

69
00:03:14,640 --> 00:03:18,040
something that developers who maybe
don't have as much user experience

70
00:03:18,040 --> 00:03:20,300
have to learn, but users have that.

71
00:03:20,320 --> 00:03:23,240
I think one of the things that I
see happen more than you would

72
00:03:23,240 --> 00:03:28,360
think is that there's some unexplained
performance impact or,

73
00:03:28,780 --> 00:03:31,400
like a lot of times you're, you
develop a change, you want to

74
00:03:31,400 --> 00:03:35,140
see how does it improve performance,
and when it doesn't, or

75
00:03:35,140 --> 00:03:38,240
it has an effect that you can't
explain, you have to start investigating.

76
00:03:40,080 --> 00:03:43,000
One of the things that people almost
never do, at least performance

77
00:03:43,040 --> 00:03:46,680
engineers, they don't look at the
logs, and it's like maybe because

78
00:03:46,680 --> 00:03:48,060
it's a development branch, right?

79
00:03:48,060 --> 00:03:51,960
Like maybe you actually had a bug
and it's crashing or, you know,

80
00:03:51,960 --> 00:03:54,840
in the logs, there's something
that's actually happening, and that's what's

81
00:03:54,840 --> 00:03:55,520
slowing it down.

82
00:03:55,520 --> 00:03:57,880
And it's such a basic thing that
you don't think about.

83
00:03:57,880 --> 00:04:01,900
You're like looking at iostat
and looking at these more advanced

84
00:04:02,080 --> 00:04:05,140
tools when we really just need
to look at the log.

85
00:04:06,140 --> 00:04:09,400
Nikolay: Or it was fast because
it didn't do any work, right?

86
00:04:09,520 --> 00:04:10,020
Melanie: Yeah.

87
00:04:11,240 --> 00:04:13,300
Nikolay: And looking at logs, you
can notice it.

88
00:04:13,960 --> 00:04:13,980
Yeah.

89
00:04:13,980 --> 00:04:14,700
Good point.

90
00:04:15,540 --> 00:04:20,780
Sometimes, even some companies
we had, I won't name it, but sometimes

91
00:04:21,740 --> 00:04:25,320
some benchmarks are published and
they claim something, and if

92
00:04:25,320 --> 00:04:29,140
you have experience, 10 plus years,
you quickly have some ideas

93
00:04:29,140 --> 00:04:30,700
what's wrong with those benchmarks.

94
00:04:30,700 --> 00:04:34,400
For example, full-text search in
Postgres is bad.

95
00:04:34,680 --> 00:04:37,560
It shows O(N).

96
00:04:38,000 --> 00:04:39,020
You think, how come?

97
00:04:39,020 --> 00:04:39,840
It's not possible.

98
00:04:39,840 --> 00:04:42,840
Like, you quickly realize it's
not possible.

99
00:04:42,840 --> 00:04:45,720
And you look at this and realize,
oh, yeah, they just didn't

100
00:04:45,720 --> 00:04:47,540
build GIN index at all.

101
00:04:47,920 --> 00:04:48,420
Right?

102
00:04:48,540 --> 00:04:52,420
But if you don't have experience,
it's easy to make errors, like

103
00:04:52,420 --> 00:04:54,640
right, to have errors, mistakes.

104
00:04:55,120 --> 00:04:59,220
And my idea, like it would be good
if something would help people

105
00:04:59,220 --> 00:05:04,200
to brainstorm what's wrong or what
else to look at as early as

106
00:05:04,200 --> 00:05:04,540
possible.

107
00:05:04,540 --> 00:05:07,160
So we need some brainstorming mechanism,
right?

108
00:05:07,260 --> 00:05:08,160
What do you think?

109
00:05:08,760 --> 00:05:12,620
Melanie: Yeah, almost like a checklist
or that kind of thing.

110
00:05:12,720 --> 00:05:17,140
I think that's one of the reasons
why when I started out with development,

111
00:05:17,260 --> 00:05:20,860
I thought, you know, with Postgres,
there's not enough basic

112
00:05:20,860 --> 00:05:24,220
resources like for performance
work that give you a checklist.

113
00:05:24,620 --> 00:05:29,020
But then I think I realized over
time, like I sort of moved towards

114
00:05:29,020 --> 00:05:31,960
the, you almost need that one-on-one
interaction with someone

115
00:05:31,960 --> 00:05:37,320
who has more experience because
it's just so individual and so

116
00:05:37,360 --> 00:05:40,820
it's just so hard to give general
advice to people.

117
00:05:41,060 --> 00:05:44,820
So I think that there are basic
questions you can ask yourself.

118
00:05:45,020 --> 00:05:48,080
There's a big difference between
I think a lot of users when

119
00:05:48,080 --> 00:05:51,360
they do benchmarking, I think a
common case is they're going

120
00:05:51,360 --> 00:05:53,820
to install, they're thinking about
upgrading.

121
00:05:54,140 --> 00:05:58,040
And so they're like, let me just
see if it's slower or faster,

122
00:05:58,040 --> 00:05:58,940
whatever, right?

123
00:05:59,440 --> 00:06:06,060
In that case, using pgbench for
that is not necessarily going

124
00:06:06,060 --> 00:06:07,120
to help you, right?

125
00:06:07,120 --> 00:06:11,200
Because first of all, benchmarking
what you actually want to

126
00:06:11,200 --> 00:06:15,200
benchmark with pgbench or any
benchmarking tool is, is hard.

127
00:06:15,560 --> 00:06:18,960
And second of all, what if that's
not like your workload at all?

128
00:06:18,960 --> 00:06:24,360
You know, so let's say that in
Postgres there were changes to

129
00:06:24,400 --> 00:06:29,540
the way that vacuum works and you
run a benchmark for less than

130
00:06:29,540 --> 00:06:30,320
a few minutes.

131
00:06:30,320 --> 00:06:32,940
Well, like, did you even end up
having anything be vacuumed?

132
00:06:33,000 --> 00:06:36,100
You almost have to think about
what do I actually care about?

133
00:06:36,480 --> 00:06:41,300
So if you're a user who's worried
about upgrading, like, I guess

134
00:06:41,520 --> 00:06:43,640
you guys would know better than
me, but you'll probably have

135
00:06:43,640 --> 00:06:48,460
options like running your maybe
non in not production, you can

136
00:06:48,640 --> 00:06:52,240
run your real workload on a
newer version and see how it

137
00:06:52,240 --> 00:06:52,480
is.

138
00:06:52,480 --> 00:06:53,960
And like, that's the ideal scenario.

139
00:06:53,960 --> 00:06:57,400
But if that's not an option for
you, then I think what you have

140
00:06:57,400 --> 00:07:01,420
to do is really think about what
are the highest risk areas for

141
00:07:01,420 --> 00:07:01,820
me?

142
00:07:01,820 --> 00:07:07,120
Like, do I have one critical report
I have to run?

143
00:07:07,120 --> 00:07:11,480
Or, you know, I need to make sure
that I'm able to reach this

144
00:07:11,480 --> 00:07:13,980
TPS at this time of day, or whatever
it is.

145
00:07:13,980 --> 00:07:16,400
And then you have to have a way
to reproduce that.

146
00:07:16,400 --> 00:07:19,080
Like you almost have to have a
minimal repro for the performance

147
00:07:19,080 --> 00:07:22,540
question that you're asking and
be able to test it.

148
00:07:22,540 --> 00:07:26,380
And I would say even then, like,
I don't know if you're gonna

149
00:07:26,380 --> 00:07:30,780
get a perfect answer outside of
actual production environment.

150
00:07:31,580 --> 00:07:34,740
Like the people, the use case that
users have for benchmarking,

151
00:07:34,960 --> 00:07:38,680
I haven't heard of one where
I think it's going to work out

152
00:07:38,680 --> 00:07:39,000
for them.

153
00:07:39,000 --> 00:07:42,680
But I don't know, maybe you can
tell me what people want to use

154
00:07:42,680 --> 00:07:43,040
Nikolay: it for.

155
00:07:43,040 --> 00:07:48,260
Let me shamelessly ask you your
opinion about the approach I

156
00:07:48,260 --> 00:07:53,080
ended up having for upgrades with
our customers, big and small,

157
00:07:53,080 --> 00:07:53,900
doesn't matter.

158
00:07:54,120 --> 00:07:58,520
So I was a big fan of replaying
workload like 5 years ago, but

159
00:07:58,520 --> 00:07:59,880
we don't have tooling for it.

160
00:07:59,880 --> 00:08:03,160
And mirroring, like PgCat, I think,
has mirroring, but it's hard.

161
00:08:03,160 --> 00:08:06,820
It should be in between clients
and server and it's very critical

162
00:08:06,820 --> 00:08:07,320
infrastructure.

163
00:08:07,420 --> 00:08:11,200
Usually PgBouncer is there and
you cannot just quickly replace

164
00:08:11,200 --> 00:08:14,940
it with PgCat and so on and overhead
and so on.

165
00:08:14,960 --> 00:08:17,620
And mirroring also is tricky thing
actually.

166
00:08:17,960 --> 00:08:22,060
So what I ended up doing, I split
it into 2 things.

167
00:08:22,080 --> 00:08:24,780
First thing is load testing generally
with pgbench.

168
00:08:24,920 --> 00:08:28,780
And usually, we have already done
by hackers and so on, like

169
00:08:28,780 --> 00:08:30,480
performance farms or something.

170
00:08:30,480 --> 00:08:33,840
And we can check it additionally
with some synthetic workload,

171
00:08:34,440 --> 00:08:36,980
pgbench, not pgbench, sysbench,
doesn't matter.

172
00:08:37,080 --> 00:08:38,040
So this is one thing.

173
00:08:38,040 --> 00:08:41,520
And usually we rely on community
results or we can extend them

174
00:08:41,520 --> 00:08:42,160
as well.

175
00:08:42,260 --> 00:08:47,520
But as for our workload for our
project, Instead of replaying

176
00:08:47,520 --> 00:08:51,900
the whole, we think, okay, we forget
about log manager, buffer

177
00:08:51,900 --> 00:08:55,200
pool, and so on, all those components,
and we care only about

178
00:08:55,200 --> 00:08:55,800
plan flips.

179
00:08:55,800 --> 00:09:00,100
We want just to check the planner
behavior, and we just grab

180
00:09:00,160 --> 00:09:03,940
query examples, like 100 of them,
which are most critical in

181
00:09:03,940 --> 00:09:07,400
terms of total time or calls or
something from pg_stat_statements.

182
00:09:07,880 --> 00:09:11,980
Sometimes we collect multiple examples
for one normalized query

183
00:09:11,980 --> 00:09:13,240
in pg_stat_statements.

184
00:09:13,620 --> 00:09:17,420
And then this is our testing set.

185
00:09:17,720 --> 00:09:20,820
And then we don't care about machine
we use, actually.

186
00:09:21,040 --> 00:09:22,440
It can be smaller machine.

187
00:09:22,740 --> 00:09:28,020
We just have an exact clone of production
database to have the same

188
00:09:28,020 --> 00:09:29,180
data and pg_statistic.

189
00:09:29,260 --> 00:09:33,540
And then we also adjust settings,
all planner settings and work_mem,

190
00:09:33,900 --> 00:09:37,220
which is not planner setting, but
also affects planner behavior.

191
00:09:37,660 --> 00:09:41,420
And then we just reproduce plans,
upgrade, reproduce plans again,

192
00:09:41,800 --> 00:09:44,360
and compare and see if some plans
changed.

193
00:09:45,040 --> 00:09:48,900
And also we use BUFFERS, of course,
and we see like costs and

194
00:09:48,900 --> 00:09:50,460
BUFFERS we usually look at.

195
00:09:51,420 --> 00:09:53,280
We don't care about timing because
it can be smaller machine,

196
00:09:53,320 --> 00:09:54,660
cache type can be different.

197
00:09:55,380 --> 00:09:58,380
And if some plan flip occurs, we
quickly find it.

198
00:09:58,520 --> 00:10:03,080
And it can happen like in small
machine, like in shared environment,

199
00:10:03,080 --> 00:10:03,580
basically.

200
00:10:03,740 --> 00:10:07,920
It's not, pgbench is not needed
for plan flip check.

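A minimal sketch of the plan-flip check Nikolay describes, assuming the critical queries captured from pg_stat_statements have been saved one per file (the directory and file names are hypothetical). Plans are captured on a clone with production data and statistics, once per Postgres version, then diffed:

```shell
# Capture the plan for each saved query. COSTS OFF keeps the diff readable;
# add an EXPLAIN (ANALYZE, BUFFERS) pass if execution detail is also wanted.
mkdir -p plans-before
for f in queries/*.sql; do
  { echo "EXPLAIN (COSTS OFF)"; cat "$f"; } \
    | psql -X -qAt > "plans-before/$(basename "$f").plan"
done
# After upgrading the clone, repeat into plans-after/ and compare:
diff -ru plans-before plans-after
```

Any file showing up in the diff is a candidate plan flip to investigate.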
201
00:10:08,480 --> 00:10:08,980
Melanie: Right.

202
00:10:09,200 --> 00:10:10,060
Nikolay: And that's it.

203
00:10:10,240 --> 00:10:13,260
Melanie: Yeah, I think if I understand
correctly, at some point

204
00:10:13,260 --> 00:10:16,640
you use pgbench, and then you realize
that it wasn't actually

205
00:10:16,640 --> 00:10:19,640
going to help you with this particular
case.

206
00:10:19,640 --> 00:10:22,940
Nikolay: It's super hard to reproduce
actual workload.

207
00:10:23,480 --> 00:10:25,320
Melanie: Yeah, so I support that.

208
00:10:25,320 --> 00:10:29,140
I think really, honestly, what
you're doing, which is looking

209
00:10:29,140 --> 00:10:33,440
for plan flips and starting with
pg_stat_statements is probably

210
00:10:33,440 --> 00:10:36,960
the best thing that you could do
to try to make sure that you're

211
00:10:36,960 --> 00:10:42,720
going to, when you upgrade that
it's going to work out for you.

212
00:10:42,720 --> 00:10:45,640
And because I think that's one of
the only things that you can

213
00:10:45,640 --> 00:10:49,240
realistically reproduce because
like the planner doesn't take

214
00:10:49,240 --> 00:10:51,980
into account other things that
are happening.

215
00:10:51,980 --> 00:10:54,620
Each plan is planned in isolation
with the statistics and the

216
00:10:54,620 --> 00:10:55,840
data that you're talking about.

217
00:10:55,840 --> 00:10:58,260
It's not looking at overall system
load.

218
00:10:58,260 --> 00:11:00,480
I mean, maybe we should do things
like that, but planner doesn't

219
00:11:00,480 --> 00:11:01,500
do that right now.

220
00:11:01,640 --> 00:11:04,900
So it's actually possible to do
that.

221
00:11:04,900 --> 00:11:08,240
So I think that's a really good
sort of precautionary thing that

222
00:11:08,240 --> 00:11:12,040
you can do to see if the next version
of Postgres is going to

223
00:11:12,040 --> 00:11:12,980
cause you problems.

224
00:11:13,460 --> 00:11:15,480
So yes, I sign off on that.

225
00:11:15,480 --> 00:11:17,880
But that makes sense as a strategy.

226
00:11:18,180 --> 00:11:21,420
Nikolay: Before that, we were using
pgbench and we took most

227
00:11:21,420 --> 00:11:25,440
frequent and most time-consuming
queries from pg_stat_statements

228
00:11:25,440 --> 00:11:27,820
and we called it crafted workload.

229
00:11:27,980 --> 00:11:32,440
We've tried to find some parameters
or examples of queries and

230
00:11:32,440 --> 00:11:37,400
put them to pgbench using -f
and then the @ sign to balance

231
00:11:37,480 --> 00:11:41,200
somehow like to have some balanced
workload and then ignore some

232
00:11:41,200 --> 00:11:44,180
errors if they happen like foreign
key violation or something.

233
00:11:45,040 --> 00:11:46,660
I cannot say it worked well.

234
00:11:46,960 --> 00:11:51,340
It helped, but then I just realized,
why do we care about buffer

235
00:11:51,340 --> 00:11:51,900
pool and so on?

236
00:11:51,900 --> 00:11:53,260
Let's just split this.

237
00:11:53,440 --> 00:11:57,080
Melanie: So you were trying to,
you were doing that before upgrading

238
00:11:57,080 --> 00:11:58,720
to see if you would get the same.

239
00:11:58,740 --> 00:12:01,920
Nikolay: I stopped doing this because
it's hard, actually, time

240
00:12:01,920 --> 00:12:02,420
consuming.

241
00:12:02,660 --> 00:12:04,920
And if we care about plan flips,
that's it.

242
00:12:04,920 --> 00:12:05,780
We have a recipe.

243
00:12:05,900 --> 00:12:09,920
If we care about improvements in
some components of Postgres,

244
00:12:09,920 --> 00:12:14,360
we can research them using some
simple workloads like pgbench.

245
00:12:14,480 --> 00:12:15,260
That's it.

246
00:12:16,100 --> 00:12:16,400
Melanie: Right.

247
00:12:16,400 --> 00:12:16,900
Yeah.

248
00:12:17,640 --> 00:12:22,760
You really can't use pgbench to
replicate real workloads because,

249
00:12:22,900 --> 00:12:27,540
so for example, if I want to analyze
a patch that I'm doing and

250
00:12:27,540 --> 00:12:31,940
I want to replicate some scenario,
I might run multiple instances

251
00:12:31,960 --> 00:12:33,340
of pgbench at the same time.

252
00:12:33,340 --> 00:12:38,980
So I run one that's doing a SELECT
query, and another one that's

253
00:12:38,980 --> 00:12:41,040
doing a transactional workload.

254
00:12:41,040 --> 00:12:44,160
And you can combine different pgbenches,
and you can sequence

255
00:12:44,160 --> 00:12:45,420
them, and that kind of thing.

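Combining pgbench instances, as described above, can be sketched like this (database name, client counts, and durations are illustrative):

```shell
# Two concurrent pgbench streams against the same database: a read-only
# stream (-S selects the built-in SELECT-only script) plus the default
# TPC-B-like transactional mix. -n skips the pre-run vacuum.
pgbench -n -S -c 8 -j 2 -T 300 bench &
pgbench -n    -c 4 -j 1 -T 300 bench &
wait
```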
256
00:12:45,420 --> 00:12:50,460
But ultimately, each pgbench is
going to do the same thing, no

257
00:12:50,460 --> 00:12:52,420
matter what kind of custom script
you provide.

258
00:12:52,420 --> 00:12:54,960
So you can there's different variables
and you can do things

259
00:12:54,960 --> 00:12:59,240
like have variables in the script
and then interpolate random

260
00:12:59,240 --> 00:13:00,720
numbers and things like that.

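The variables-and-random-interpolation feature mentioned above looks like this in a custom pgbench script (run against the standard pgbench tables; the script file name is arbitrary):

```shell
# \set evaluates once per transaction, so each transaction probes a
# different random account id; :scale is set automatically by pgbench.
cat > lookup.sql <<'EOF'
\set aid random(1, 100000 * :scale)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
EOF
pgbench -n -f lookup.sql -c 4 -T 60 bench
```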
261
00:13:00,720 --> 00:13:05,180
But in a real system, you'll have
some kind of work happening

262
00:13:05,180 --> 00:13:07,740
and then another kind of work happening
that's different work

263
00:13:07,740 --> 00:13:12,740
and like interspersed or at unpredictable
intervals and with

264
00:13:12,740 --> 00:13:17,640
pgbench in the, you know, sort
of during a pgbench run, all

265
00:13:17,640 --> 00:13:21,460
of the workers are going to be
doing the same thing, you know,

266
00:13:21,460 --> 00:13:23,220
whatever's in the pgbench script.

267
00:13:23,540 --> 00:13:27,900
So you can't get, it's very, very
hard to replicate realistic

268
00:13:28,260 --> 00:13:29,720
scenarios with it.

269
00:13:29,860 --> 00:13:32,220
And there's lots of creative things
you can do.

270
00:13:32,220 --> 00:13:36,340
Like you can have some if statement
logic where you're like,

271
00:13:36,340 --> 00:13:39,360
okay, if this worker is this worker
number, then do this other

272
00:13:39,360 --> 00:13:39,780
thing.

273
00:13:39,780 --> 00:13:44,340
And like, but at that point, I
think it's not useful if you're

274
00:13:44,340 --> 00:13:46,880
trying to understand what the performance
impact is going to

275
00:13:46,880 --> 00:13:49,780
be for your real world workload.

276
00:13:50,220 --> 00:13:55,020
Like I can see users maybe using
it to figure out, to understand

277
00:13:55,380 --> 00:13:57,940
a particular quirk or something
like that.

278
00:13:58,320 --> 00:14:03,520
Or maybe if you're thinking about
changing your hardware, using

279
00:14:03,520 --> 00:14:07,580
a different SKU from the same vendor
or something like that,

280
00:14:07,600 --> 00:14:11,300
you can try to understand how it
might affect general Postgres

281
00:14:11,320 --> 00:14:11,820
workloads.

282
00:14:12,340 --> 00:14:13,580
And then it could be helpful.

283
00:14:13,580 --> 00:14:16,220
But pgbench is like mainly a developer
tool.

284
00:14:16,280 --> 00:14:17,900
And I mean, that's what it's marketed
at.

285
00:14:17,900 --> 00:14:21,000
It's not really marketed as something,
not that it's marketed,

286
00:14:21,000 --> 00:14:24,960
but it's not, I don't think it's
represented to anyone as a tool for

287
00:14:24,960 --> 00:14:28,100
users to understand Postgres performance
of their workloads.

288
00:14:28,640 --> 00:14:31,960
But I mean, all the other benchmarks
that are out there, I think

289
00:14:31,960 --> 00:14:34,140
pgbench kind of gets lumped in
with them.

290
00:14:34,440 --> 00:14:40,200
And it's not really useful for
any of the things other than development.

291
00:14:40,200 --> 00:14:43,760
In my opinion, it's very hard to
make it useful for like comparing

292
00:14:44,120 --> 00:14:47,320
whatever Postgres to MySQL or that
kind of, I mean, cause you

293
00:14:47,320 --> 00:14:47,720
can't use

294
00:14:47,720 --> 00:14:47,980
Nikolay: them.

295
00:14:47,980 --> 00:14:51,000
Different hardware options, for
example, it can be useful.

296
00:14:51,020 --> 00:14:53,000
Melanie: Yeah, you can use it for
that for sure.

297
00:14:53,100 --> 00:14:55,840
Yeah, but it definitely serves
a different purpose than the

298
00:14:55,840 --> 00:14:59,880
TPC benchmarks, for example, which
allow you to compare across,

299
00:15:00,400 --> 00:15:01,280
you know, databases.

300
00:15:02,120 --> 00:15:05,640
And then of course we also have
a community of people that are

301
00:15:05,640 --> 00:15:08,800
doing performance testing from
version to version of Postgres

302
00:15:09,240 --> 00:15:12,980
that's different than just testing
one patch and saying, what is

303
00:15:12,980 --> 00:15:13,740
this patch doing?

304
00:15:13,740 --> 00:15:17,220
They're like sort of looking for
broad themes and regressions

305
00:15:17,360 --> 00:15:18,300
across versions.

306
00:15:18,420 --> 00:15:23,100
And you can use pgbench for that,
but ultimately, I would say

307
00:15:23,360 --> 00:15:27,900
it's easier or more meaningful
to do something like run the TPC

308
00:15:27,960 --> 00:15:32,320
benchmarks when you're looking
to see if Postgres has a regression

309
00:15:32,320 --> 00:15:36,720
from one version to another for the
purpose of community work.

310
00:15:37,120 --> 00:15:39,320
Michael: You mentioned we've got
a community of people doing

311
00:15:39,320 --> 00:15:39,720
that.

312
00:15:39,720 --> 00:15:41,900
I've seen a couple, but I don't
know of many.

313
00:15:42,260 --> 00:15:44,880
Do you have any names or things
I could look into?

314
00:15:45,560 --> 00:15:50,500
Melanie: So I think the most, the
benchmarker that I follow closely

315
00:15:50,540 --> 00:15:52,620
and that many people follow is
Mark Callaghan.

316
00:15:52,860 --> 00:15:53,480
Michael: Yeah, great.

317
00:15:53,480 --> 00:15:54,860
Melanie: Yeah, so he's great.

318
00:15:54,860 --> 00:15:57,980
And he actually came to PGConf.dev
this year and I got to meet

319
00:15:57,980 --> 00:15:58,140
him.

320
00:15:58,140 --> 00:16:01,560
He's kind of like one of my heroes;
I was starstruck when I met him,

321
00:16:01,560 --> 00:16:05,040
you know, and he's like super nice,
you know, in that.

322
00:16:05,060 --> 00:16:07,480
Nikolay: I like his blog's name,
Small Datum.

323
00:16:07,720 --> 00:16:08,920
Melanie: Yeah, yeah.

324
00:16:09,620 --> 00:16:13,660
And he's great because he has,
I mean, he's an engineer who has

325
00:16:13,660 --> 00:16:17,080
a ton of experience in doing performance
work for a long time,

326
00:16:17,080 --> 00:16:19,700
you know, in MySQL mainly, the
MySQL community.

327
00:16:20,460 --> 00:16:24,860
And then with benchmarking, describing
your methodology is so

328
00:16:24,860 --> 00:16:25,360
important.

329
00:16:25,720 --> 00:16:28,520
So he does that in great detail.

330
00:16:29,040 --> 00:16:32,440
And because he's active and he's
doing these benchmarks like

331
00:16:32,440 --> 00:16:33,340
all the time,

332
00:16:33,920 --> 00:16:38,460
he's providing a lot of really
useful information for the community

333
00:16:38,620 --> 00:16:42,580
around, you know, how he does version
to version of Postgres,

334
00:16:42,660 --> 00:16:45,480
different types of benchmarks,
different on different sized machines

335
00:16:45,480 --> 00:16:46,660
and that kind of thing.

336
00:16:46,920 --> 00:16:47,720
So he's great.

337
00:16:47,720 --> 00:16:50,800
And if you want to learn more about
benchmarking, I would say

338
00:16:50,800 --> 00:16:52,700
his blog posts are fairly advanced.

339
00:16:52,800 --> 00:16:55,440
So like, if you're getting started,
it might not be the best

340
00:16:55,440 --> 00:16:55,940
resource.

341
00:16:56,200 --> 00:16:56,820
I don't know.

342
00:16:56,820 --> 00:16:59,860
But I still am kind of like, okay,
let me read through this.

343
00:16:59,860 --> 00:17:02,480
And you know, he's getting down
to looking at how does your CPU

344
00:17:02,480 --> 00:17:05,380
frequency governor affect your
benchmark results, right?

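For reference, the CPU frequency governor check alluded to here looks roughly like this on Linux (a sketch; cpupower ships in the linux-tools packages and needs root to change the setting):

```shell
# See which governor each core is using, then pin to "performance"
# so frequency scaling doesn't add noise to benchmark results.
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c
sudo cpupower frequency-set -g performance
```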
345
00:17:05,380 --> 00:17:08,920
So it's definitely at the point
of, it's definitely past configuring

346
00:17:09,060 --> 00:17:10,580
shared buffers for sure.

347
00:17:10,740 --> 00:17:11,640
So he's great.

348
00:17:11,640 --> 00:17:16,520
And then I would say within the
Postgres community, I mean, there's

349
00:17:16,560 --> 00:17:22,700
developers like Tomas Vondra and
of course Andres Freund who do benchmarks

350
00:17:22,700 --> 00:17:25,800
when they're developing and then
usually they'll look at the

351
00:17:25,800 --> 00:17:27,680
holistic, you know, release.

352
00:17:28,260 --> 00:17:31,720
But most of those results are just
getting posted on hackers

353
00:17:31,720 --> 00:17:34,480
and they're not sort of like, because
it takes so much time to

354
00:17:34,480 --> 00:17:37,600
do what Mark does and sort of describe
your methodology, like

355
00:17:37,600 --> 00:17:41,380
double check every result, investigate
every discrepancy, and

356
00:17:41,380 --> 00:17:44,060
then publish it and sort of be
accountable for it because people

357
00:17:44,060 --> 00:17:48,340
will come after you, they'll be
like, no, that is not a regression

358
00:17:48,400 --> 00:17:49,000
or whatever.

359
00:17:49,000 --> 00:17:52,400
You have to be ready to, to like
defend your work.

360
00:17:52,540 --> 00:17:55,580
And it's, it's a full, it's almost
like a full-time job in itself.

361
00:17:55,580 --> 00:17:58,100
So a lot of the benchmarks that
get done within the community

362
00:17:58,100 --> 00:17:58,920
are people

363
00:17:59,340 --> 00:18:02,120
doing them and then just saying,
hey, I found this.

364
00:18:02,120 --> 00:18:03,740
Can anyone else look into it?

365
00:18:04,540 --> 00:18:09,640
So there's also Alexander Lakhin, who
runs TPC-DS and I think TPC-H

366
00:18:09,960 --> 00:18:13,680
also every release and does a lot
of investigation.

367
00:18:13,820 --> 00:18:17,440
And he's great because he does
so much investigation of bugs

368
00:18:17,440 --> 00:18:21,300
and like reporting of bugs and
reproducing very hard to reproduce

369
00:18:21,420 --> 00:18:21,920
bugs.

370
00:18:22,060 --> 00:18:26,280
So if he finds some sort of regression,
performance regression,

371
00:18:26,280 --> 00:18:29,580
he'll also bisect it down to the
commit that actually caused

372
00:18:29,580 --> 00:18:29,720
it.

373
00:18:29,720 --> 00:18:33,640
And as a developer, it's nice
to have someone doing that

374
00:18:33,640 --> 00:18:34,820
level of analysis.

375
00:18:35,540 --> 00:18:38,540
And so again, like that's not,
he's not necessarily publishing

376
00:18:38,680 --> 00:18:41,100
a methodology post and all of
that.

377
00:18:41,180 --> 00:18:44,440
Then there's some people
that maintain the kits for,

378
00:18:44,440 --> 00:18:48,380
because if you want to implement
the TPC benchmarks, it's not like

379
00:18:48,380 --> 00:18:50,020
you just run it.

380
00:18:50,020 --> 00:18:52,680
Like you have, it's more
involved than that.

381
00:18:52,680 --> 00:18:57,180
There are people that maintain
different kits to like the

382
00:18:57,180 --> 00:18:58,380
bash scripts and stuff.

383
00:18:58,440 --> 00:19:02,220
And I think Mark Wong just did
a new one for TPC-DS.

384
00:19:02,500 --> 00:19:04,840
So he has a lot of database internals
experience.

385
00:19:04,840 --> 00:19:08,760
And then he was involved with the
TPC council for a while.

386
00:19:08,760 --> 00:19:13,440
And so he kind of is an expert
on just the TPC benchmarks.

387
00:19:13,480 --> 00:19:16,860
And so he tends to put together
these kits to try to help people

388
00:19:16,980 --> 00:19:18,580
because it's not that straightforward.

389
00:19:19,020 --> 00:19:22,460
And then I know there's some ongoing
work with the Postgres performance

390
00:19:22,840 --> 00:19:27,400
farm initiative, but it's really
hard, right, to agree upon what

391
00:19:27,400 --> 00:19:31,360
is a useful benchmark in general
to run.

392
00:19:31,720 --> 00:19:35,520
So you can find some regression
or whatever.

393
00:19:35,580 --> 00:19:39,720
And it's like, in order to actually
prove that that's correct

394
00:19:39,960 --> 00:19:43,480
or valid is a lot of work on the
part of everyone else.

395
00:19:43,780 --> 00:19:48,400
So I think in some ways, publishing
benchmarks, you can't just

396
00:19:48,400 --> 00:19:51,820
say I set up this farm and I ran
all these benchmarks and now

397
00:19:51,820 --> 00:19:53,660
I expect people to go look at them.

398
00:19:53,860 --> 00:19:57,620
As a benchmarker, you're kind of
accountable for the analysis

399
00:19:58,420 --> 00:20:01,340
and trying to find out if your
results are valid.

400
00:20:02,320 --> 00:20:03,380
Nikolay: Practically useful.

401
00:20:03,760 --> 00:20:04,740
Practically useful.

402
00:20:04,740 --> 00:20:08,800
I found one day, I found, we found,
my team found a bottleneck in

403
00:20:08,800 --> 00:20:10,680
WAL sender for logical replication.

404
00:20:10,680 --> 00:20:12,220
It was so easy to reproduce.

405
00:20:12,600 --> 00:20:15,960
You just create some transactions
with delete and roll back them.

406
00:20:16,240 --> 00:20:19,800
And like at some point very quickly
you reach a hundred percent

407
00:20:19,800 --> 00:20:21,300
of CPU for WAL sender.

408
00:20:21,900 --> 00:20:25,440
And it led to bad conclusions that
in production we won't be

409
00:20:25,440 --> 00:20:29,980
able to use logical for many things,
but it turned out that this

410
00:20:30,040 --> 00:20:34,040
kind of workload doesn't happen
in the wild at all, because it's

411
00:20:34,040 --> 00:20:37,000
like kind of very specific kind
of workload.

412
00:20:37,080 --> 00:20:41,920
So I guess this bottleneck maybe
it's not worth fixing right

413
00:20:41,920 --> 00:20:42,540
now, right?

414
00:20:42,540 --> 00:20:45,040
I mean, it's like, so it's interesting.

415
00:20:45,480 --> 00:20:49,240
Melanie: Yeah, you can definitely
report things like that on

416
00:20:49,300 --> 00:20:53,360
the hackers mailing list or on
performance mailing lists and

417
00:20:53,440 --> 00:20:58,100
have a discussion about, you know,
if it's kind of the right

418
00:20:58,100 --> 00:21:01,780
thing to focus on and also, you
know, if it's reproducible.

419
00:21:02,240 --> 00:21:04,860
Nikolay: I did some discussions
with some hackers about this

420
00:21:04,860 --> 00:21:08,040
and didn't find agreement that
it's worth fixing.

421
00:21:08,040 --> 00:21:11,100
And then I realized in production,
we don't see such bottleneck

422
00:21:11,140 --> 00:21:11,580
at all.

423
00:21:11,580 --> 00:21:16,540
And I just like postpone this research
for future maybe.

424
00:21:16,720 --> 00:21:22,120
What do you think about observability
tools to be used in feature

425
00:21:22,120 --> 00:21:22,980
benchmarks?

426
00:21:23,360 --> 00:21:28,320
Which tools to use, like extensions
or additional tools, and so on, what

427
00:21:28,320 --> 00:21:29,120
to look at?

428
00:21:29,280 --> 00:21:31,500
To avoid observer effect as well,
right?

429
00:21:32,140 --> 00:21:37,080
Melanie: Yeah, I mean, I tend to
not worry too much about the

430
00:21:37,080 --> 00:21:40,580
observer effect, depending on what
tools I'm using, because I,

431
00:21:41,180 --> 00:21:43,980
you have to do some, I mean, as
long as you don't have like

432
00:21:43,980 --> 00:21:46,960
log_min_duration_statement set to
0 or something like that.

433
00:21:46,960 --> 00:21:51,380
But if you're comparing to another
version of Postgres, and you're

434
00:21:51,380 --> 00:21:54,400
using the same observability tools,
it's sort of the price you

435
00:21:54,400 --> 00:21:55,760
have to pay, some overhead.

436
00:21:56,040 --> 00:21:59,240
But every person that does performance
work is different.

437
00:21:59,280 --> 00:22:02,260
When I started, I made the mistake
of trying to like generalize

438
00:22:02,480 --> 00:22:06,740
and make some of my tools useful
to other people and no one wants

439
00:22:06,740 --> 00:22:07,080
that.

440
00:22:07,080 --> 00:22:09,140
So I learned that the hard way.

441
00:22:09,140 --> 00:22:14,980
But personally, I find that it's
very hard to look at aggregated

442
00:22:15,140 --> 00:22:19,220
numbers at the end of a benchmark
run and make sense of it.

443
00:22:19,540 --> 00:22:23,400
And like, that might be what you
present to other people, because

444
00:22:23,400 --> 00:22:26,100
they don't want to look at every
detailed, you know, graph that

445
00:22:26,100 --> 00:22:27,840
you produced or chart that you
produced.

446
00:22:28,180 --> 00:22:32,940
But while I'm doing the work, I
can't do it at all without visual

447
00:22:33,740 --> 00:22:36,260
representations and charts and
things like that.

448
00:22:36,420 --> 00:22:40,920
So I use for that, I mean, for
the actual charting, I use like

449
00:22:40,920 --> 00:22:42,880
Matplotlib and pandas and that
kind of thing.

450
00:22:42,880 --> 00:22:46,560
But I have all sorts of scripts
that do things because I use

451
00:22:46,560 --> 00:22:48,220
a lot of data input sources.

452
00:22:48,340 --> 00:22:51,920
So, I mean, one of the things that
depends on the patch, but I,

453
00:22:52,200 --> 00:22:57,280
so I'll query pg_stat_io, or, you
know, depending on what the

454
00:22:57,280 --> 00:23:02,580
patch is, pg_stat_activity or pg_stat_all_tables,
and then gather

455
00:23:02,580 --> 00:23:05,940
certain information every 2 seconds
or every 1 second.

456
00:23:06,240 --> 00:23:10,120
And then I have scripts that take
this output and then I'm able

457
00:23:10,120 --> 00:23:14,900
to just basically make it CSVs
and then load it into pandas data

458
00:23:14,900 --> 00:23:15,400
frames.
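
The snapshot-and-diff workflow she describes can be sketched in a few lines of Python. This is an illustrative sketch, not code from her actual scripts: the function and column names are invented, and in practice the snapshots would come from querying a view like pg_stat_io every couple of seconds and writing CSVs for pandas. The one real subtlety it captures is that the Postgres statistics views expose cumulative counters, so per-interval activity is the difference between consecutive snapshots.

```python
def interval_deltas(snapshots, counter_keys):
    """Turn cumulative counters sampled at fixed intervals into
    per-interval activity by differencing consecutive snapshots."""
    return [
        {key: cur[key] - prev[key] for key in counter_keys}
        for prev, cur in zip(snapshots, snapshots[1:])
    ]

# Three snapshots of pg_stat_io-style counters, taken 2 seconds apart.
snapshots = [
    {"reads": 100, "writes": 40},
    {"reads": 180, "writes": 55},
    {"reads": 300, "writes": 90},
]
per_interval = interval_deltas(snapshots, ["reads", "writes"])
# per_interval[0] is the activity during the first interval: 80 reads, 15 writes
```

From there, each per-interval dict can become a CSV row and then a pandas DataFrame for plotting with Matplotlib.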

459
00:23:16,020 --> 00:23:19,120
So for example, I've been working
on vacuum recently, and then

460
00:23:19,120 --> 00:23:23,640
one of the things that you consider
when you're doing vacuum performance

461
00:23:23,720 --> 00:23:26,680
work is how often vacuum work is
being done.

462
00:23:26,680 --> 00:23:30,060
So, you might want to look at the
wait events and look at, you

463
00:23:30,060 --> 00:23:35,200
know, whether autovacuum workers
are waiting on the vacuum delay

464
00:23:35,280 --> 00:23:38,000
wait events a lot and like not
actually doing the vacuuming work

465
00:23:38,000 --> 00:23:41,120
and if you have the right autovacuum settings to actually be

466
00:23:41,120 --> 00:23:44,440
able to observe the thing you're
trying to observe.

467
00:23:45,220 --> 00:23:48,680
So, you know, just gathering that
information and then plotting

468
00:23:48,680 --> 00:23:50,420
it correctly and that kind of thing.
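
That kind of wait-event check can be sketched like this. The row shape and helper name are assumptions for illustration, not her actual tooling: the idea is to sample pg_stat_activity's wait_event column for the autovacuum workers at intervals, then see what fraction of samples were spent throttled on the vacuum delay versus actually on CPU.

```python
from collections import Counter

def wait_event_profile(sampled_rows):
    """Fraction of samples observed in each wait event.  Rows are dicts
    as you might collect by querying pg_stat_activity at intervals; a
    NULL wait_event means the backend was on CPU, shown as 'Running'."""
    counts = Counter(row["wait_event"] or "Running" for row in sampled_rows)
    total = sum(counts.values())
    return {event: n / total for event, n in counts.items()}

# Five samples of one autovacuum worker, taken at 1-second intervals.
samples = [
    {"wait_event": "VacuumDelay"},
    {"wait_event": "VacuumDelay"},
    {"wait_event": None},  # on CPU, actually doing vacuum work
    {"wait_event": "VacuumDelay"},
    {"wait_event": "WALWrite"},
]
profile = wait_event_profile(samples)
# profile["VacuumDelay"] == 0.6: most samples caught the worker throttled
```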

469
00:23:50,900 --> 00:23:54,600
Another thing, because Postgres
uses buffered I/O right now,

470
00:23:55,420 --> 00:24:00,460
I tend to use external tools for
I/O observability, like iostat,

471
00:24:00,520 --> 00:24:04,020
because you can't actually tell
with, you know, reads and writes

472
00:24:04,020 --> 00:24:08,560
in the Postgres statistics, that
could be a read or write to the

473
00:24:08,560 --> 00:24:09,480
kernel buffer cache.

474
00:24:09,480 --> 00:24:11,840
You could be reading from there
and that's obviously going to

475
00:24:11,840 --> 00:24:15,300
be quite different than actually
reads from disk.

476
00:24:15,620 --> 00:24:18,460
So I use iostat depending
on what I'm doing.
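
For the iostat side, here is a minimal parsing sketch. The sample text is fabricated, and real `iostat -x` output has more columns and repeats a report per interval, with column sets varying across sysstat versions, which is why this keys fields off the header line rather than hard-coding positions.

```python
def parse_iostat_extended(text):
    """Parse the device table of one `iostat -x` report into dicts,
    keyed by device name, with columns taken from the header line."""
    lines = iter(text.splitlines())
    for line in lines:
        if line.startswith("Device"):
            header = line.split()
            break
    rows = {}
    for line in lines:
        if not line.strip():
            break  # blank line ends this report
        fields = line.split()
        rows[fields[0]] = dict(zip(header[1:], map(float, fields[1:])))
    return rows

sample = """\
Device            r/s     w/s     rkB/s     wkB/s   %util
nvme0n1         120.0    35.0    4800.0    1400.0    42.0
"""
stats = parse_iostat_extended(sample)
# stats["nvme0n1"]["r/s"] == 120.0
```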

477
00:24:18,460 --> 00:24:22,420
And then also there's different
files like, well, I use Linux,

478
00:24:22,420 --> 00:24:26,700
so there's different CPU statistics
or memory usage statistics

479
00:24:26,940 --> 00:24:28,140
that you can get.

480
00:24:28,140 --> 00:24:32,860
And so a lot of what my scripts
do is just query whatever files

481
00:24:32,860 --> 00:24:35,880
somewhere in the file system that
has that information over time

482
00:24:36,140 --> 00:24:39,280
at different intervals so that
I can see, okay, what's the memory

483
00:24:39,280 --> 00:24:44,900
utilization or what kinds of effects
is this patch having on

484
00:24:44,900 --> 00:24:46,080
my resource utilization?

485
00:24:46,920 --> 00:24:48,840
So that's a source that I use.

486
00:24:48,960 --> 00:24:52,600
And then I also use some different
Postgres extensions, again,

487
00:24:52,600 --> 00:24:53,560
depending on the patch.

488
00:24:53,560 --> 00:24:57,540
So pg_buffercache, I look at the...

489
00:24:58,200 --> 00:25:02,660
Again, I just query it at intervals
and then see in terms of

490
00:25:02,660 --> 00:25:05,140
the shared buffers that I have,
how many are pinned.

491
00:25:05,280 --> 00:25:10,580
And it helps me to see how the
patch is affecting the behavior

492
00:25:10,840 --> 00:25:13,700
of shared buffers and utilization
there.

493
00:25:13,940 --> 00:25:17,360
So basically all
of the statistics views are

494
00:25:17,360 --> 00:25:18,140
fair game.

495
00:25:18,740 --> 00:25:22,460
All of the, the output from
pgbench itself.

496
00:25:22,540 --> 00:25:23,820
So I parse that.

497
00:25:23,880 --> 00:25:27,600
So there's progress output, which
is like, you know, pgbench

498
00:25:27,600 --> 00:25:29,620
does some averaging and that kind
of thing.

499
00:25:29,620 --> 00:25:33,340
So I also parse the execution reports
that come out, which is

500
00:25:33,340 --> 00:25:35,400
like, they call it a transaction,
very confusing.

501
00:25:35,420 --> 00:25:38,540
It's literally just an execution
of the script by a worker.

502
00:25:39,000 --> 00:25:43,460
So you might get an average TPS,
but depending on, you know,

503
00:25:43,460 --> 00:25:46,360
because of how it's calculated,
that might not actually show

504
00:25:46,360 --> 00:25:52,580
you the P99 latency of an individual
execution of one of the execution

505
00:25:52,580 --> 00:25:52,940
scripts.

506
00:25:52,940 --> 00:25:55,680
So I typically parse all the execution
reports.

507
00:25:55,680 --> 00:25:58,520
There's one line for each execution.

508
00:25:59,100 --> 00:26:03,260
And then do various analysis of
that and plot that.

509
00:26:03,780 --> 00:26:08,940
So that's an input and there's
always new sources and it's funny,

510
00:26:08,940 --> 00:26:12,320
I'll take some analysis challenges
and say, I haven't figured

511
00:26:12,320 --> 00:26:15,900
out like why there's this weird
spike in IO at this one particular

512
00:26:15,980 --> 00:26:16,360
time.

513
00:26:16,360 --> 00:26:19,640
And then he's like, well, here's
this other input source you

514
00:26:19,640 --> 00:26:21,140
could use to investigate this.

515
00:26:21,140 --> 00:26:23,980
And I was like, I have 55 input
sources or what?

516
00:26:23,980 --> 00:26:24,840
How is it possible?

517
00:26:24,840 --> 00:26:26,780
There's something else that I could
look at.

518
00:26:26,780 --> 00:26:29,840
But I guess like everything is
caused by something.

519
00:26:30,360 --> 00:26:35,000
And so it's possible to find out
what it is, I guess, if you

520
00:26:35,000 --> 00:26:35,940
look hard enough.

521
00:26:36,580 --> 00:26:37,080
Nikolay: Right.

522
00:26:37,220 --> 00:26:42,420
Don't you feel like it'll be so
good to have P99, P95 in pgbench?

523
00:26:44,160 --> 00:26:45,400
It would be so good.

524
00:26:46,120 --> 00:26:46,350
It would.

525
00:26:46,350 --> 00:26:47,520
It would work well, right?

526
00:26:48,060 --> 00:26:52,280
I'm also using pgbench all the
time, like 10 seconds, 30 seconds.

527
00:26:52,280 --> 00:26:55,580
It's so great to see how like caches
are filled and so on, like,

528
00:26:55,580 --> 00:26:57,160
but lack of percentiles.

529
00:26:58,700 --> 00:26:58,860
Melanie: Yeah.

530
00:26:58,860 --> 00:26:59,360
Yeah.

531
00:26:59,380 --> 00:27:04,400
So if you do get the execution
reports, you can just like, then

532
00:27:04,960 --> 00:27:07,510
read them in with some Python tooling.

533
00:27:07,510 --> 00:27:11,180
Nikolay: Execution reports, it's
like -r or

534
00:27:11,640 --> 00:27:15,560
Melanie: It's --log (-l) and then
you provide a log prefix.

535
00:27:15,720 --> 00:27:16,220
Yeah.

536
00:27:16,380 --> 00:27:20,660
So that gives you the actual time
that each execution of the

537
00:27:20,660 --> 00:27:21,160
script took.

538
00:27:21,160 --> 00:27:23,940
Nikolay: So this is so slow.

539
00:27:23,940 --> 00:27:24,440
No.

540
00:27:25,320 --> 00:27:27,940
Melanie: Well, I mean, it's not
free.

541
00:27:28,040 --> 00:27:29,080
Yeah, that's true.

542
00:27:30,060 --> 00:27:33,540
But I think if you wanted the P99
latency, I'm sure there's statistical

543
00:27:33,620 --> 00:27:37,160
methods for getting it without
recording every execution, but

544
00:27:37,160 --> 00:27:39,280
like you would need something like
that.

545
00:27:39,280 --> 00:27:39,440
Nikolay: Yeah.

546
00:27:39,440 --> 00:27:43,260
Do we put it, do you use the actual
file there or like some RAM

547
00:27:43,260 --> 00:27:44,580
disk, or memory or something?

548
00:27:44,580 --> 00:27:48,120
Melanie: I just, I use tmpfs and
then when it's over, I copy

549
00:27:48,120 --> 00:27:49,540
it somewhere to a disk.

550
00:27:49,540 --> 00:27:50,020
Yeah.

551
00:27:50,020 --> 00:27:50,780
Nikolay: Good idea.

552
00:27:51,100 --> 00:27:53,740
I will add it to my tool set.

553
00:27:54,340 --> 00:27:54,840
Melanie: Yeah.

554
00:27:54,860 --> 00:27:57,180
At first, I was pretty apprehensive
about doing it because it

555
00:27:57,180 --> 00:28:02,120
sounded like a lot of work, but
once you actually have the execution,

556
00:28:02,500 --> 00:28:07,000
you literally just use the statistics
module in Python, you

557
00:28:07,000 --> 00:28:10,980
can just give it the quantile you 
want and then it just calculates

558
00:28:11,120 --> 00:28:14,740
it's 2 lines of code or 3 or 4
or whatever.
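
That calculation can be sketched as follows, assuming the default per-transaction log format that pgbench's -l option produces (extra columns appear with options like --rate, so check the docs for your pgbench version). The third field of each line is the elapsed time of one script execution in microseconds, and statistics.quantiles does the percentile math:

```python
import statistics

def latency_percentiles(log_lines, percentiles=(95, 99)):
    """Latency percentiles from pgbench --log (-l) lines.  Default line
    format: client_id transaction_no time script_no time_epoch time_us,
    where 'time' (the third field) is the elapsed time of that script
    execution in microseconds."""
    latencies_us = [int(line.split()[2]) for line in log_lines if line.strip()]
    # quantiles(..., n=100) returns the 1st..99th percentile cut points.
    cuts = statistics.quantiles(latencies_us, n=100)
    return {p: cuts[p - 1] for p in percentiles}

# Synthetic lines in the default format (the values are made up).
lines = [f"0 {i} {1000 + 10 * i} 0 1700000000 0" for i in range(200)]
p = latency_percentiles(lines)
# p[95] and p[99] are the 95th/99th percentile latencies in microseconds
```

For long runs, pgbench's --sampling-rate option can keep these per-execution log files to a manageable size.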

559
00:28:16,800 --> 00:28:18,300
Nikolay: Yeah, and you mentioned
pg_stat_io.

560
00:28:19,820 --> 00:28:23,980
This was great when I first saw
it, like this should have existed many

561
00:28:23,980 --> 00:28:27,880
years already, and we discussed
it with Michael, and we here in 

562
00:28:27,880 --> 00:28:34,280
Postgres.FM often discuss
lack of buffers in plans, for

563
00:28:34,280 --> 00:28:35,220
example, right?

564
00:28:35,220 --> 00:28:37,000
In individual query plans.

565
00:28:37,020 --> 00:28:41,320
Buffers is like, we try to advertise 
it's so important to have

566
00:28:41,320 --> 00:28:42,980
buffers in execution plans.

567
00:28:43,080 --> 00:28:45,540
And this is a macro level, like 
high level.

568
00:28:46,120 --> 00:28:47,500
So good to have it, right?

569
00:28:47,500 --> 00:28:50,820
But unfortunately, I cannot say 
I've used it because it's only in

570
00:28:50,820 --> 00:28:51,660
Postgres 16.

571
00:28:51,660 --> 00:28:54,500
We don't have it in most production 
systems yet, still.

572
00:28:55,080 --> 00:29:00,680
But maybe next year, I will start
feeling it in production with 

573
00:29:00,680 --> 00:29:01,440
our customers.

574
00:29:01,880 --> 00:29:03,080
Thank you for doing this.

575
00:29:03,080 --> 00:29:08,200
My question is how you came up
with this idea originally?

576
00:29:08,940 --> 00:29:10,680
Melanie: Well, I can't take credit
for it.

577
00:29:10,680 --> 00:29:13,520
So Andres has done a lot of performance
work for years and years.

578
00:29:13,520 --> 00:29:16,820
So when I joined Microsoft, one of
the first projects he suggested

579
00:29:16,960 --> 00:29:24,180
was to add some view that would
improve observability of I/O.

580
00:29:24,520 --> 00:29:28,400
And at that time, we didn't have
the shared memory stats system.

581
00:29:28,780 --> 00:29:32,580
So, the patch was much bigger at
that point because of the way 

582
00:29:32,580 --> 00:29:33,980
the stats collector worked.

583
00:29:34,700 --> 00:29:38,040
So I did an early version of it,
and then I used it in my own 

584
00:29:38,040 --> 00:29:41,840
development, and I ended up working
on some of the AIO work and

585
00:29:41,840 --> 00:29:44,780
found it really useful for that.

586
00:29:45,060 --> 00:29:49,640
And then I ended up doing some
review of the shared memory stats 

587
00:29:49,640 --> 00:29:50,140
patch.

588
00:29:51,280 --> 00:29:55,760
And after that, it made the pg_stat_io
patch much smaller and

589
00:29:55,760 --> 00:29:56,260
simpler.

590
00:29:56,880 --> 00:29:58,360
So then I went back to it.

591
00:29:58,360 --> 00:30:02,440
But honestly, like, for me, I
just looked at it as this helps 

592
00:30:02,440 --> 00:30:05,240
me as a developer doing benchmarking.

593
00:30:06,240 --> 00:30:10,820
So it's cool that users like it
because I don't have that perspective

594
00:30:10,900 --> 00:30:11,400
really.

595
00:30:11,880 --> 00:30:15,520
Michael: Obviously it's useful
for benchmarking itself, but I'm 

596
00:30:15,520 --> 00:30:18,580
wondering about the other way around,
like, did you have to look 

597
00:30:18,580 --> 00:30:21,420
at its overhead and how did you 
go about that?

598
00:30:21,820 --> 00:30:22,200
Melanie: Yeah.

599
00:30:22,200 --> 00:30:25,380
So it was much easier, like I said,
once we had shared memory

600
00:30:25,380 --> 00:30:29,880
stats, because the way that it 
didn't need any extra infrastructure

601
00:30:30,660 --> 00:30:34,640
is basically just like another
data structure, couple other data

602
00:30:34,640 --> 00:30:35,140
structures.

603
00:30:35,740 --> 00:30:38,560
And then everything works the way
that it does now.

604
00:30:38,560 --> 00:30:41,880
We already had some counting for
different kinds of reads and

605
00:30:41,880 --> 00:30:45,900
writes, because that's how for
explain, analyze and that kind

606
00:30:45,900 --> 00:30:47,520
of thing, we had those counters.

607
00:30:47,540 --> 00:30:53,700
And then there were some reads and
writes there in pg_statio_all_tables,

608
00:30:53,960 --> 00:30:57,880
one of the views that
has some I/O things.

609
00:30:58,100 --> 00:31:00,840
And then there was some for pg_stat_statements.

610
00:31:00,860 --> 00:31:03,940
So there was some places where
we already had some IO counting.

611
00:31:04,060 --> 00:31:08,300
And then the other places where
we didn't, there wasn't too much

612
00:31:08,300 --> 00:31:13,780
risk because like if you're adding
some timing for, you know,

613
00:31:13,780 --> 00:31:18,080
whatever, background writer or
checkpointer, there are processes

614
00:31:18,260 --> 00:31:21,140
that it's okay to add a tiny bit
of overhead.

615
00:31:21,180 --> 00:31:23,100
So you didn't have to think too
much about it.

616
00:31:23,100 --> 00:31:27,180
I think the hard thing was actually
finding all of the places

617
00:31:27,240 --> 00:31:32,240
that we could possibly do IO for
the types of IO that we were

618
00:31:32,240 --> 00:31:36,000
representing and then thinking
about how to categorize it.

619
00:31:36,000 --> 00:31:39,360
And especially for the different
buffer access strategies.

620
00:31:39,740 --> 00:31:44,860
So like that's one of the things
that was completely not surfaced

621
00:31:44,860 --> 00:31:49,800
to users was how buffer access
strategies work and how they reuse

622
00:31:49,820 --> 00:31:50,320
buffers.

623
00:31:50,380 --> 00:31:54,520
And so for copy and vacuum and
things like that, literally you

624
00:31:54,520 --> 00:31:56,280
have no information about that.

625
00:31:56,280 --> 00:31:59,700
So I think just figuring out how
to represent all of that, that

626
00:31:59,700 --> 00:32:03,380
was probably like the least performance
work that patch did

627
00:32:03,380 --> 00:32:06,460
in terms of the impact, performance
impact of it, just because there

628
00:32:06,460 --> 00:32:11,460
was a lot of instrumentation already,
just that was not complete,

629
00:32:11,720 --> 00:32:12,820
if that makes sense.

630
00:32:13,440 --> 00:32:15,860
Michael: So you're saying like
we're already paying a lot of

631
00:32:15,860 --> 00:32:19,480
the overhead, like we're running
Postgres 15, for example, we're

632
00:32:19,480 --> 00:32:21,340
doing a lot of that counting anyway.

633
00:32:21,500 --> 00:32:25,520
So storing it, but like also displaying
it in an extra view,

634
00:32:25,520 --> 00:32:28,340
there's very little added in actual
instrumentation.

635
00:32:28,520 --> 00:32:29,280
Okay, cool.

636
00:32:29,340 --> 00:32:32,720
Melanie: Yeah, I mean, like we
had for backends that were doing

637
00:32:32,980 --> 00:32:38,800
queries, like counting of reads
and writes and hits in most places.

638
00:32:38,800 --> 00:32:44,380
So the places that it was sort
of net new were mostly for types

639
00:32:44,380 --> 00:32:48,760
of processes that we don't mind
a little bit of overhead if there

640
00:32:48,760 --> 00:32:49,040
was.

641
00:32:49,040 --> 00:32:50,440
And it would be so small.

642
00:32:50,440 --> 00:32:52,260
I mean, it's measuring IO, right?

643
00:32:52,420 --> 00:32:56,100
So like the, if you're doing IO,
the overhead of that will cover

644
00:32:56,100 --> 00:33:00,040
any sort of overhead of instrumentation
for the most part.

645
00:33:01,300 --> 00:33:04,100
Nikolay: Unless it's hits, actually,
it also counts hits.

646
00:33:04,120 --> 00:33:08,400
And I found in some cases, for
example, monitoring systems try

647
00:33:08,400 --> 00:33:12,240
to avoid storing these metrics
at all, for example, from pg_stat_statements.

649
00:33:13,140 --> 00:33:17,880
But I had some incidents where
we lacked hits numbers.

650
00:33:18,540 --> 00:33:20,220
But pg_stat_io has hits, right?

651
00:33:20,220 --> 00:33:22,400
I'm looking at the documentation
just to confirm.

652
00:33:22,960 --> 00:33:23,933
Melanie: Yeah, it does have hits.

653
00:33:23,933 --> 00:33:24,120
And this

654
00:33:24,120 --> 00:33:27,040
Nikolay: might be kind of overhead
of counting.

655
00:33:27,040 --> 00:33:31,320
This might be interesting because
it doesn't have real IO, I

656
00:33:31,320 --> 00:33:33,060
mean, in terms of disk.

657
00:33:33,860 --> 00:33:37,320
Melanie: Yeah, it actually was
already being counted.

658
00:33:37,340 --> 00:33:41,080
We had hits somewhere else, one of
the other views, or maybe it

659
00:33:41,080 --> 00:33:43,220
was for pg_stat_statements, I'm
not sure.

660
00:33:43,280 --> 00:33:45,920
So that was already kind of being
counted.

661
00:33:45,920 --> 00:33:50,080
And when you go to get something
from shared buffers, there's

662
00:33:50,080 --> 00:33:51,900
locking and all sorts

663
00:33:51,900 --> 00:33:52,400
of

664
00:33:52,540 --> 00:33:56,240
other things that happen
that are expensive enough that

665
00:33:56,240 --> 00:33:57,800
it was pretty negligible.

666
00:33:59,200 --> 00:33:59,700
Nice.

667
00:34:00,060 --> 00:34:00,820
Nikolay: Yeah, okay.

668
00:34:00,920 --> 00:34:04,900
But there are some benchmarks proving
that it actually didn't add.

669
00:34:05,500 --> 00:34:07,040
Or like just...

670
00:34:09,060 --> 00:34:10,780
Melanie: Wait, you want to see
my receipts?

671
00:34:11,780 --> 00:34:12,480
Nikolay: Yeah, yeah.

672
00:34:12,640 --> 00:34:13,520
I'm just curious.

673
00:34:13,520 --> 00:34:17,160
It's just a general feeling that
it doesn't bring overhead or

674
00:34:17,160 --> 00:34:19,900
there's a solid proof of it in
terms of benchmarks.

675
00:34:19,900 --> 00:34:20,860
Melanie: That's a good point.

676
00:34:20,860 --> 00:34:23,660
It's mostly based on a feeling.

677
00:34:23,940 --> 00:34:25,580
So, like...

678
00:34:28,260 --> 00:34:31,020
Nikolay: I'm curious, is it worth
checking this, for example?

679
00:34:31,580 --> 00:34:36,560
Because maybe we can strip it out
somehow or compare 15 versus

680
00:34:36,560 --> 00:34:40,220
16 in various cases and see if...

681
00:34:40,460 --> 00:34:43,420
Melanie: I would look at those
benchmarks, but I guess my gut

682
00:34:43,420 --> 00:34:48,280
feeling would be that it, because
we already were doing, when

683
00:34:48,280 --> 00:34:52,940
track_io_timing is on, we already
were doing a lot of the measurements

684
00:34:53,440 --> 00:34:56,100
that it wouldn't be much different.

685
00:34:56,120 --> 00:34:58,220
Nikolay: But actually, I can answer
myself.

686
00:34:58,220 --> 00:35:00,020
We have benchmarks with insane
numbers.

687
00:35:00,020 --> 00:35:00,200
Oh,

688
00:35:00,200 --> 00:35:00,920
Melanie: you do?

689
00:35:01,020 --> 00:35:01,420
Okay.

690
00:35:01,420 --> 00:35:04,760
Nikolay: No, no, I mean, 15, we
compare all versions and we try

691
00:35:04,760 --> 00:35:07,400
to reach as many TPS as possible.

692
00:35:07,420 --> 00:35:09,360
Achieved how many, Michael?

693
00:35:09,360 --> 00:35:10,340
4 million?

694
00:35:10,360 --> 00:35:10,760
Michael: Nearly.

695
00:35:10,760 --> 00:35:11,260
3.75,

696
00:35:11,760 --> 00:35:12,040
Nikolay: I think.

697
00:35:12,040 --> 00:35:14,060
Yeah, just select only pgbench,
right?

698
00:35:14,060 --> 00:35:18,120
And we didn't see any degradation
for version 16 at all.

699
00:35:18,120 --> 00:35:21,400
It's already some kind of proof
that there is no overhead here.

700
00:35:21,400 --> 00:35:25,060
And we have wait events and flame
graphs and so on.

701
00:35:25,120 --> 00:35:26,380
We would notice it.

702
00:35:26,380 --> 00:35:29,160
Michael: But Nikolay, you wouldn't
have had track_io_timing on.

703
00:35:29,540 --> 00:35:30,780
Nikolay: I want to check it.

704
00:35:30,900 --> 00:35:31,880
And maybe repeat it.

705
00:35:31,880 --> 00:35:32,540
No way.

706
00:35:33,080 --> 00:35:34,120
Yeah, I will check.

707
00:35:34,120 --> 00:35:35,580
Michael: If you did, you should
repeat again.

708
00:35:35,580 --> 00:35:36,920
Maybe you'll get 4 million.

709
00:35:36,980 --> 00:35:38,580
Melanie: Okay, I'm checking right
now.

710
00:35:38,860 --> 00:35:39,280
Nikolay: Good point.

711
00:35:39,280 --> 00:35:40,520
Michael: But yeah, this is great.

712
00:35:40,520 --> 00:35:43,380
Like, it's super interesting what,
like how do you choose which

713
00:35:43,380 --> 00:35:46,080
things to benchmark and which things
not to then?

714
00:35:46,080 --> 00:35:46,580
Melanie: Yeah.

715
00:35:47,100 --> 00:35:51,720
So I think developing intuition
around that is, so I actually

716
00:35:51,960 --> 00:35:55,580
became a software engineer, like
my, the first thing that I worked

717
00:35:55,580 --> 00:35:58,640
on professionally was Greenplum,
which is a fork of Postgres.

718
00:35:58,700 --> 00:36:01,680
So almost all of my software engineering
experience comes from

719
00:36:01,680 --> 00:36:03,060
being a Postgres engineer.

720
00:36:03,060 --> 00:36:05,940
I was pretty new to software engineering,
right?

721
00:36:05,940 --> 00:36:09,060
So I think a lot of software engineers
develop intuition around,

722
00:36:09,060 --> 00:36:14,940
like, you know, if you take a lock
every millisecond versus every

723
00:36:14,940 --> 00:36:18,400
30 seconds, like, which one of those
is okay, or whatever, right?

724
00:36:18,400 --> 00:36:20,440
Like, it obviously depends on the
context.

725
00:36:20,500 --> 00:36:25,020
But I think there's a lot of sort
of performance intuition that

726
00:36:25,020 --> 00:36:27,480
people develop just from being
software engineers.

727
00:36:27,840 --> 00:36:31,560
And that's sort of like the first
step of, you know, thinking

728
00:36:31,560 --> 00:36:36,100
about, okay, well, is it okay for
me to take an exclusive lock

729
00:36:36,100 --> 00:36:36,340
here?

730
00:36:36,340 --> 00:36:39,900
Do I need to find a lock-free design
for this or whatever it

731
00:36:39,900 --> 00:36:40,400
is?

732
00:36:40,440 --> 00:36:43,940
And that's when you have something
where you had a performance

733
00:36:43,940 --> 00:36:48,320
question and you said, is it okay
that I do X, then you have

734
00:36:48,320 --> 00:36:51,280
to think about, okay, how would
I benchmark this?

735
00:36:51,280 --> 00:36:54,120
And in a lot of cases, maybe not
benchmark, how would I profile

736
00:36:54,160 --> 00:36:54,480
this?

737
00:36:54,480 --> 00:36:58,620
How would I evaluate or microbenchmark
this or something like

738
00:36:58,620 --> 00:36:59,120
that?

739
00:36:59,280 --> 00:37:05,700
So, for example, in 17, I worked
on some refactoring of heap pruning,

740
00:37:06,540 --> 00:37:10,520
which is kind of scary for different
reasons because of like

741
00:37:10,520 --> 00:37:11,760
data corruption stuff.

742
00:37:11,760 --> 00:37:16,360
But from the perspective of performance,
because we

743
00:37:16,360 --> 00:37:20,520
do on-access heap pruning when you're
doing a SELECT query or

744
00:37:20,900 --> 00:37:24,580
reading in the buffer to do an
update, changing that code, adding

745
00:37:24,580 --> 00:37:26,100
any sort of

746
00:37:26,400 --> 00:37:30,140
additional instructions, you know,
potentially has overhead.

747
00:37:30,480 --> 00:37:33,740
So of course, like if you're doing
actual heap pruning, then you

748
00:37:33,740 --> 00:37:37,860
have IO from the WAL and you have
IO from writing out the dirty

749
00:37:38,320 --> 00:37:40,020
buffer and that kind of thing.

750
00:37:40,080 --> 00:37:43,840
And so you have to think about
is the thing that I'm changing

751
00:37:44,120 --> 00:37:47,820
actually going to matter in the
context that it's in?

752
00:37:48,340 --> 00:37:52,540
And then I think evaluate from
there.

753
00:37:52,540 --> 00:37:56,680
So and also how important would
it be if there was a difference,

754
00:37:56,680 --> 00:37:57,040
right?

755
00:37:57,040 --> 00:38:01,160
So like, on every SELECT query,
if it's a little bit more expensive,

756
00:38:01,160 --> 00:38:04,340
that's obviously bad, even if it's
only a tiny bit more expensive.

757
00:38:05,020 --> 00:38:09,720
So Heikki and I did a lot of
microbenchmarking around those

758
00:38:09,720 --> 00:38:12,180
changes to make sure that they
seemed right.

759
00:38:12,180 --> 00:38:15,040
And then you have to design the
benchmark so that it's actually

760
00:38:15,060 --> 00:38:19,200
exercising what you think might
be the bottleneck, which is its

761
00:38:19,200 --> 00:38:20,040
own challenge.

762
00:38:20,220 --> 00:38:24,200
And then like, right now I'm working
on something where it has

763
00:38:24,200 --> 00:38:28,660
to do with improving eager freezing
so that you freeze more data,

764
00:38:28,660 --> 00:38:32,540
VACUUM freezes more data sooner,
and there's a bunch of different

765
00:38:32,540 --> 00:38:33,180
pieces to it.

766
00:38:33,180 --> 00:38:37,740
But one of them is a new sort of
responsibility that Background

767
00:38:37,740 --> 00:38:41,720
Writer would have for maintaining
a new data structure that contains

768
00:38:41,720 --> 00:38:42,620
some new statistics.

769
00:38:43,180 --> 00:38:46,940
And one of the questions I got in
review was like, have you profiled

770
00:38:47,040 --> 00:38:50,140
this to make sure that it doesn't
have an overhead?

771
00:38:50,740 --> 00:38:54,340
And my first reaction was like,
but it's Background Writer.

772
00:38:54,340 --> 00:38:57,080
No one cares about Background Writer's
performance, you know, like

773
00:38:57,080 --> 00:39:00,860
kind of like, it's different than
on every SELECT query, you're

774
00:39:00,860 --> 00:39:02,780
going to have some overhead.

775
00:39:03,000 --> 00:39:09,140
But again, it was doing something
that was like more, I'd say,

776
00:39:09,140 --> 00:39:09,640
heavyweight.

777
00:39:10,260 --> 00:39:15,060
Like it's doing some calculations
that could take time and doing

778
00:39:15,060 --> 00:39:16,640
them while holding an exclusive
lock.

779
00:39:16,640 --> 00:39:20,640
And so even though it's Background
Writer, like, it's worth proving

780
00:39:20,860 --> 00:39:25,220
that, you know, the number of extra
instructions, for example,

781
00:39:25,520 --> 00:39:26,500
is tolerable.

782
00:39:27,180 --> 00:39:28,580
And that's a judgment call.

783
00:39:28,580 --> 00:39:32,140
But what you want is a paper trail,
too, that you, like, did

784
00:39:32,140 --> 00:39:32,920
your due diligence.

785
00:39:32,920 --> 00:39:35,880
Which is why it was funny when
you were asking me about pg_stat_io's

786
00:39:36,300 --> 00:39:39,780
performance impact, because in
that case, I kind of thought,

787
00:39:40,320 --> 00:39:44,920
it's negligible compared to the
fact that you're doing IO first

788
00:39:44,920 --> 00:39:47,380
of all, and second of all, we had
a lot of that instrumentation

789
00:39:47,600 --> 00:39:48,100
there.

790
00:39:48,380 --> 00:39:53,500
But for most other things, like,
I haven't had a patch other

791
00:39:53,500 --> 00:39:59,840
than that probably in the last
couple years where there wasn't

792
00:40:00,240 --> 00:40:03,860
a discussion of what the performance
impact could be and then

793
00:40:03,860 --> 00:40:05,720
benchmarking or profiling.

794
00:40:06,660 --> 00:40:07,160
Nikolay: Nice.

795
00:40:07,660 --> 00:40:09,260
Yeah, Michael, it was on.

796
00:40:10,240 --> 00:40:12,140
track_io_timing was on.

797
00:40:13,140 --> 00:40:14,760
Michael: Well, that's good news
for Melanie.

798
00:40:15,600 --> 00:40:17,300
There's potential for 4 million.

799
00:40:17,460 --> 00:40:20,700
Nikolay: Yeah, and we didn't see
a difference between 15 and

800
00:40:20,820 --> 00:40:22,200
16, so I mean...

801
00:40:22,540 --> 00:40:23,540
Wow, wow, wow.

802
00:40:23,680 --> 00:40:26,580
Yeah, I already asked our robot
to double-check with track_io_timing

803
00:40:26,580 --> 00:40:29,480
off, so we'll see this
as well.

804
00:40:29,480 --> 00:40:30,240
Yeah, good.

805
00:40:30,240 --> 00:40:30,740
Interesting.

806
00:40:30,860 --> 00:40:32,860
I hope we'll have more TPS.

807
00:40:33,820 --> 00:40:39,440
I’m curious what you are working
on right now and the direction of

808
00:40:39,440 --> 00:40:42,140
future work, ideas, and so on.

809
00:40:42,840 --> 00:40:46,640
Melanie: Yeah, so I mean, I started
a project last release that

810
00:40:46,640 --> 00:40:51,600
had to do with reducing the amount
of WAL emitted by vacuum.

811
00:40:52,100 --> 00:40:56,680
And that was like kind of a refactoring,
but it spun out a lot

812
00:40:56,680 --> 00:40:57,900
of other projects.

813
00:40:58,260 --> 00:41:02,440
And I think I sort of wanted to
take a break from vacuum this

814
00:41:02,440 --> 00:41:04,680
release, but it didn't work out
that way.

815
00:41:04,680 --> 00:41:09,060
So I’m working on a couple of different
things around vacuum.

816
00:41:09,060 --> 00:41:13,420
And some of them are some leftover
pieces around combining WAL

817
00:41:13,420 --> 00:41:18,480
records that vacuum used to emit,
like up to 6 WAL records per

818
00:41:18,480 --> 00:41:18,980
block.

819
00:41:19,280 --> 00:41:23,220
So we've combined a few of them
and there's more to do.

820
00:41:23,220 --> 00:41:26,700
So that's one of the things, but
sort of the more probably exciting

821
00:41:27,380 --> 00:41:29,880
work is around the freezing work.

822
00:41:30,040 --> 00:41:34,760
So that basically was born out
of work Peter Geoghegan did a few

823
00:41:34,760 --> 00:41:39,780
years ago, I think in '16,
to have us do more eager freezing.

824
00:41:39,940 --> 00:41:43,780
So one of the things we hear a lot
from users is they have, say,

825
00:41:43,780 --> 00:41:46,700
insert-only tables and none of
the data gets frozen.

826
00:41:47,080 --> 00:41:50,680
And then all of a sudden they have
anti-wraparound vacuums and

827
00:41:50,680 --> 00:41:53,180
then eventually their system becomes
read-only.

828
00:41:53,320 --> 00:41:57,160
But even before that, anti-wraparound
vacuums, if you don't have

829
00:41:57,160 --> 00:42:00,860
any actual like pruning or vacuuming
to do, it can be expensive

830
00:42:00,920 --> 00:42:02,060
from an IO perspective.

831
00:42:02,720 --> 00:42:05,140
And so it can slow your whole system
down.

832
00:42:05,140 --> 00:42:10,360
And so being more aggressive about
freezing data sooner was something

833
00:42:10,520 --> 00:42:12,580
that people had been asking about.

834
00:42:13,080 --> 00:42:16,540
And so Peter worked on that and
it's really difficult to find

835
00:42:16,620 --> 00:42:21,640
a heuristic that works for determining
whether or not to freeze

836
00:42:21,660 --> 00:42:26,020
something because if you have a
table where you're doing a lot

837
00:42:26,020 --> 00:42:30,660
of updates, like a standard pgbench 
built-in workload, then

838
00:42:30,740 --> 00:42:35,020
it's a waste to freeze things because
freezing emits WAL and

839
00:42:35,020 --> 00:42:37,100
it dirties the page and that kind
of thing.

840
00:42:37,260 --> 00:42:40,660
So finding a heuristic that freezes
the stuff you want to freeze

841
00:42:40,660 --> 00:42:43,940
and doesn't freeze the stuff you
don't want to freeze is then

842
00:42:43,940 --> 00:42:44,940
a journey.

843
00:42:46,020 --> 00:42:50,580
And one of the parts that's really
hard is that you have to try

844
00:42:50,580 --> 00:42:54,720
to come up with all of the workloads
that are important, the

845
00:42:54,720 --> 00:42:57,840
best case and worst case, especially
the worst case workloads

846
00:42:58,420 --> 00:43:02,280
that users might have and think
about how it would perform.

847
00:43:03,040 --> 00:43:06,100
And that's so hard.

848
00:43:06,740 --> 00:43:08,080
So that took a lot of time.

849
00:43:08,080 --> 00:43:12,340
And I spent a lot of time improving
my different benchmarking

850
00:43:12,600 --> 00:43:15,560
setups and coming up with these
different workloads and evaluating

851
00:43:15,660 --> 00:43:17,000
them, looking at them.

852
00:43:17,440 --> 00:43:19,540
So that's one thing I've been working
on.

853
00:43:19,540 --> 00:43:24,960
And then last release, I also,
so there's the new read stream

854
00:43:24,960 --> 00:43:30,780
API that does vectored I/O, does
larger reads that Thomas Munro

855
00:43:31,340 --> 00:43:32,320
did last release.

856
00:43:32,320 --> 00:43:34,740
And I worked on the different users.

857
00:43:34,920 --> 00:43:39,040
So for sequential scans and this,
I had some additional users

858
00:43:39,040 --> 00:43:43,000
that didn't go into 17 because
there are performance effects that

859
00:43:43,000 --> 00:43:47,020
you have to sort of analyze really
closely and they weren't ready

860
00:43:47,320 --> 00:43:47,780
And one of

861
00:43:47,780 --> 00:43:47,880
Nikolay: them was...

862
00:43:47,880 --> 00:43:47,980
Did you

863
00:43:47,980 --> 00:43:49,960
Michael: use the word analyze deliberately
there?

864
00:43:49,960 --> 00:43:50,880
Was it analyze?

865
00:43:51,160 --> 00:43:51,660
Melanie: Yeah.

866
00:43:51,900 --> 00:43:53,320
No, no, analyze went in actually.

867
00:43:53,320 --> 00:43:53,940
Michael: Oh, cool.

868
00:43:53,940 --> 00:43:54,140
Nice.

869
00:43:54,140 --> 00:43:55,040
Melanie: That was Bilal's work.

870
00:43:55,040 --> 00:43:55,900
Yeah.

871
00:43:56,000 --> 00:43:59,940
So analyze went in, but a bitmap
heap scan and vacuum didn't.

872
00:44:00,100 --> 00:44:00,800
Michael: Got it.

873
00:44:00,960 --> 00:44:05,140
Melanie: So vacuum is an interesting
one because streaming vacuum

874
00:44:05,140 --> 00:44:12,280
sounds great, but vacuum uses a
ring buffer to keep it from having

875
00:44:12,280 --> 00:44:14,720
too much of a negative effect on
shared buffers.

876
00:44:15,060 --> 00:44:18,040
So you don't wash out all of the
things that are resident in

877
00:44:18,040 --> 00:44:19,500
shared buffers from your workload.

878
00:44:19,840 --> 00:44:25,940
It reuses buffers, which means
that it has to write WAL, typically,

879
00:44:26,120 --> 00:44:32,720
and there's a lot of calculation
around how big the ring is such

880
00:44:32,720 --> 00:44:36,820
that WAL writer can help you or
help vacuum to write out the

881
00:44:36,820 --> 00:44:37,060
WAL.

882
00:44:37,060 --> 00:44:38,920
So vacuum's not doing all of it.

883
00:44:39,320 --> 00:44:44,360
And if you do I/O and if you do
reads in a different size, that

884
00:44:44,360 --> 00:44:48,260
means that you're dirtying and
needing to clean buffers at a

885
00:44:48,260 --> 00:44:49,040
different rate.

886
00:44:49,040 --> 00:44:54,220
And that actually was affecting
how many individual WAL writes

887
00:44:54,220 --> 00:44:55,760
and syncs that we would do.

888
00:44:55,760 --> 00:45:01,120
So like, in a given fsync, you
can have different amounts of

889
00:45:01,120 --> 00:45:04,460
data, but each fsync's a system
call and has overhead, right?

890
00:45:04,640 --> 00:45:10,380
So you want to amortize that cost
over a large enough amount

891
00:45:10,380 --> 00:45:15,560
of data and also have WAL writer
help to do some of the work.

892
00:45:16,060 --> 00:45:21,080
So by doing bigger I/Os, we are
actually making the performance

893
00:45:21,340 --> 00:45:25,460
worse in some cases because of
these interactions with additional

894
00:45:25,900 --> 00:45:26,760
WAL syncs.

895
00:45:26,920 --> 00:45:31,280
So Thomas Munro has been working
on a, I think he's calling

896
00:45:31,280 --> 00:45:32,860
it streaming write-behind.

897
00:45:33,420 --> 00:45:38,060
It's kind of a way of thinking
about, for the purposes eventually

898
00:45:38,100 --> 00:45:44,600
of AIO, but of how backends and
maintenance processes and writes

899
00:45:44,600 --> 00:45:50,240
in general in Postgres can sort
of make sure that they are cleaning

900
00:45:50,500 --> 00:45:54,520
up and doing writes in large enough,
if they're doing larger

901
00:45:54,520 --> 00:45:58,260
writes, right, that they're actually
being conscious of when

902
00:45:58,260 --> 00:46:01,580
they should do those writes based
on the WAL that it's required

903
00:46:01,640 --> 00:46:02,500
that you emit.

904
00:46:02,600 --> 00:46:05,020
So like if you're doing reads,
you don't have to really think

905
00:46:05,020 --> 00:46:05,780
about that.

906
00:46:05,860 --> 00:46:08,860
But if you're writing out data,
the WAL associated with those

907
00:46:08,860 --> 00:46:10,920
dirty buffers has to be flushed
first.

908
00:46:10,920 --> 00:46:13,880
So you have to think about when
do I wanna do writes?

909
00:46:14,240 --> 00:46:16,280
Because you need to think about
when do you actually want to

910
00:46:16,280 --> 00:46:17,220
do the WAL writes.

911
00:46:17,220 --> 00:46:19,900
So he's been working on that and
it's a tough problem.

912
00:46:20,080 --> 00:46:23,480
A lot of benchmarking, a lot of
thinking about it and buffer

913
00:46:23,480 --> 00:46:25,880
access strategies really complicate
it.

914
00:46:26,140 --> 00:46:28,400
And those are used for large selects.

915
00:46:28,480 --> 00:46:31,780
Those are used for, you know, copy,
vacuum.

916
00:46:31,920 --> 00:46:36,940
So he's doing some work around
that and I'll probably sort of

917
00:46:36,940 --> 00:46:41,180
jump in after some of this vacuum
stuff is done and try to work

918
00:46:41,180 --> 00:46:44,440
on some of the AIO work that's
coming up.

919
00:46:44,540 --> 00:46:48,160
And I think that ultimately, like
just between those things,

920
00:46:48,160 --> 00:46:53,240
that'll probably be most of what
I end up doing on my own.

921
00:46:53,240 --> 00:46:57,340
But I would like to see more.
For pg_stat_io, there's been

922
00:46:57,340 --> 00:47:02,020
a patch to add WAL-related IO
statistics to it that's been around

923
00:47:02,020 --> 00:47:03,180
for about a release.

924
00:47:03,340 --> 00:47:08,040
And basically we
haven't come to an

925
00:47:08,040 --> 00:47:13,720
agreement on what the right thing
to do is because you can technically,

926
00:47:14,020 --> 00:47:19,120
like our thought was that we would
have the block size for, like,

927
00:47:19,660 --> 00:47:22,100
you can configure, you know, your
block size for Postgres.

928
00:47:22,100 --> 00:47:23,640
You can configure it for WAL.

929
00:47:23,760 --> 00:47:27,520
And that they would be in units
of block size, WAL block size.

930
00:47:27,520 --> 00:47:32,600
But that actually isn't always
the unit that we do reads and writes

931
00:47:32,600 --> 00:47:34,400
in exactly, like it usually is.

932
00:47:34,400 --> 00:47:39,100
But so now we're actually talking
about rejiggering pg_stat_io,

933
00:47:39,960 --> 00:47:44,940
especially in light of vectored I/O,
to change it so that it's not

934
00:47:44,940 --> 00:47:47,640
looking at, you know, if you have
a thousand writes and then

935
00:47:47,640 --> 00:47:51,180
you have the block size and then
you multiply them to get the

936
00:47:51,180 --> 00:47:54,360
number of bytes, do we change it
to represent it differently

937
00:47:54,400 --> 00:47:56,460
and put it just in the number of
bytes?

938
00:47:56,840 --> 00:47:58,520
And how do you represent that?

939
00:47:58,520 --> 00:48:00,860
If you're doing that, do you want
the number of writes to

940
00:48:00,860 --> 00:48:02,540
be the number of system calls?

941
00:48:02,620 --> 00:48:05,700
So we were just thinking about
how to actually represent it,

942
00:48:05,700 --> 00:48:11,680
but I would love to see WAL stats
go into pg_stat_io in 18 also.

943
00:48:12,100 --> 00:48:13,040
So we'll see.

944
00:48:13,460 --> 00:48:16,640
And there's also a bunch of other
exciting things going on, like

945
00:48:16,640 --> 00:48:21,020
Masahiko Sawada is working on parallelizing
the first and third

946
00:48:21,020 --> 00:48:22,160
phases of vacuum.

947
00:48:22,580 --> 00:48:31,160
So he did that really exciting
work in 17 on the TID store,

948
00:48:31,160 --> 00:48:34,820
and I think that's my favorite
feature from 17.

949
00:48:35,380 --> 00:48:36,860
Nikolay: Can you explain it to us?

950
00:48:36,900 --> 00:48:38,500
Michael: Yeah, I don't know that.

951
00:48:38,680 --> 00:48:41,900
Melanie: Yeah, so you know, like
one of people's biggest complaints

952
00:48:41,920 --> 00:48:45,780
about vacuum is when they have
to do multiple rounds of index

953
00:48:45,780 --> 00:48:49,400
vacuuming, because maintenance_work_mem,
even if you set it

954
00:48:49,400 --> 00:48:53,180
to a high value, you might end
up filling it up with dead TIDs

955
00:48:53,600 --> 00:48:57,180
and then you have to do a round
of index vacuuming.

956
00:48:57,180 --> 00:49:01,320
And if your indexes are very big,
you can imagine that that ends

957
00:49:01,320 --> 00:49:04,080
up really affecting your vacuum
performance.

958
00:49:04,740 --> 00:49:09,640
So what he did was introduce a
new data structure that organized

959
00:49:09,840 --> 00:49:13,400
the dead TIDs in a way that was
much, much, much more efficient.

960
00:49:14,440 --> 00:49:21,560
And that made it so that you're
using way less memory for the

961
00:49:21,560 --> 00:49:26,280
actual dead TID storage, but then
it also, you can end up having,

962
00:49:26,280 --> 00:49:28,260
you can end up sort of changing
it on the fly.

963
00:49:28,260 --> 00:49:31,740
You could, I mean, basically you're
not tied to the same restrictions

964
00:49:31,880 --> 00:49:36,140
that we had around the size of
maintenance_work_mem as before.

965
00:49:36,600 --> 00:49:42,880
So most people are never going to
need to do multiple passes

966
00:49:42,940 --> 00:49:44,640
of index vacuuming.

967
00:49:44,760 --> 00:49:48,960
So I think that people with very
large indexes and large tables

968
00:49:48,960 --> 00:49:51,820
are going to see their vacuum performance
be a lot better.

969
00:49:51,820 --> 00:49:56,180
Nikolay: I have big customers with
big partitioned tables.

970
00:49:56,380 --> 00:50:00,360
We've talked about partitioning for
years, and it's hard, and they will

971
00:50:00,360 --> 00:50:00,760
benefit.

972
00:50:00,760 --> 00:50:03,020
So vacuuming will be faster, basically,
right?

973
00:50:03,160 --> 00:50:03,520
Melanie: Yeah.

974
00:50:03,520 --> 00:50:06,740
I mean, if you don't have this
problem with the multiple index

975
00:50:06,740 --> 00:50:09,160
vacuuming passes, then maybe not.

976
00:50:09,160 --> 00:50:14,440
But it's something we hear a lot
from sophisticated customers.

977
00:50:14,440 --> 00:50:18,300
Nikolay: And also, I remember in
PostgreSQL 17, before that,

978
00:50:18,820 --> 00:50:22,180
only up to 1 gigabyte could
be used for it.

979
00:50:22,180 --> 00:50:24,440
Melanie: Yeah, that restriction
is lifted now.

980
00:50:24,620 --> 00:50:26,580
Nikolay: Is it related to this
work or?

981
00:50:26,580 --> 00:50:27,800
Melanie: Yes, it is related,

982
00:50:27,800 --> 00:50:27,980
Nikolay: yeah.

983
00:50:27,980 --> 00:50:29,460
I suspected so, good.

984
00:50:29,460 --> 00:50:29,960
Melanie: Yeah.

985
00:50:30,320 --> 00:50:31,320
So that's really cool.

986
00:50:31,320 --> 00:50:36,040
And I think it's something that's
hard to explain unless you've

987
00:50:36,040 --> 00:50:36,880
had the problem.

988
00:50:37,540 --> 00:50:40,400
I think if you haven't had this
problem, you're not really thinking

989
00:50:40,400 --> 00:50:41,120
about it.

990
00:50:41,120 --> 00:50:43,420
But for people that do have this
problem, I think it's going

991
00:50:43,420 --> 00:50:44,920
to make a big difference.

992
00:50:46,940 --> 00:50:50,380
He's building on that work and
there's a lot of things that we

993
00:50:50,380 --> 00:50:55,080
can do with vacuum because of it.
So this read stream API

994
00:50:55,580 --> 00:51:00,040
will make it somewhat easier to
do parallelization of the first

995
00:51:00,060 --> 00:51:01,720
and third phases of vacuuming.

996
00:51:01,780 --> 00:51:04,160
So he's working on that as well.

997
00:51:04,160 --> 00:51:08,100
And then I think, you know, there's
the dreams of having

998
00:51:08,160 --> 00:51:13,220
global indexes involve a step
where you persist the dead TIDs

999
00:51:13,860 --> 00:51:17,240
to disk, because otherwise you
wouldn't be able to.

1000
00:51:18,280 --> 00:51:21,480
Basically, there's discussion
of being able to split

1001
00:51:21,480 --> 00:51:26,260
up the phases of vacuuming and
be able to vacuum indexes at some

1002
00:51:26,260 --> 00:51:28,580
point and come back to it and do
it later.

1003
00:51:28,580 --> 00:51:30,800
Like vacuum, just do the first
stage of vacuuming.

1004
00:51:30,880 --> 00:51:34,920
And if you want to do the phases
of vacuum separately, you have

1005
00:51:34,920 --> 00:51:39,640
to have the dead tuples that you
need, basically what index

1006
00:51:39,640 --> 00:51:42,660
entries you need to delete, you
need to save that somewhere that's

1007
00:51:42,660 --> 00:51:43,540
not in memory.

1008
00:51:44,160 --> 00:51:49,120
And so there are some modifications
to the TID store that they

1009
00:51:49,120 --> 00:51:52,500
can make to probably make it easy
to persist.

1010
00:51:52,580 --> 00:51:55,380
And I don't know that he's planning
on working on that now, but

1011
00:51:55,380 --> 00:51:58,920
Dilip had done some work on dead
TID storage.

1012
00:51:58,980 --> 00:52:02,240
And so I think there's like a lot
of exciting things around vacuum

1013
00:52:02,240 --> 00:52:03,220
that'll be happening.

1014
00:52:03,740 --> 00:52:06,760
And there's also been discussion,
which I know has happened in

1015
00:52:06,760 --> 00:52:09,580
the past, but new discussion about
autovacuum scheduling.

1016
00:52:10,520 --> 00:52:13,940
And not just autovacuum scheduling,
but cost-based delay and

1017
00:52:13,940 --> 00:52:18,540
how to change that to be more adaptive
and to use statistics

1018
00:52:18,920 --> 00:52:24,620
basically that are collected while
vacuuming to decrease the

1019
00:52:24,620 --> 00:52:29,340
delay adaptively while vacuuming
if you're not able to finish

1020
00:52:29,340 --> 00:52:31,960
the vacuum appropriately and things
like that.

1021
00:52:31,960 --> 00:52:32,860
So we'll see.

1022
00:52:32,900 --> 00:52:36,800
Nikolay: So we talked to Peter
Geoghegan in the past as well.

1023
00:52:37,060 --> 00:52:38,640
And I remember the deduplication.

1024
00:52:39,060 --> 00:52:44,520
So you both don't sound like people
who dream of getting rid of vacuum

1025
00:52:45,040 --> 00:52:48,660
completely, but you sound like
there is a big potential for improvements

1026
00:52:48,760 --> 00:52:53,740
and it's happening and so it's
going to stay, but with improvements.

1027
00:52:54,060 --> 00:52:56,540
This is like I'm trying to simplify
for our audience.

1028
00:52:56,840 --> 00:52:58,060
Melanie: The undo people?

1029
00:52:58,520 --> 00:52:59,340
I mean, they still

1030
00:52:59,340 --> 00:53:00,680
Nikolay: would have needed vacuum.

1031
00:53:02,320 --> 00:53:03,700
The undo people, right?

1032
00:53:04,200 --> 00:53:08,440
Melanie: I think that there has
been a lot more work on vacuum

1033
00:53:08,480 --> 00:53:11,120
this release than there has been

1034
00:53:11,120 --> 00:53:12,720
Nikolay: 17 or 18?

1035
00:53:12,720 --> 00:53:13,780
Melanie: 17 and 18.

1036
00:53:13,920 --> 00:53:14,780
Nikolay: 17, 18.

1037
00:53:15,420 --> 00:53:17,160
Yeah, yeah, that's great.

1038
00:53:17,220 --> 00:53:18,000
That's great.

1039
00:53:18,540 --> 00:53:20,810
So it's, yeah, many people need
it.

1040
00:53:20,810 --> 00:53:22,780
I mean, Postgres instances.

1041
00:53:23,800 --> 00:53:24,520
So yeah.

1042
00:53:25,240 --> 00:53:26,420
Michael: I like this kind of work.

1043
00:53:26,420 --> 00:53:29,280
I think Peter Geoghegan's done a lot
of work in the last few releases,

1044
00:53:29,280 --> 00:53:34,100
and you and others that focus on
things that will help people.

1045
00:53:34,200 --> 00:53:37,400
Well, almost everybody will benefit
without having to change

1046
00:53:37,400 --> 00:53:37,900
anything.

1047
00:53:38,500 --> 00:53:40,920
And that's, those are so powerful.

1048
00:53:40,920 --> 00:53:45,360
They're so easily forgotten or
easily ignored, but everybody

1049
00:53:45,360 --> 00:53:50,320
benefits, it gets better for almost
everybody without A, noticing

1050
00:53:50,460 --> 00:53:52,160
and B, having to do anything.

1051
00:53:52,820 --> 00:53:54,960
Just so powerful, those kind of
changes.

1052
00:53:54,960 --> 00:53:58,700
So thank you for working on those
and encouraging others to as

1053
00:53:58,700 --> 00:53:59,140
well.

1054
00:53:59,140 --> 00:54:00,260
Appreciate it a lot.

1055
00:54:00,300 --> 00:54:03,120
Melanie: Yeah, the dream is that
we get rid of all of those auto

1056
00:54:03,120 --> 00:54:06,100
vacuum GUCs, because that's terrible.

1057
00:54:06,580 --> 00:54:11,360
Like, there's, I don't know, the
fact that there are so many

1058
00:54:11,360 --> 00:54:12,240
blog posts.

1059
00:54:12,440 --> 00:54:15,420
I mean, yeah, like they should
all be gone.

1060
00:54:16,260 --> 00:54:20,580
Your autovacuum configuration
should be basically nothing.

1061
00:54:20,580 --> 00:54:22,700
I mean, the system should figure
out what to do.

1062
00:54:22,700 --> 00:54:27,220
Nikolay: We often discuss outdated
defaults, but this is a radical

1063
00:54:28,080 --> 00:54:28,580
position.

1064
00:54:29,380 --> 00:54:29,860
Melanie: Right.

1065
00:54:29,860 --> 00:54:30,060
Yeah.

1066
00:54:30,060 --> 00:54:34,780
I mean, it's not going to happen
overnight, but I think my vision,

1067
00:54:35,320 --> 00:54:37,200
a lot of people I think want this.

1068
00:54:37,260 --> 00:54:41,180
I think the cost-based delay system
is really hard to understand

1069
00:54:41,280 --> 00:54:42,760
because it's not intuitive.

1070
00:54:43,380 --> 00:54:46,880
The relationship, like scale factor
and all of that, it's just

1071
00:54:46,880 --> 00:54:47,900
like, what is that?

1072
00:54:47,900 --> 00:54:51,140
I mean, I understand why it happened
from a developer perspective,

1073
00:54:51,260 --> 00:54:56,000
but I think that getting vacuum
to do the right thing is like

1074
00:54:56,000 --> 00:55:00,040
our job as engineers and shouldn't
be users' jobs, basically.

1075
00:55:00,620 --> 00:55:01,900
So we'll see.

1076
00:55:03,140 --> 00:55:03,480
Michael: Wow.

1077
00:55:03,480 --> 00:55:06,500
Well, we don't do clips on this
show, but if we did, I think

1078
00:55:06,500 --> 00:55:07,560
that would be one.

1079
00:55:09,060 --> 00:55:11,140
Nikolay: Also number of workers,
3 workers.

1080
00:55:11,280 --> 00:55:15,200
If you have like almost 200 cores
and you have only 3 workers

1081
00:55:15,200 --> 00:55:17,140
by default, this is insane, right?

1082
00:55:17,140 --> 00:55:17,520
Melanie: Yeah.

1083
00:55:17,520 --> 00:55:19,400
Well, actually, so I will take
it back.

1084
00:55:19,400 --> 00:55:22,720
I would say that the one configuration
that I think users should

1085
00:55:22,720 --> 00:55:26,140
be able to provide is the max number
of workers because like

1086
00:55:27,100 --> 00:55:29,240
that's your system preference,
right?

1087
00:55:29,240 --> 00:55:32,220
Like, I mean, maybe not the number
of workers, the minimum number,

1088
00:55:32,220 --> 00:55:33,240
but the maximum number.

1089
00:55:33,240 --> 00:55:37,720
You should be able to keep the
system from having 50 processes

1090
00:55:37,800 --> 00:55:40,120
just doing vacuuming work if you
want.

1091
00:55:40,120 --> 00:55:40,460
Right?

1092
00:55:40,460 --> 00:55:41,760
I think that they should be.

1093
00:55:41,760 --> 00:55:45,320
Nikolay: Because this is the second
phase of writes.

1094
00:55:45,620 --> 00:55:46,280
We need it.

1095
00:55:46,280 --> 00:55:46,680
Right?

1096
00:55:46,680 --> 00:55:49,840
Because if you just deleted something
or updated something, work

1097
00:55:49,840 --> 00:55:53,540
is not done until it's vacuumed
and you need more workers anyway.

1098
00:55:54,140 --> 00:55:57,040
Michael: So- Also, Postgres doesn't
know how many cores, right?

1099
00:55:57,380 --> 00:55:58,220
Nikolay: Doesn't know.

1100
00:55:58,320 --> 00:56:00,180
Melanie: It doesn't take that into
account.

1101
00:56:00,200 --> 00:56:00,700
Yeah.

1102
00:56:00,780 --> 00:56:03,940
I mean, but like people are allowed
to use their servers for

1103
00:56:03,940 --> 00:56:07,400
other things at the same time.
I mean, maybe you don't, but like we sort

1104
00:56:07,400 --> 00:56:11,180
of don't assume that we have
access to everything, I guess.

1105
00:56:11,520 --> 00:56:12,740
Michael: Yeah, makes sense.

1106
00:56:13,380 --> 00:56:14,340
Nikolay: Thank you for coming.

1107
00:56:14,340 --> 00:56:18,620
It was very interesting. Thank you
for this work again.

1108
00:56:18,620 --> 00:56:21,600
Thank you for the pg_stat_io
work and so on, and benchmarks.

1109
00:56:21,780 --> 00:56:25,780
We will keep an eye on
it, and good luck.

1110
00:56:25,960 --> 00:56:26,620
Good luck.

1111
00:56:26,650 --> 00:56:27,500
Melanie: Thank you.

1112
00:56:27,500 --> 00:56:28,720
Michael: Yeah, thanks so much.

1113
00:56:28,940 --> 00:56:29,440
Melanie: Thanks.

1114
00:56:29,820 --> 00:56:30,060
Michael: Bye.