1
00:00:00,060 --> 00:00:02,560
Michael: Hello and welcome to Postgres
FM, a weekly show about

2
00:00:02,560 --> 00:00:03,460
all things PostgreSQL.

3
00:00:03,560 --> 00:00:06,420
I am Michael, founder of pgMustard,
and this is my co-host Nikolay,

4
00:00:06,420 --> 00:00:07,760
founder of Postgres.AI.

5
00:00:07,840 --> 00:00:09,860
Hey Nikolay, what are we talking
about today?

6
00:00:10,320 --> 00:00:14,780
Nikolay: Hi Michael, let's talk
about a lot of TPS we can get

7
00:00:14,780 --> 00:00:16,580
from a single Postgres node.

8
00:00:16,840 --> 00:00:18,320
I recently achieved something.

9
00:00:18,320 --> 00:00:21,880
I'm curious, is it some interesting
achievement or like no?

10
00:00:22,360 --> 00:00:26,100
I'm sure it's possible to get more
on bigger machines.

11
00:00:26,820 --> 00:00:29,100
Because my machines were big enough,
right?

12
00:00:29,100 --> 00:00:35,820
It's like just 360 vCPUs and just
2 terabytes of RAM or something,

13
00:00:36,580 --> 00:00:39,400
maybe 1.5 terabytes of RAM.

14
00:00:39,860 --> 00:00:40,360
Yeah.

15
00:00:40,440 --> 00:00:42,740
Fourth generation AMD EPYC.

16
00:00:42,740 --> 00:00:47,010
There is already 5th generation
AMD EPYC, and there are Intel

17
00:00:47,020 --> 00:00:51,440
Xeon Scalables also, bigger
instances and so on, it's

18
00:00:51,440 --> 00:00:51,940
possible.

19
00:00:52,440 --> 00:00:58,120
But I've almost achieved 4 million
transactions per second, full-fledged

20
00:00:58,220 --> 00:01:02,420
transactions, not just like what
other NoSQL databases have,

21
00:01:02,420 --> 00:01:03,200
it's transactions.

22
00:01:04,120 --> 00:01:09,360
Read-only, we should admit, read-only,
and meaningful transactions.

23
00:01:10,180 --> 00:01:14,580
It was a table with, I think, 10
million rows.

24
00:01:14,580 --> 00:01:16,900
It's a tiny table for this machine.

25
00:01:17,100 --> 00:01:18,660
Everything is cached, of course.

26
00:01:19,280 --> 00:01:22,580
And it was a primary key lookup
and PgBench.

27
00:01:23,560 --> 00:01:26,640
Regular PgBench accounts table,
you know it, right?

28
00:01:27,700 --> 00:01:28,200
So...

29
00:01:30,260 --> 00:01:31,780
Michael: Is it single read per
transaction?

30
00:01:31,780 --> 00:01:33,080
Single query per transaction?

31
00:01:33,080 --> 00:01:33,540
Is that right?

32
00:01:33,540 --> 00:01:36,380
Nikolay: That's a good question
actually, let's double check,

33
00:01:36,380 --> 00:01:39,280
I will double check it right now,
but I think yes, it's just

34
00:01:39,280 --> 00:01:44,540
a SELECT from the pgbench_accounts
table by a random ID, and that's

35
00:01:44,540 --> 00:01:45,880
it, should be, should be.
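The query being described is pgbench's built-in select-only transaction, what `pgbench -S` runs. A sketch of that script and a hypothetical invocation follow; the client count and duration are illustrative assumptions, not the exact settings used in the experiments:

```shell
# pgbench's built-in select-only transaction: one primary-key lookup
# against pgbench_accounts with a random id.
cat > select_only.sql <<'EOF'
\set aid random(1, 100000 * :scale)
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
EOF

# Hypothetical run against a database named "bench":
# -n: skip vacuum, -S: built-in select-only, -c/-j: clients/threads,
# -T: duration in seconds, -P: progress report interval.
# pgbench -n -S -c 360 -j 360 -T 600 -P 10 bench
```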

36
00:01:45,880 --> 00:01:49,040
Michael: I was just hoping you'd
get to multiply your 4 million

37
00:01:49,040 --> 00:01:49,960
by 2 or something.

38
00:01:49,960 --> 00:01:52,460
Well, well, then we get queries per second.

39
00:01:52,540 --> 00:01:54,560
Nikolay: Let me tell you what you
don't know.

40
00:01:54,800 --> 00:01:59,640
Yesterday I was driving to LA and
I used this like I think

41
00:01:59,640 --> 00:02:02,900
I should check with my AI and
get even more TPS.

42
00:02:03,680 --> 00:02:09,340
And thankfully, I had AI with autopilot
in the car, right?

43
00:02:09,340 --> 00:02:15,260
So I could. Maybe it's still not
a good idea to use the phone, but

44
00:02:15,260 --> 00:02:16,860
it's just a quick question, right?

45
00:02:17,120 --> 00:02:20,600
So I asked my bot to repeat this
TPS.

46
00:02:20,600 --> 00:02:23,420
Oh, I forgot to mention, all these
experiments are done with

47
00:02:23,420 --> 00:02:29,040
our Postgres.AI bot, which simplifies
experimentation a lot.

48
00:02:29,200 --> 00:02:33,020
So if you need to provision a huge
machine, run some PgBench

49
00:02:33,160 --> 00:02:36,300
for a couple of hours according
to some script, you just ask

50
00:02:36,300 --> 00:02:39,880
for it, then you think, oh, what
if we change something else?

51
00:02:40,080 --> 00:02:42,160
It's just a few words.

52
00:02:42,180 --> 00:02:45,820
You just ask, and then it reports
back in a couple of hours.

53
00:02:46,400 --> 00:02:52,200
So I asked, can you please check
not regular select only for

54
00:02:52,200 --> 00:02:57,160
PgBench, which is the standard
Postgres benchmarking tool, quite

55
00:02:57,160 --> 00:03:03,360
simple but widely used, but rather
let's just use SELECT;

56
00:03:05,020 --> 00:03:06,020
Not meaningful, right?

57
00:03:06,020 --> 00:03:09,400
Not dealing with data at all, but
still it needs to be parsed,

58
00:03:09,400 --> 00:03:09,900
right?

59
00:03:09,960 --> 00:03:10,740
And so on.
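A bare SELECT can be benchmarked with a custom script file, since `pgbench -f` substitutes an arbitrary script for the built-ins. A sketch; the flags and client counts are assumptions, not the bot's actual setup:

```shell
# Custom pgbench script: a bare SELECT, no columns and no tables,
# so it exercises parsing and transaction machinery but touches no data.
cat > bare_select.sql <<'EOF'
SELECT;
EOF

# Hypothetical invocation:
# pgbench -n -f bare_select.sql -c 360 -j 360 -T 300 bench
```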

60
00:03:10,840 --> 00:03:13,320
Michael: I had it on my list of
questions for you to ask what

61
00:03:13,320 --> 00:03:14,440
would happen if we got...

62
00:03:14,440 --> 00:03:17,160
I was going to suggest SELECT 1,
which I guess is the beginner's

63
00:03:17,160 --> 00:03:17,360
version.

64
00:03:17,360 --> 00:03:22,060
Nikolay: I can tell you we exceeded
12 million,

65
00:03:22,060 --> 00:03:22,540
Michael: wow nice.

66
00:03:22,540 --> 00:03:26,280
Nikolay: Yeah but it's I don't
consider it fair because it's

67
00:03:26,280 --> 00:03:27,540
just heartbeat.

68
00:03:27,560 --> 00:03:28,700
Well it

69
00:03:28,700 --> 00:03:31,300
Michael: depends what you're measuring,
right? Like, if you're trying

70
00:03:31,300 --> 00:03:35,340
to measure what's the most we can
achieve, then sure, it's meaningful.

71
00:03:35,740 --> 00:03:39,500
But if you're actually trying to
do something meaningful, read-only

72
00:03:39,520 --> 00:03:41,140
is probably not realistic.

73
00:03:41,980 --> 00:03:46,320
There's so many things you could
say, and I like the idea of

74
00:03:46,320 --> 00:03:50,640
what's the maximum we could get
if we don't worry about constraints,

75
00:03:50,740 --> 00:03:52,400
about how realistic it is.

76
00:03:52,800 --> 00:03:53,300
Nikolay: Right.

77
00:03:53,400 --> 00:03:57,380
So yeah, in this case, we only
checked the parser and general

78
00:03:57,720 --> 00:04:01,360
transaction, read-only transaction,
not full-fledged transaction,

79
00:04:02,620 --> 00:04:03,500
all these pieces.

80
00:04:03,500 --> 00:04:07,960
But we didn't check how Postgres
works with, for example, buffer

81
00:04:07,960 --> 00:04:10,420
pool with reading data and so on.

82
00:04:11,000 --> 00:04:14,720
Because I know there was some contention
in the past, even for

83
00:04:14,720 --> 00:04:19,340
read-only queries, which was improved,
I think, in 9.6 on the

84
00:04:19,340 --> 00:04:21,300
road to achieve 1 million TPS.

85
00:04:21,300 --> 00:04:23,860
We will probably discuss this slightly
later.

86
00:04:24,280 --> 00:04:29,680
In general, what's interesting,
I needed to explain, I hate SELECT

87
00:04:29,680 --> 00:04:30,560
1 statements.

88
00:04:31,160 --> 00:04:32,380
You know why, right?

89
00:04:32,980 --> 00:04:33,480
No.

90
00:04:34,080 --> 00:04:35,860
Because 1 is not needed.

91
00:04:36,160 --> 00:04:36,600
Michael: Fair enough.

92
00:04:36,600 --> 00:04:38,580
Nikolay: You can say just SELECT,
and that's it.

93
00:04:38,800 --> 00:04:43,760
It will return you a result set
consisting of 0 rows and 0 columns.

94
00:04:44,200 --> 00:04:47,080
Michael: Can you just run semicolon
on its own or something?

95
00:04:47,080 --> 00:04:48,060
Nikolay: Yes, you can.

96
00:04:48,740 --> 00:04:50,460
I think we also discussed it.

97
00:04:50,460 --> 00:04:51,420
Michael: Must have done.

98
00:04:51,620 --> 00:04:55,200
Nikolay: Yeah, it has issues, negative
side effects, because,

99
00:04:55,200 --> 00:05:00,040
for example, you won't see it in
logs, if somehow it's slow.

100
00:05:00,320 --> 00:05:01,620
It's like empty query.

101
00:05:01,860 --> 00:05:05,440
You won't see it in pg_stat_statements,
for example, unlike regular

102
00:05:05,440 --> 00:05:06,400
SELECT;

103
00:05:07,200 --> 00:05:09,980
And I failed to convince the bot to
run SELECT;

104
00:05:10,120 --> 00:05:11,320
Somehow it didn't work.

105
00:05:11,320 --> 00:05:15,020
And I also had difficulties convincing
the bot to run SELECT.

106
00:05:15,020 --> 00:05:16,500
It started telling me...

107
00:05:16,560 --> 00:05:17,700
I think it was Gemini.

108
00:05:17,700 --> 00:05:20,700
It was in this passive-aggressive
manner, it's Gemini for sure.

109
00:05:20,740 --> 00:05:25,440
So it started telling me, you know,
like it's incorrect, it's

110
00:05:25,440 --> 00:05:27,040
invalid SQL statement.

111
00:05:27,040 --> 00:05:33,140
So let's like, it offered me, let's
put 1 at least there, or

112
00:05:33,180 --> 00:05:34,340
where is the table name?

113
00:05:34,340 --> 00:05:38,460
Like, okay, let's, I said, no tables,
no tables, it was just

114
00:05:38,460 --> 00:05:42,180
like idle queries, like no real
action.

115
00:05:42,380 --> 00:05:45,400
And then it said, like, let's put
at least 1 because otherwise it's

116
00:05:45,400 --> 00:05:46,340
incorrect statement.

117
00:05:46,420 --> 00:05:48,460
I forced, like, are you sure?

118
00:05:48,520 --> 00:05:49,460
You can check.

119
00:05:50,220 --> 00:05:51,720
So we had some battle.

120
00:05:53,200 --> 00:05:58,420
But in general, yeah, it worked
and we achieved 6 million TPS,

121
00:05:58,420 --> 00:05:59,340
which was good.

122
00:06:00,300 --> 00:06:02,840
But SELECT; didn't work for our
bot.

123
00:06:02,840 --> 00:06:04,140
I think it's possible.

124
00:06:04,340 --> 00:06:07,340
I don't expect a huge win compared
to SELECT.

125
00:06:08,000 --> 00:06:09,440
Maybe it will be parser.

126
00:06:09,680 --> 00:06:10,620
I don't know.

127
00:06:11,040 --> 00:06:11,420
You can

128
00:06:11,420 --> 00:06:13,260
Michael: run the experiment manually,
right?

129
00:06:13,260 --> 00:06:15,980
Could you just submit the JSON
yourself?

130
00:06:16,340 --> 00:06:17,840
Nikolay: It's better to convince
the bot.

131
00:06:17,840 --> 00:06:20,460
Michael: You prefer to fight the
LLM, fair enough.

132
00:06:20,460 --> 00:06:24,600
Nikolay: A little bit, yeah, because
every time we do it, I find

133
00:06:24,600 --> 00:06:26,140
opportunities to improve, of course.

134
00:06:27,040 --> 00:06:30,560
But back to meaningful results,
which is interesting.

135
00:06:31,300 --> 00:06:36,500
We almost achieved 4 million, but
originally it was just 1 million.

136
00:06:37,120 --> 00:06:41,180
And when I saw just 1 million on
modern Postgres, we checked

137
00:06:41,180 --> 00:06:42,940
16 and 17 beta 1.

138
00:06:42,940 --> 00:06:44,940
By the way, beta 2 is already out.

139
00:06:45,040 --> 00:06:47,700
I think we should update and switch
to it in our tests.

140
00:06:49,700 --> 00:06:54,400
So I expected much more because
I remember the post from Alexander

141
00:06:54,400 --> 00:07:00,640
Korotkov from 2016, when I remember
he worked with Percona people,

142
00:07:00,900 --> 00:07:02,440
Sveta Smirnova particularly.

143
00:07:03,780 --> 00:07:09,660
They had like a friendly battle,
let's see what's better, Postgres

144
00:07:09,660 --> 00:07:13,820
or MySQL, in terms of how many
TPS we can get from a single node.

145
00:07:14,640 --> 00:07:18,940
And I think both Postgres and MySQL
learned something from it.

146
00:07:19,280 --> 00:07:25,380
And Alexander created a few patches
because some issues were

147
00:07:25,380 --> 00:07:27,380
found, some contention was found.

148
00:07:27,920 --> 00:07:28,940
And it was improved.

149
00:07:29,680 --> 00:07:34,860
In Postgres 9.5 it was not so good
and in 9.6, it was improved just

150
00:07:34,860 --> 00:07:36,000
based on this work.

151
00:07:36,040 --> 00:07:37,760
And it achieved 1 million TPS.

152
00:07:38,360 --> 00:07:40,240
MySQL also achieved 1 million TPS.

153
00:07:40,240 --> 00:07:40,740
Good.

154
00:07:41,200 --> 00:07:45,880
But I was surprised like so many
years, like 8 years, right?

155
00:07:45,880 --> 00:07:49,020
We have so many Postgres versions,
basically 8 Postgres versions

156
00:07:49,020 --> 00:07:51,980
already since then, and maybe 7.

157
00:07:53,000 --> 00:07:54,260
And much better hardware.

158
00:07:54,960 --> 00:07:58,740
Not much, but significantly better
hardware.

159
00:07:58,740 --> 00:08:01,860
So I expected to get more from
this.

160
00:08:02,480 --> 00:08:04,740
Michael: Well, we looked into the
machines, didn't we?

161
00:08:06,820 --> 00:08:10,240
And even back then, they had managed
to get their hands on 144

162
00:08:10,680 --> 00:08:11,920
vCPU machine.

163
00:08:12,800 --> 00:08:16,320
And I think you mentioned yours
was a bit more than double that,

164
00:08:16,320 --> 00:08:16,980
was it?

165
00:08:17,040 --> 00:08:21,220
Nikolay: 360, and I think they
had Intels, much older Intels,

166
00:08:21,220 --> 00:08:26,520
and I had AMD, which is very easily
available on GCP, which I

167
00:08:26,520 --> 00:08:29,280
use because I have credits there,
right?

168
00:08:30,320 --> 00:08:35,180
So yeah, before we started this
call, I cannot miss this.

169
00:08:35,320 --> 00:08:37,320
You showed me what AWS has.

170
00:08:37,820 --> 00:08:38,640
Super impressive.

171
00:08:38,740 --> 00:08:42,840
How many, almost 668, almost 800.

172
00:08:44,000 --> 00:08:45,040
No, more, more.

173
00:08:45,300 --> 00:08:46,220
Almost 900.

174
00:08:47,500 --> 00:08:49,940
Michael: 896 vCPUs.

175
00:08:50,340 --> 00:08:52,460
Nikolay: And it's Intel Xeon Scalable,
fourth generation.

176
00:08:53,360 --> 00:08:54,900
Michael: Yeah, it'll cost you.

177
00:08:55,200 --> 00:08:58,220
Nikolay: But also 32 terabytes
of RAM maximum.

178
00:08:58,440 --> 00:08:58,940
Yeah.

179
00:09:00,900 --> 00:09:02,960
224xlarge.

180
00:09:03,560 --> 00:09:04,060
Wow.

181
00:09:04,360 --> 00:09:06,140
And they don't have spots on it.

182
00:09:06,180 --> 00:09:08,140
Of course, this is something new.

183
00:09:08,300 --> 00:09:10,540
Yeah, and it's fourth generation
Xeon Scalable.

184
00:09:10,760 --> 00:09:14,400
What I found interesting, like
during last year and this year,

185
00:09:14,440 --> 00:09:18,380
various benchmarks, not this, like,
toy benchmark, it's a side

186
00:09:18,380 --> 00:09:22,000
project for me, but working with
customers, various cases.

187
00:09:22,220 --> 00:09:26,620
What I found, what my impression
right now from Intel's AMD,

188
00:09:26,980 --> 00:09:31,900
like recent versions of them available
in cloud, is that AMD,

189
00:09:33,540 --> 00:09:38,560
like for example, on GCP, you can
get more vCPUs and reach more

190
00:09:38,560 --> 00:09:40,940
TPS on very simple workloads.

191
00:09:42,260 --> 00:09:45,040
And it will be cheaper in general
in terms of how many, like

192
00:09:45,040 --> 00:09:48,880
if you divide TPS per dollar, find
TPS per dollar, it will be

193
00:09:48,880 --> 00:09:54,780
cheaper probably, and you can achieve
higher throughput for simple

194
00:09:54,780 --> 00:09:55,280
workloads.
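The TPS-per-dollar comparison is plain division. A toy sketch with made-up throughput and price numbers, not measured results:

```shell
# Toy throughput-per-cost arithmetic; every number here is a placeholder.
amd_tps=3800000;   amd_price_cents_hour=1900    # hypothetical AMD instance
intel_tps=4200000; intel_price_cents_hour=2800  # hypothetical Intel instance

# Integer TPS per cent-hour for each box:
echo "AMD:   $(( amd_tps / amd_price_cents_hour )) TPS per cent"
echo "Intel: $(( intel_tps / intel_price_cents_hour )) TPS per cent"
# With these placeholder numbers, the cheaper box wins on TPS per dollar
# despite its lower peak TPS.
```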

195
00:09:55,400 --> 00:09:59,560
But when it comes to complex workloads
and various problems like

196
00:10:00,840 --> 00:10:03,900
lightweight lock manager contention
we discussed a few times.

197
00:10:04,740 --> 00:10:08,720
And in general, complex workloads
and the behavior of autovacuum

198
00:10:08,720 --> 00:10:14,160
and so on, AMD looks less preferable
and Intels are winning.

199
00:10:14,600 --> 00:10:20,360
So that's why I'm additionally
impressed that AWS offers Intel

200
00:10:20,420 --> 00:10:25,460
Scalable Xeons, 4th generation,
almost 900 vCPUs.

201
00:10:25,460 --> 00:10:26,820
It's insane for me.

202
00:10:27,360 --> 00:10:28,340
32 terabytes.

203
00:10:28,500 --> 00:10:29,000
Wow.

204
00:10:30,060 --> 00:10:34,600
And you think I can exceed 4 million
TPS, right?

205
00:10:34,840 --> 00:10:36,000
It's a joke, I guess.

206
00:10:36,420 --> 00:10:37,120
Well, it'd be interesting

207
00:10:37,120 --> 00:10:41,620
Michael: to hear more about what
you think the current bottlenecks

208
00:10:41,920 --> 00:10:43,480
are, like where you think.

209
00:10:43,500 --> 00:10:49,280
But I guess, like my stupid brain
is thinking, like, this is

210
00:10:49,280 --> 00:10:53,380
probably CPU constrained at this
point.

211
00:10:53,720 --> 00:10:57,080
Throwing more at it makes sense
that it could...

212
00:10:57,340 --> 00:11:00,140
Like, throw double at it, maybe
you can get close to doubling

213
00:11:00,140 --> 00:11:00,640
it.

214
00:11:01,080 --> 00:11:04,400
Nikolay: Well, we collect a lot
of information.

215
00:11:04,540 --> 00:11:09,860
We have a wait event analysis,
we have flame graphs collected

216
00:11:09,860 --> 00:11:11,340
automatically for each step.

217
00:11:12,040 --> 00:11:15,140
And it still needs to be analyzed
deeper.

218
00:11:15,140 --> 00:11:19,400
But what I see right now, the bottlenecks
are mostly communication.

219
00:11:20,080 --> 00:11:24,480
Let's actually maybe take a step
or a couple of steps back and

220
00:11:24,480 --> 00:11:28,220
discuss what issues I had originally.

221
00:11:28,260 --> 00:11:29,480
Why only 1 million?

222
00:11:30,120 --> 00:11:32,320
Michael: Can we take a step back
further than that?

223
00:11:32,320 --> 00:11:33,840
Why were you doing this?

224
00:11:34,400 --> 00:11:34,900
Yeah,

225
00:11:36,060 --> 00:11:37,160
Nikolay: it's a good question.

226
00:11:39,720 --> 00:11:42,280
I was asking myself why do we do
this.

227
00:11:42,280 --> 00:11:44,940
Well, for fun, first of all, actually.

228
00:11:45,060 --> 00:11:46,420
I was very curious.

229
00:11:47,120 --> 00:11:48,500
I remember that work.

230
00:11:49,300 --> 00:11:55,840
I think for Alexander and Sveta
it was also for fun 8 years ago.

231
00:11:56,160 --> 00:12:00,200
But it led to some interesting
findings and eventually to optimizations.

232
00:12:01,360 --> 00:12:05,400
So this fun was converted to something
useful.

233
00:12:05,740 --> 00:12:09,780
And we started doing this for fun
because we have big machines,

234
00:12:09,780 --> 00:12:12,400
we have very good automation, we
have newer Postgres.

235
00:12:12,400 --> 00:12:16,240
I was just curious how easy it
is for us to achieve 1 million and

236
00:12:16,240 --> 00:12:17,220
maybe go further.

237
00:12:17,800 --> 00:12:22,120
And first thing we saw, since we
have a complex system, a lot

238
00:12:22,120 --> 00:12:27,100
of things, the observability tools,
first thing we saw is that

239
00:12:27,100 --> 00:12:29,560
one of the extensions is a huge bottleneck.

240
00:12:29,640 --> 00:12:30,520
It was pg_stat_kcache.

241
00:12:32,320 --> 00:12:38,860
And it even didn't let us achieve
1 million TPS, quickly showing

242
00:12:38,940 --> 00:12:43,840
huge observer effect, meaning that
just measuring pg_stat_kcache

243
00:12:44,720 --> 00:12:50,040
extends pg_stat_statements to see
metrics related to physical resources

244
00:12:50,040 --> 00:12:56,780
like CPU, real CPU, user time,
system time, then I/O, real disk

245
00:12:56,780 --> 00:13:02,580
I/O, not just talking to the page
cache, and context switches,

246
00:13:03,940 --> 00:13:08,700
and very good detailed physical
query analysis.

247
00:13:10,160 --> 00:13:14,880
And we like to have it, but if
you had it before the latest versions,

248
00:13:16,300 --> 00:13:20,800
it quickly became a huge overhead
when you have a lot of queries

249
00:13:20,800 --> 00:13:22,620
per second for a specific query.

250
00:13:22,860 --> 00:13:25,380
Michael: I remember we discussed
this back in, I just looked

251
00:13:25,380 --> 00:13:28,000
up the episode back in February,
about overhead.

252
00:13:29,480 --> 00:13:31,560
I'll link that episode up as well.

253
00:13:31,740 --> 00:13:35,420
Nikolay: Right. And yeah, just
to recap, we found that it shows

254
00:13:35,420 --> 00:13:39,340
up in the wait event analysis using
pg_wait_sampling.

255
00:13:39,340 --> 00:13:39,840
Great.

256
00:13:40,420 --> 00:13:44,600
And we just quickly excluded it
and also reported to the maintainers

257
00:13:44,760 --> 00:13:47,320
and they fixed it in 4 hours.

258
00:13:47,320 --> 00:13:48,220
It was amazing.

259
00:13:48,760 --> 00:13:52,740
And we tested it with the bot, confirmed
that now it's improved.

260
00:13:52,960 --> 00:13:55,220
So this is how we achieved 1 million.

261
00:13:55,520 --> 00:13:57,040
But why only 1 million?

262
00:13:57,040 --> 00:13:59,880
We have newer Postgres and better
hardware.

263
00:14:00,720 --> 00:14:02,940
Did I answer the question why,
by the way?

264
00:14:03,500 --> 00:14:04,620
Michael: You said for fun.

265
00:14:05,020 --> 00:14:05,860
Nikolay: For fun, right.

266
00:14:05,860 --> 00:14:07,860
Yeah, this is the key, actually,
here.

267
00:14:07,860 --> 00:14:08,800
I enjoy it.

268
00:14:08,800 --> 00:14:12,980
You know, bot, let's just run pgBench
that way, or this way.

269
00:14:12,980 --> 00:14:15,340
Let's collect more data points.

270
00:14:15,540 --> 00:14:16,860
Let's visualize it.

271
00:14:17,000 --> 00:14:19,640
Let's repeat this, but with an
adjusted version.

272
00:14:19,640 --> 00:14:21,720
Let's compare various versions
for example.

273
00:14:22,360 --> 00:14:25,740
Michael: So it sounds like for
fun but also you've got credits

274
00:14:26,140 --> 00:14:27,620
and also dogfooding, right?

275
00:14:27,620 --> 00:14:31,340
Like trying out products, trying
it out, here's an idea.

276
00:14:32,500 --> 00:14:36,380
Nikolay: Yeah, we try to collect
successful experiments, and

277
00:14:36,380 --> 00:14:39,240
we do it on smaller machines as
well, but on bigger machines

278
00:14:39,240 --> 00:14:43,760
we collect just to understand how
the system is working.

279
00:14:43,860 --> 00:14:48,260
Of course, we're moving towards
more useful benchmarks, for example,

280
00:14:48,260 --> 00:14:53,240
when it will be already some specific
database and specific kind

281
00:14:53,240 --> 00:14:56,540
of workload and maybe not only
PgBench and so on.

282
00:14:56,780 --> 00:15:00,560
And all the tooling will be in
place and automation is super

283
00:15:00,720 --> 00:15:05,640
high, so you can just talk to chatbot
and get results visualized

284
00:15:05,740 --> 00:15:09,660
and collected with all details
and you don't miss any point.

285
00:15:09,720 --> 00:15:12,420
So we collect almost 80 artifacts
for each datapoint.

286
00:15:12,740 --> 00:15:13,680
It's huge.

287
00:15:13,860 --> 00:15:18,400
It reflects many years of experience,
I would say, not only mine.

288
00:15:18,680 --> 00:15:22,320
For example, you can say, okay,
let's compare versions 12 to

289
00:15:22,360 --> 00:15:23,400
17 beta 1.

290
00:15:23,400 --> 00:15:24,860
This is what I did as well.

291
00:15:25,120 --> 00:15:30,560
And I confirmed that for regular
PgBench workload, or for Select-only

292
00:15:30,600 --> 00:15:32,760
PgBench workload, no difference.

293
00:15:33,060 --> 00:15:36,760
These versions are stable, no regression,
but also no wins.

294
00:15:37,280 --> 00:15:38,440
But for regular...

295
00:15:38,680 --> 00:15:39,940
Michael: 12 through 17.

296
00:15:40,400 --> 00:15:40,900
Nikolay: Yeah.

297
00:15:42,340 --> 00:15:48,840
But for select-only, this is exactly
the workload which allowed

298
00:15:48,840 --> 00:15:51,180
me to achieve almost 4 million
TPS.

299
00:15:51,400 --> 00:15:54,980
For this, 12 is losing significantly.

300
00:15:55,080 --> 00:15:55,940
Michael: Wait, wait, wait.

301
00:15:56,200 --> 00:15:56,840
You lost me.

302
00:15:56,840 --> 00:15:57,840
So for read-only...

303
00:15:58,860 --> 00:15:59,060
No,

304
00:15:59,060 --> 00:16:00,000
Nikolay: no, for regular.

305
00:16:00,820 --> 00:16:01,740
Michael: Oh, for read-write.

306
00:16:03,400 --> 00:16:08,100
Nikolay: You know, the regular transactions
pgbench has. It's

307
00:16:08,100 --> 00:16:11,620
like, I don't know, like a couple
of updates, insert, delete,

308
00:16:11,760 --> 00:16:15,480
a couple of selects, and they are
packaged as a single transaction,

309
00:16:15,480 --> 00:16:18,700
and then you just say, okay, let's
just check it.
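For reference, the default pgbench transaction (the "tpcb-like" built-in) is approximately the following: three UPDATEs, one SELECT, and one INSERT packaged in one transaction. This is a close sketch rather than a verbatim copy of the built-in:

```shell
# Approximation of pgbench's default "tpcb-like" transaction script.
cat > tpcb_like.sql <<'EOF'
\set aid random(1, 100000 * :scale)
\set bid random(1, 1 * :scale)
\set tid random(1, 10 * :scale)
\set delta random(-5000, 5000)
BEGIN;
UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
SELECT abalance FROM pgbench_accounts WHERE aid = :aid;
UPDATE pgbench_tellers SET tbalance = tbalance + :delta WHERE tid = :tid;
UPDATE pgbench_branches SET bbalance = bbalance + :delta WHERE bid = :bid;
INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP);
END;
EOF
```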

310
00:16:18,700 --> 00:16:22,500
And the way we check it, it's called
like actually not load testing,

311
00:16:22,500 --> 00:16:24,960
but stress testing, specialized
version of load testing.

312
00:16:24,960 --> 00:16:30,360
And we check in the edge, we say,
okay, with 1 client, how many?

313
00:16:30,480 --> 00:16:31,780
50 clients, how many?

314
00:16:31,780 --> 00:16:32,800
100 clients, how many?

315
00:16:32,800 --> 00:16:37,340
And then you have a graph dependency
of TPS, how TPS depends

316
00:16:37,340 --> 00:16:38,600
on the number of clients.
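That ladder of client counts (1, then 50 through 500 in steps of 50) can be driven by a simple loop; the per-step pgbench invocation is left commented as a hypothetical, since the actual runs go through the bot:

```shell
# Stress-test ladder: one data point per client count.
for clients in 1 $(seq 50 50 500); do
  echo "measuring with clients=$clients"
  # pgbench -n -S -c "$clients" -j "$clients" -T 300 bench  # hypothetical run
done
```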

317
00:16:39,480 --> 00:16:44,140
Yeah, and it's interesting that
we cannot say 0 and due to scale

318
00:16:44,140 --> 00:16:47,940
we need to jump with big steps
like 50.

319
00:16:48,480 --> 00:16:55,160
So I say 1, 50, 100, then dot,
dot, dot, 500.

320
00:16:55,280 --> 00:17:00,800
And the bot understands that it
needs to go with 50 increments,

321
00:17:01,220 --> 00:17:04,260
but first dot is shifted to 1 and
it's okay.

322
00:17:05,020 --> 00:17:08,900
So it's interesting, it adjusts
very quickly.

323
00:17:08,940 --> 00:17:14,560
So, yeah, for fun and this, and
back to versions comparison,

324
00:17:15,120 --> 00:17:19,120
So we found that for select only,
we don't see a difference,

325
00:17:19,120 --> 00:17:22,620
but for regular workload and we
don't get a million transactions

326
00:17:22,740 --> 00:17:28,100
there, we have only like 30,000,
maybe 40,000, something

327
00:17:28,100 --> 00:17:31,120
quite lower because it's already
writing transactions and so

328
00:17:31,120 --> 00:17:31,620
on.

329
00:17:32,200 --> 00:17:38,220
We saw that 12 is losing apparently,
and 16, 17 are winning slightly

330
00:17:39,440 --> 00:17:41,660
compared to like average for these
old versions.

331
00:17:42,180 --> 00:17:45,720
But yeah, I think we will continue
exploring and maybe test some

332
00:17:45,720 --> 00:17:48,380
specific workloads, not just standard
ones.

333
00:17:48,940 --> 00:17:51,720
Michael: And just to check, this
is like completely untuned,

334
00:17:52,360 --> 00:17:54,220
like just default config.

335
00:17:54,480 --> 00:17:54,840
Okay.

336
00:17:54,840 --> 00:17:55,620
Nikolay: It's tuned.

337
00:17:55,680 --> 00:17:56,620
Good to check.

338
00:17:56,880 --> 00:17:57,380
Yeah.

339
00:17:57,440 --> 00:18:01,620
We use postgresql_cluster, it's
Ansible playbooks maintained

340
00:18:01,620 --> 00:18:05,040
by Vitaliy, who works with me,
Vitaliy Kukharik.

341
00:18:05,740 --> 00:18:09,780
And it's a great project if you
don't like Kubernetes, for example,

342
00:18:10,520 --> 00:18:11,780
and actually containers.

343
00:18:11,880 --> 00:18:17,000
You just need bare Postgres installed
on Debian, Ubuntu or something,

344
00:18:17,380 --> 00:18:21,920
and you want automation for it,
this is a good project.

345
00:18:22,340 --> 00:18:26,200
It goes with everything, like everything
most popular stuff,

346
00:18:26,200 --> 00:18:31,320
like Patroni, pgBackRest or WAL-G, what else?

347
00:18:31,320 --> 00:18:33,080
Like a lot of simple things.

348
00:18:33,080 --> 00:18:36,600
Actually, even with TimescaleDB,
it's packaged.

349
00:18:36,820 --> 00:18:40,760
So you can choose to use it and
quickly have it on your cloud

350
00:18:40,760 --> 00:18:41,260
or...

351
00:18:42,280 --> 00:18:45,120
So for self-managed Postgres, it's
a good option.

352
00:18:45,480 --> 00:18:48,240
It takes time to install, of course,
because it will be running

353
00:18:48,240 --> 00:18:50,520
like apt-get install.

354
00:18:51,220 --> 00:18:53,600
So it's coming with some tuning.

355
00:18:54,100 --> 00:18:58,020
It's similar to what cloud providers
do, like 25% for shared

356
00:18:58,020 --> 00:18:58,520
buffers.

357
00:18:59,020 --> 00:19:05,040
It also adjusts operating system
parameters, like kernel settings.
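A rough illustration of that kind of tuning; the 25% shared_buffers figure comes from the conversation, while the concrete values below are placeholder assumptions, not the playbooks' actual defaults:

```shell
# Sketch only: postgresql.conf-style settings in the spirit described.
cat > tuning_sketch.conf <<'EOF'
# ~25% of RAM for shared_buffers on a ~1.5 TB machine (25% of 1536 GB = 384 GB)
shared_buffers = '384GB'
effective_cache_size = '1100GB'   # illustrative
max_connections = 1000            # illustrative
EOF
# Kernel settings would be applied separately via sysctl; the exact
# parameters depend on the playbooks and are not reproduced here.
```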

358
00:19:05,660 --> 00:19:09,780
Michael: I guess the dataset is
so small that 25% still so easily

359
00:19:09,780 --> 00:19:10,580
fits within.

360
00:19:11,240 --> 00:19:12,180
Nikolay: Yeah, of course.

361
00:19:12,740 --> 00:19:16,640
And maybe we can tune additionally,
but I didn't see the point

362
00:19:16,640 --> 00:19:17,140
yet.

363
00:19:17,800 --> 00:19:22,780
This default tuning this project
provides was also like...

364
00:19:23,560 --> 00:19:26,760
I know it quite well because we
adjusted in the past together

365
00:19:26,760 --> 00:19:29,880
a few times, so I know how it works.

366
00:19:30,040 --> 00:19:34,740
So I relied on it for now, but
maybe we will additionally adjust

367
00:19:34,740 --> 00:19:35,280
some things.

368
00:19:35,280 --> 00:19:37,540
But queries are super simple, right?

369
00:19:37,540 --> 00:19:38,500
Data is cached.

370
00:19:38,560 --> 00:19:39,060
So...

371
00:19:39,720 --> 00:19:40,360
Michael: I understand.

372
00:19:40,440 --> 00:19:45,520
I'm just trying to think, like,
the major things, like, all cached,

373
00:19:46,620 --> 00:19:49,560
I guess, basic, like, write-ahead
log stuff, when you've got

374
00:19:49,560 --> 00:19:51,880
the read-write stuff going on.

375
00:19:51,880 --> 00:19:52,480
Doesn't matter.

376
00:19:53,300 --> 00:19:53,800
Yeah.

377
00:19:54,220 --> 00:19:56,860
Nikolay: It doesn't matter because
we don't write.

378
00:19:57,660 --> 00:20:01,280
Michael: So in 1 of them, in the
30,000, I reckon the 30,000

379
00:20:01,400 --> 00:20:04,400
is low and you could get much higher
there.

380
00:20:04,940 --> 00:20:08,200
Nikolay: Well, yeah, if we talk
about writing transactions, of

381
00:20:08,200 --> 00:20:11,300
course, we should think about checkpoints
and so on.

382
00:20:11,760 --> 00:20:14,640
And it comes with some adjustments
based on the size of machine

383
00:20:14,640 --> 00:20:15,060
already.

384
00:20:15,060 --> 00:20:17,540
So I just relied on this default
behavior.

385
00:20:17,720 --> 00:20:21,540
We can look further and good thing,
we consider settings as 1

386
00:20:21,540 --> 00:20:22,860
of artifacts we collect.

387
00:20:23,360 --> 00:20:28,040
So for history, we record everything,
all custom settings and

388
00:20:28,040 --> 00:20:29,580
some system info as well.

389
00:20:30,180 --> 00:20:34,200
So we can revisit it later and
think how to improve and next

390
00:20:34,200 --> 00:20:34,700
step.

391
00:20:35,280 --> 00:20:36,960
But bottlenecks are not there yet.

392
00:20:37,440 --> 00:20:37,940
Michael: Yeah.

393
00:20:38,680 --> 00:20:42,340
And I was wondering, last question
I had on the kind of the top

394
00:20:42,340 --> 00:20:43,200
line number.

395
00:20:44,680 --> 00:20:49,460
I get keeping pg_wait_sampling
for being able to introspect,

396
00:20:49,740 --> 00:20:52,540
for being able to kind of try and
work out where is the bottleneck

397
00:20:52,540 --> 00:20:53,040
now.

398
00:20:53,480 --> 00:20:54,660
It's great for that.

399
00:20:54,840 --> 00:20:57,080
But doesn't it also add a small
amount of overhead?

400
00:20:57,080 --> 00:20:59,740
I know it's only sampling, so it's
probably minimal.

401
00:20:59,880 --> 00:21:03,220
But were you tempted to try without
it as well and just see what

402
00:21:03,480 --> 00:21:04,140
you got?

403
00:21:04,280 --> 00:21:05,340
Nikolay: I think we did.

404
00:21:05,500 --> 00:21:07,080
And I think it's very minimal.

405
00:21:07,200 --> 00:21:08,600
It's worth revisiting as well.

406
00:21:08,600 --> 00:21:11,780
But for now, our impression is
that pg_wait_sampling, among...

407
00:21:11,960 --> 00:21:12,680
We have...

408
00:21:12,700 --> 00:21:16,720
That's why, by the way, we encountered
the problems with pg_stat_kcache,

409
00:21:17,160 --> 00:21:21,820
because we used basically the package
we use for various kinds

410
00:21:21,820 --> 00:21:26,080
of experiments in the case of self-managed
approach, right?

411
00:21:26,580 --> 00:21:29,160
Michael: Had it more automatically
on by default.

412
00:21:29,380 --> 00:21:32,200
Nikolay: Yeah, we already had these
observability tools there,

413
00:21:32,200 --> 00:21:34,940
and this is how we found the pg_stat_kcache
problem.

414
00:21:35,580 --> 00:21:40,580
As for pg_wait_sampling, among all
3 extensions, pg_stat_kcache, pg_stat_statements,

415
00:21:40,900 --> 00:21:46,280
and pg_wait_sampling, by the way,
both pg_wait_sampling and pg_stat_kcache

416
00:21:46,900 --> 00:21:50,860
depend on pg_stat_statements because
they basically extend it for

417
00:21:50,860 --> 00:21:51,700
query analysis.

418
00:21:52,080 --> 00:21:53,900
They all provide query analysis.

419
00:21:54,480 --> 00:21:59,820
Regular pg_stat_statements provides
query analysis like regular

420
00:22:00,040 --> 00:22:05,520
database metrics like calls, timing,
IOMetrics at upper level,

421
00:22:05,740 --> 00:22:07,440
buffer pool metrics, right?

422
00:22:07,700 --> 00:22:13,400
Kcache goes down to physical level,
CPU and disk.

423
00:22:13,680 --> 00:22:18,060
And pg_wait_sampling, for each query,
it also provides a profile

424
00:22:18,740 --> 00:22:21,340
in terms of wait events, which
is super useful.

425
00:22:21,600 --> 00:22:25,240
It's like different angles of analysis.

426
00:22:25,640 --> 00:22:29,280
So if you can have them all free,
it's super good.

427
00:22:29,600 --> 00:22:33,340
Also worth mentioning, it's off-topic
but interesting.

428
00:22:34,060 --> 00:22:38,860
Until very recently, pg_wait_sampling
didn't register events

429
00:22:39,440 --> 00:22:44,620
where the wait event is null, which
most wait event analysis tools

430
00:22:44,620 --> 00:22:49,480
we know, such as RDS Performance
Insights, they picture it as

431
00:22:49,480 --> 00:22:51,600
a green area called CPU.

432
00:22:53,560 --> 00:22:58,680
By the way, Alexander Korotkov,
who's originally the author of

433
00:22:58,680 --> 00:23:02,920
pg_wait_sampling, confirmed my
idea that it's not fair to name

434
00:23:02,920 --> 00:23:03,420
it CPU.

435
00:23:03,420 --> 00:23:06,600
It should be like CPU or some unknown
wait event which is not

436
00:23:06,600 --> 00:23:07,840
yet implemented.

437
00:23:08,260 --> 00:23:12,280
Because not everything is covered
by a wait event analysis

438
00:23:12,280 --> 00:23:13,360
in the code base.

439
00:23:14,440 --> 00:23:18,340
Until recently, pg_wait_sampling
had a huge problem.

440
00:23:18,340 --> 00:23:24,280
It didn't register nulls, naming
them somehow, at least like

441
00:23:24,280 --> 00:23:25,900
CPU or something, at all.

442
00:23:26,040 --> 00:23:29,760
But guys from CyberTech implemented
it a couple of weeks ago.

443
00:23:30,060 --> 00:23:30,940
It's huge news.

444
00:23:30,940 --> 00:23:32,040
It's good.

445
00:23:32,040 --> 00:23:32,540
Thank you.

446
00:23:32,540 --> 00:23:34,260
Thank you, guys, if you listen
to us.

447
00:23:34,940 --> 00:23:36,640
I'm happy that it's implemented.

448
00:23:36,660 --> 00:23:41,300
And I know GCP Cloud SQL also has
pg_wait_sampling.

449
00:23:42,180 --> 00:23:46,320
And I like the idea that unlike
in Performance Insights, in RDS,

450
00:23:47,180 --> 00:23:49,900
we have an interface to this data.

451
00:23:50,340 --> 00:23:52,280
I mean, good interface with SQL,
right?

452
00:23:52,280 --> 00:23:56,780
We can just query this data right
from our database and build

453
00:23:56,780 --> 00:24:00,780
our own observability extensions
or automation.

454
00:24:01,240 --> 00:24:03,680
For experiments, it's super important
because I want to take

455
00:24:03,680 --> 00:24:06,480
a snapshot and it's easier for
me just to take a snapshot using

456
00:24:06,480 --> 00:24:06,980
SQL.

457
00:24:08,640 --> 00:24:12,400
Let's collect profile, global,
and per query.

458
00:24:13,260 --> 00:24:16,200
We dump it to CSV files basically.

459
00:24:16,500 --> 00:24:17,240
That's it.

460
00:24:17,620 --> 00:24:18,840
For long-term storage.
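
A snapshot like Nikolay describes can be taken with plain psql; the view and column names here follow the pg_wait_sampling documentation, while the file paths are just examples:

```shell
# Global wait event profile -> CSV, for long-term storage.
psql -X -c "COPY (SELECT event_type, event, sum(count) AS samples FROM pg_wait_sampling_profile GROUP BY 1, 2 ORDER BY 3 DESC) TO STDOUT CSV HEADER" > wait_profile_global.csv

# Per-query profile, keyed by queryid for later joins.
psql -X -c "COPY (SELECT queryid, event_type, event, sum(count) AS samples FROM pg_wait_sampling_profile WHERE queryid IS NOT NULL GROUP BY 1, 2, 3 ORDER BY 4 DESC) TO STDOUT CSV HEADER" > wait_profile_per_query.csv
```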

461
00:24:20,540 --> 00:24:24,840
So we quickly found the pg_stat_kcache problem, solved
it, and we stuck at 1 million

462
00:24:24,840 --> 00:24:28,920
TPS, which made me sad because
it just repeated the experiment

463
00:24:28,980 --> 00:24:31,080
Alexander had 8 years ago.

464
00:24:31,500 --> 00:24:32,500
So what's next?

465
00:24:32,500 --> 00:24:34,400
Do you know what was next?

466
00:24:34,900 --> 00:24:35,540
Michael: I've cheated.

467
00:24:35,540 --> 00:24:36,980
I've read your post already.

468
00:24:37,640 --> 00:24:38,260
Nikolay: You know.

469
00:24:38,560 --> 00:24:38,960
Michael: Yeah.

470
00:24:38,960 --> 00:24:43,320
And I think the last episode as
well gave it away in February

471
00:24:43,320 --> 00:24:46,320
with pg_stat_statements being
next, right?

472
00:24:46,320 --> 00:24:46,820
Yeah.

473
00:24:46,920 --> 00:24:48,560
Nikolay: Huge bottleneck for such
cases.

474
00:24:48,560 --> 00:24:54,920
I consider this edge case because
normally we don't have super

475
00:24:54,920 --> 00:24:59,580
high frequent query, super fast,
super high frequent, and a lot

476
00:24:59,580 --> 00:25:00,720
of CPUs.

477
00:25:00,940 --> 00:25:03,340
But theoretically it can happen.

478
00:25:03,380 --> 00:25:08,160
For example, you have Postgres
and a few replicas, and you need

479
00:25:09,380 --> 00:25:09,520
to...

480
00:25:09,520 --> 00:25:13,480
Like you have some huge table,
maybe not so huge, single index,

481
00:25:13,480 --> 00:25:15,360
and you need just to find records.

482
00:25:15,360 --> 00:25:18,740
Basically, almost like a key value
approach, right?

483
00:25:18,960 --> 00:25:19,700
Maybe partitioned.

484
00:25:21,040 --> 00:25:26,780
And then queries are super fast,
much, like a lot below 1 millisecond.

485
00:25:27,100 --> 00:25:32,120
So in this case, if you have a
query, which after you remove

486
00:25:32,120 --> 00:25:33,780
parameters is the same.

487
00:25:34,540 --> 00:25:36,400
You have a single normalized query.

488
00:25:36,780 --> 00:25:37,720
Very high frequency.

489
00:25:39,000 --> 00:25:40,360
Michael: Yeah, is it how many?

490
00:25:40,760 --> 00:25:46,560
There are probably only 2 or 3
normalized queries in that pgbench.

491
00:25:47,400 --> 00:25:50,280
Nikolay: Well, by the way, I double-checked
for select only,

492
00:25:50,280 --> 00:25:51,660
it's just a single query.

493
00:25:52,060 --> 00:25:52,860
Michael: It's 1.

494
00:25:52,860 --> 00:25:53,800
Okay, wow.

495
00:25:54,640 --> 00:25:59,440
Nikolay: It also has this macro,
backslash set (\set), to get a random number,

496
00:25:59,440 --> 00:26:00,740
but I think it's client-side.

497
00:26:01,320 --> 00:26:03,020
So it's not going to Postgres.

498
00:26:04,000 --> 00:26:08,200
It's only for PgBench itself to
generate a random number.

499
00:26:08,200 --> 00:26:09,600
And I don't think it's a bottleneck.

500
00:26:09,720 --> 00:26:10,560
I hope not.

501
00:26:10,560 --> 00:26:13,740
Oh, by the way, our PgBench clients
were sitting on the same

502
00:26:13,740 --> 00:26:16,520
host as Postgres, which is interesting.

503
00:26:16,860 --> 00:26:19,180
Because they consume CPU as well.

504
00:26:19,180 --> 00:26:22,120
And unfortunately, right now we
don't collect host stats and

505
00:26:22,120 --> 00:26:22,540
so on.

506
00:26:22,540 --> 00:26:24,860
We don't know how much of CPU was
used.

507
00:26:24,860 --> 00:26:26,080
But usually it's noticeable.

508
00:26:26,120 --> 00:26:30,420
It's not like, not 50% usually,
when you do experiments like

509
00:26:30,420 --> 00:26:34,220
that, co-hosting clients and servers.

510
00:26:34,920 --> 00:26:37,860
But I think it's noticeable, I
would say maybe 10-20%.

511
00:26:38,680 --> 00:26:42,180
So if you offload them somehow,
but if you offload them, you

512
00:26:42,180 --> 00:26:43,400
bring a network.

513
00:26:43,900 --> 00:26:44,400
Yeah.

514
00:26:45,040 --> 00:26:47,440
And we will get back to that point.

515
00:26:47,440 --> 00:26:51,680
So, single machine, clients are
on the same machine, eating some

516
00:26:51,680 --> 00:26:52,180
CPU.

517
00:26:52,900 --> 00:26:56,360
And if you have this pattern, just
a single query, normalized

518
00:26:56,360 --> 00:26:58,760
query, pg_stat_statements
becomes a huge bottleneck.

519
00:26:59,540 --> 00:27:02,600
Just removing pg_stat_statements,
we jumped from 1 million

520
00:27:02,600 --> 00:27:04,300
to 2.5 million TPS.

521
00:27:04,760 --> 00:27:05,460
Michael: Yeah, wow.

522
00:27:06,020 --> 00:27:06,520
Nikolay: Yeah.

523
00:27:07,660 --> 00:27:09,480
And then what's next?

524
00:27:09,480 --> 00:27:11,620
We reached 3 million.

525
00:27:12,540 --> 00:27:13,040
How?

526
00:27:13,580 --> 00:27:14,340
Let's recall.

527
00:27:14,660 --> 00:27:18,420
Then we reached 3 million just
because I forgot something I

528
00:27:18,420 --> 00:27:19,280
originally should have used.

529
00:27:19,280 --> 00:27:21,720
I remember Alexander used it in
2016.

530
00:27:22,740 --> 00:27:24,520
Prepared statements, right?

531
00:27:24,800 --> 00:27:28,940
I forgot to use it and how did
I understand that I forgot?

532
00:27:29,060 --> 00:27:30,140
Looking at flame graphs.

533
00:27:30,140 --> 00:27:31,920
Our bot collects flame graphs.

534
00:27:32,040 --> 00:27:32,640
Ah, nice.

535
00:27:32,640 --> 00:27:32,860
Yeah.

536
00:27:32,860 --> 00:27:38,300
And I just, I was inspecting flame
graphs and I just saw planning

537
00:27:38,300 --> 00:27:38,800
time.

538
00:27:39,140 --> 00:27:41,680
And I'm thinking, oh, we spend
a lot on planning time.

539
00:27:42,160 --> 00:27:42,660
Right?

540
00:27:42,700 --> 00:27:44,440
Let's just get rid of it.

541
00:27:45,060 --> 00:27:51,700
So the right way is just to say
dash capital M,

542
00:27:51,700 --> 00:27:53,740
-M prepared, for pgbench.
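
For reference, a select-only pgbench run in prepared mode looks like this; the client count and duration are illustrative, not the exact values from the experiment:

```shell
# Built-in select-only script (-S), prepared protocol (-M prepared),
# 360 clients and 360 threads, 30-second run, progress every 5 s.
pgbench -n -S -M prepared -c 360 -j 360 -T 30 -P 5
```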

543
00:27:54,520 --> 00:27:57,680
Michael: And it's a perfect workload
for it because it's simple,

544
00:27:57,940 --> 00:28:01,240
same query over and over again,
no variants.

545
00:28:01,620 --> 00:28:02,120
Perfect.

546
00:28:02,140 --> 00:28:02,560
Nikolay: Right.

547
00:28:02,560 --> 00:28:07,800
So also interesting that when you
like, again, step back, when

548
00:28:07,800 --> 00:28:15,220
you see that, starting from 100,
150, 200, you go to 500

549
00:28:16,240 --> 00:28:18,420
clients, then you have 360 cores.

550
00:28:19,140 --> 00:28:25,280
And when you see it goes up to
like 100 or 150, and then it's

551
00:28:25,280 --> 00:28:25,780
a plateau.

552
00:28:26,820 --> 00:28:30,360
It also gives you a strong feeling
it's not normal, right?

553
00:28:30,360 --> 00:28:35,020
Because it means Postgres
doesn't scale in terms of number

554
00:28:35,020 --> 00:28:39,600
of parallel clients to number of
cores.

555
00:28:39,600 --> 00:28:42,420
I expected it to scale to number
of cores.

556
00:28:42,500 --> 00:28:45,560
Of course, we have clients running
on the same host, so maybe

557
00:28:46,620 --> 00:28:50,360
the saturation will start earlier,
but definitely not at 100 or

558
00:28:50,360 --> 00:28:53,040
150 if you have 360 cores.

559
00:28:53,560 --> 00:28:55,760
So this is how I saw this is not
right.

560
00:28:55,760 --> 00:28:59,280
And it happened with pg_stat_kcache
and then pg_stat_statements.

561
00:29:00,060 --> 00:29:01,520
Saturation started earlier.

562
00:29:02,220 --> 00:29:04,460
So yeah, some bottleneck, obviously.

563
00:29:05,060 --> 00:29:08,300
But now the curve looks already
much better.

564
00:29:08,800 --> 00:29:12,640
It's already like we reach the maximum
point already close to the number

565
00:29:12,640 --> 00:29:17,540
of cores, 300 or 360, maybe 400,
actually, slightly bigger.

566
00:29:18,220 --> 00:29:21,460
So then it doesn't make sense to
increase, but we just check

567
00:29:21,460 --> 00:29:23,420
it, trying to find a plateau.

568
00:29:24,960 --> 00:29:28,940
So prepared statements gave us more
than an additional half a

569
00:29:28,940 --> 00:29:29,840
million of TPS.

570
00:29:30,860 --> 00:29:32,660
Michael: Oh, I expected more, actually.

571
00:29:32,840 --> 00:29:34,020
I was...

572
00:29:34,200 --> 00:29:37,520
Only because, you know, when you're
looking at, like, super fast

573
00:29:37,660 --> 00:29:42,000
primary key lookups, quite often
planning time is more than execution

574
00:29:42,100 --> 00:29:44,360
time in a simple EXPLAIN ANALYZE.

575
00:29:44,440 --> 00:29:45,360
Nikolay: Interesting, yeah.

576
00:29:45,740 --> 00:29:48,120
Michael: Yeah, so it's just interesting
to me that it wasn't

577
00:29:48,120 --> 00:29:51,100
like double, but yeah, half a million
was nothing to look down

578
00:29:51,100 --> 00:29:52,100
our noses at.

579
00:29:52,660 --> 00:29:52,940
Nikolay: Yeah.

580
00:29:52,940 --> 00:29:53,470
Well, interesting.

581
00:29:53,470 --> 00:29:57,060
Maybe I should think about it and
explore additionally, but this

582
00:29:57,060 --> 00:29:58,280
is how it is.

583
00:29:58,380 --> 00:30:02,080
Michael: Did you do force generic
plan as well?

584
00:30:02,100 --> 00:30:03,020
Nikolay: No, not yet.

585
00:30:03,020 --> 00:30:03,400
But mind

586
00:30:03,400 --> 00:30:05,740
Michael: you, that would only,
that would only, like, be 5.

587
00:30:05,740 --> 00:30:07,560
Yeah, no, that probably won't help.

588
00:30:07,700 --> 00:30:11,040
Nikolay: Here's the thing, it's
in my to-do, maybe for this week

589
00:30:11,040 --> 00:30:12,340
or next, when I have time.

590
00:30:12,340 --> 00:30:13,040
I don't have time.

591
00:30:13,040 --> 00:30:15,780
This is, again, this is a side
project just for fun and understanding

592
00:30:16,160 --> 00:30:18,180
general behavior of Postgres.

593
00:30:18,900 --> 00:30:24,960
What I remember about this forced
generic plan is that Jeremy

594
00:30:24,960 --> 00:30:31,820
Schneider posted a very good overview
of problems others had

595
00:30:32,040 --> 00:30:34,580
with the lock manager recently, right?

596
00:30:34,840 --> 00:30:41,900
And I remember some benchmark,
some specific case someone showed

597
00:30:42,340 --> 00:30:43,160
on Twitter.

598
00:30:43,460 --> 00:30:47,680
If you just tell pgbench, if you
adjust the user, for example, ALTER

599
00:30:47,680 --> 00:30:53,100
USER, and set plan_cache_mode to
force_generic_plan,

600
00:30:53,720 --> 00:30:56,500
it leads to huge degradation in
terms of TPS.
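
The setting being discussed can be applied per role like this; the role name is a placeholder, and the default plan_cache_mode is auto, which only considers switching to a generic plan after 5 custom plans:

```shell
# Force generic plans for one role; new connections pick it up.
psql -c "ALTER USER pgbench_user SET plan_cache_mode = force_generic_plan"
```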

601
00:30:57,660 --> 00:30:59,080
So I need to check it.

602
00:31:00,600 --> 00:31:06,140
And lightweight lock manager pops
up in top of wait events.

603
00:31:06,260 --> 00:31:09,060
So I need to double check it because
I don't understand the nature

604
00:31:09,060 --> 00:31:09,440
of it.

605
00:31:09,440 --> 00:31:10,340
It's super interesting.

606
00:31:11,600 --> 00:31:14,300
Michael: I realized soon after
saying it that the reason I thought

607
00:31:14,300 --> 00:31:17,120
it would help is not going to help
here because you've got the

608
00:31:17,120 --> 00:31:21,180
same query over and over, not lots
of little different queries.

609
00:31:21,760 --> 00:31:23,940
Nikolay: But that should be cached
pretty soon, right?

610
00:31:24,320 --> 00:31:27,500
Michael: Yeah, but like after 5
executions, which is probably

611
00:31:27,500 --> 00:31:30,760
like half a millisecond total.

612
00:31:30,900 --> 00:31:32,580
Nikolay: Yeah, it should be no
difference in my opinion, but

613
00:31:32,580 --> 00:31:33,280
no degradation.

614
00:31:33,400 --> 00:31:34,120
Why degradation?

615
00:31:35,660 --> 00:31:38,160
It's something interesting to check,
or maybe I just misunderstood.

616
00:31:38,500 --> 00:31:40,620
This is my to-do to clarify.

617
00:31:41,760 --> 00:31:46,080
So I guess it's interesting, exploring
postgres behavior on the

618
00:31:46,080 --> 00:31:46,580
edge.

619
00:31:47,040 --> 00:31:48,420
So let's just recap.

620
00:31:48,900 --> 00:31:52,840
pg_stat_kcache, we removed it,
also fixed, but did we get it

621
00:31:52,840 --> 00:31:53,140
back?

622
00:31:53,140 --> 00:31:55,680
We should get it back, actually,
because now it's much better.

623
00:31:55,680 --> 00:31:59,320
They just removed some lock, and
it's good again.

624
00:31:59,540 --> 00:32:01,280
Then we found pg_stat_statements.

625
00:32:02,200 --> 00:32:03,460
We achieved 1 million.

626
00:32:04,020 --> 00:32:05,460
We removed pg_stat_statements.

627
00:32:05,800 --> 00:32:06,920
This is the key.

628
00:32:07,480 --> 00:32:12,160
By the way, when I say we keep
only pg_wait_sampling, but it

629
00:32:12,160 --> 00:32:17,380
also depends on pg_stat_statements,
I also feel some inconsistency

630
00:32:18,260 --> 00:32:18,960
in my...

631
00:32:19,280 --> 00:32:20,240
Michael: I've looked it up.

632
00:32:20,240 --> 00:32:21,840
When you said it, I got confused.

633
00:32:21,900 --> 00:32:23,080
It doesn't depend on it.

634
00:32:23,080 --> 00:32:24,780
It's just more stuff is possible.

635
00:32:25,440 --> 00:32:26,820
Nikolay: It's optional, right?

636
00:32:26,980 --> 00:32:27,480
Yeah.

637
00:32:27,600 --> 00:32:31,200
It can run alone, as we do right
now in these experiments.

638
00:32:31,260 --> 00:32:35,860
But if pg_stat_statements is present,
we can join data using query

639
00:32:35,860 --> 00:32:36,680
ID, right?
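
The join Nikolay describes can be sketched like this; both pg_wait_sampling_profile and pg_stat_statements expose a queryid column, so wait event samples can be attributed to normalized queries:

```shell
# Top wait events per normalized query, joined by queryid.
psql -X -c "SELECT s.query, p.event_type, p.event, sum(p.count) AS samples FROM pg_wait_sampling_profile p JOIN pg_stat_statements s USING (queryid) GROUP BY 1, 2, 3 ORDER BY 4 DESC LIMIT 20"
```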

640
00:32:36,880 --> 00:32:37,380
Michael: Exactly.

641
00:32:37,740 --> 00:32:40,940
Nikolay: This is a good correction,
because I said the wrong

642
00:32:40,940 --> 00:32:41,440
thing.

643
00:32:41,720 --> 00:32:44,960
But pg_stat_kcache does depend on
pg_stat_statements, you cannot install

644
00:32:44,960 --> 00:32:46,620
it without it, this is for sure.

645
00:32:47,120 --> 00:32:47,620
Okay.

646
00:32:48,540 --> 00:32:52,000
And yeah, and these guys are not
available in cloud environments

647
00:32:52,080 --> 00:32:52,580
usually.

648
00:32:53,040 --> 00:32:55,080
pg_wait_sampling is in Cloud SQL.

649
00:32:55,080 --> 00:32:56,920
I mean, not in cloud, in managed
Postgres.

650
00:32:57,280 --> 00:33:01,500
pg_wait_sampling is available on
CloudSQL, but that's it.

651
00:33:01,560 --> 00:33:06,100
But maybe after this podcast, people
will consider adding it,

652
00:33:06,420 --> 00:33:10,220
especially pg_wait_sampling,
which is really great, especially

653
00:33:10,240 --> 00:33:13,220
now with this fix from Cybertech.

654
00:33:15,180 --> 00:33:19,080
So we removed pg_stat_statements, reached
2.5 million.

655
00:33:20,240 --> 00:33:23,600
We added prepared statements, exceeded
3 million.

656
00:33:24,960 --> 00:33:28,520
And then final step, how I almost
approached 4 million.

657
00:33:28,520 --> 00:33:32,700
It's like 3.75 million TPS.

658
00:33:32,780 --> 00:33:33,140
How?

659
00:33:33,140 --> 00:33:34,340
Like, the last optimization.

660
00:33:35,740 --> 00:33:37,860
It's actually not super fair optimization.

661
00:33:38,400 --> 00:33:41,740
My last optimization was, let's
just switch from TCP connection

662
00:33:41,740 --> 00:33:45,580
to Unix domain socket connection,
which is possible in limited

663
00:33:45,900 --> 00:33:50,280
cases, because it works only when
you're on the same node.

664
00:33:50,280 --> 00:33:52,160
You should exclude network completely.

665
00:33:52,420 --> 00:33:58,780
And of course, this is obvious,
TCP/IP connections are heavier,

666
00:33:59,120 --> 00:34:04,420
definitely, like more overhead
than just when processes talk

667
00:34:04,660 --> 00:34:06,180
through Unix domain socket.

668
00:34:06,180 --> 00:34:08,140
It's much more lightweight.
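
The two connection modes differ only in the -h argument to pgbench; the socket directory below is a common default (e.g. on Debian-based systems) and may differ on other setups:

```shell
# Over TCP loopback:
pgbench -n -S -M prepared -c 360 -j 360 -T 30 -h 127.0.0.1

# Over a Unix domain socket (pass the socket directory as host):
pgbench -n -S -M prepared -c 360 -j 360 -T 30 -h /var/run/postgresql
```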

669
00:34:08,800 --> 00:34:15,200
And this allowed me to jump from
3 something to 3.75 million

670
00:34:15,220 --> 00:34:17,460
TPS, which is good.

671
00:34:17,780 --> 00:34:22,420
And also I found that 17 works
worse on Unix domain socket than

672
00:34:22,420 --> 00:34:22,920
16.

673
00:34:23,480 --> 00:34:25,180
Yeah, this is also in my to-do.

674
00:34:25,180 --> 00:34:25,680
Why?

675
00:34:26,140 --> 00:34:28,520
For TCP connections, no difference.

676
00:34:29,240 --> 00:34:31,980
It's on this edge case, like select
only.

677
00:34:32,220 --> 00:34:35,800
But for Unix domain socket, I
see degradation and it's worth

678
00:34:35,800 --> 00:34:36,780
exploring why.

679
00:34:37,360 --> 00:34:40,060
So maybe I found some degradation
for 17.

680
00:34:40,280 --> 00:34:41,084
I also need to double check.

681
00:34:41,084 --> 00:34:42,040
Michael: Quite a big difference.

682
00:34:42,980 --> 00:34:43,140
Yeah.

683
00:34:43,140 --> 00:34:43,940
Oh, no, no, no.

684
00:34:43,940 --> 00:34:44,660
So sorry.

685
00:34:45,540 --> 00:34:49,780
Nikolay: 7 to 8% as I remember
on higher number of clients.

686
00:34:50,600 --> 00:34:51,360
Michael: Yeah, sure.

687
00:34:51,420 --> 00:34:52,040
Sure, sure, sure.

688
00:34:52,040 --> 00:34:56,700
I made the mistake of looking at
17 beta chart and looking at

689
00:34:56,840 --> 00:35:00,560
TCP versus Unix, not 17 versus
16.

690
00:35:00,560 --> 00:35:00,720
By

691
00:35:00,720 --> 00:35:06,540
Nikolay: the way, when you have
a long chat with bot and different

692
00:35:06,540 --> 00:35:10,200
series of experiments, then you
want to cross-compare something,

693
00:35:10,200 --> 00:35:11,240
it's super easy.

694
00:35:13,380 --> 00:35:17,340
Just visualize this and that on
the same graph, that's it.

695
00:35:17,800 --> 00:35:20,780
Michael: Much easier than getting
it to give you the correct

696
00:35:20,780 --> 00:35:22,860
JSON I saw from the chat transcript.

697
00:35:23,840 --> 00:35:25,460
Nikolay: Well, yeah, it depends.

698
00:35:25,460 --> 00:35:27,900
And Gemini is better in JSON than
GPT.

699
00:35:27,900 --> 00:35:29,620
Well, we have a lot of fun stories.

700
00:35:29,620 --> 00:35:31,360
So Let's not go there.

701
00:35:31,360 --> 00:35:33,220
But it happens.

702
00:35:34,940 --> 00:35:41,400
So, yeah, almost 4 million TPS
for quite meaningful workloads.

703
00:35:41,420 --> 00:35:45,320
I think it's worth checking bigger
tables, for example, to see

704
00:35:45,380 --> 00:35:46,280
how it depends.

705
00:35:46,280 --> 00:35:50,140
For example, we can just take like
300 clients and then check,

706
00:35:51,200 --> 00:35:52,860
draw a different picture.

707
00:35:53,080 --> 00:35:56,520
It's in my mind, like for example,
let's take very small table,

708
00:35:56,520 --> 00:35:58,580
bigger, bigger, bigger and huge,
right?

709
00:35:58,580 --> 00:36:01,020
And how it will degrade, for example,
right?

710
00:36:01,020 --> 00:36:03,340
Michael: When you say bigger table,
do you mean like 100 million

711
00:36:03,340 --> 00:36:04,860
rows instead of 10 million rows?

712
00:36:04,860 --> 00:36:06,340
Nikolay: 100 million rows, yeah.

713
00:36:06,340 --> 00:36:09,640
200 million rows, half a billion
rows, billion rows, partition,

714
00:36:09,720 --> 00:36:10,400
not partition.

715
00:36:10,400 --> 00:36:13,260
Like it's many, it's like a huge
maze.

716
00:36:13,260 --> 00:36:18,840
You can go in many directions and
turn and then have like dead

717
00:36:18,840 --> 00:36:21,820
end maybe, and you turn and inspect
another.

718
00:36:21,820 --> 00:36:23,180
Like it's cool.

719
00:36:23,320 --> 00:36:25,760
The cool thing is that I do it
like anywhere.

720
00:36:25,760 --> 00:36:28,580
Like I don't know, like I'm eating
breakfast and checking what's

721
00:36:28,580 --> 00:36:30,120
up with our experiments.

722
00:36:30,480 --> 00:36:31,780
Let's do something else.

723
00:36:32,040 --> 00:36:36,060
It's just my current hobby, you
know, to explore.

724
00:36:36,900 --> 00:36:39,840
I think we should have more tools
like Sysbench, for example,

725
00:36:39,840 --> 00:36:44,560
to have different kinds of workloads
to be brought up to this.

726
00:36:45,180 --> 00:36:46,420
So yeah, that's it.

727
00:36:46,460 --> 00:36:51,900
I'm very curious, for select-only
pgbench, has anyone had more?

728
00:36:52,440 --> 00:36:56,400
But now, I was thinking, oh, maybe
it's the biggest number.

729
00:36:56,400 --> 00:37:00,480
It's definitely the biggest I ever
saw, but I didn't see everything.

730
00:37:00,480 --> 00:37:04,020
Maybe someone has bigger numbers
or saw bigger numbers or like

731
00:37:04,020 --> 00:37:05,100
with vanilla pgbench.

732
00:37:06,740 --> 00:37:08,680
Michael: If they have, I couldn't
find it.

733
00:37:08,680 --> 00:37:12,100
I did do some looking before the
episode just to try and see

734
00:37:12,400 --> 00:37:13,880
if anyone pushed it further.

735
00:37:15,940 --> 00:37:18,480
If it's been published, I've struggled
to find it.

736
00:37:18,480 --> 00:37:22,100
Nikolay: But it's definitely possible
on these huge machines

737
00:37:22,120 --> 00:37:23,660
on AWS, right?

738
00:37:23,740 --> 00:37:27,500
Because you have more vCPUs and
as we see Postgres scales quite

739
00:37:27,500 --> 00:37:31,140
well in terms of number of clients
and number of CPUs.

740
00:37:31,720 --> 00:37:32,780
Michael: That's the direction.

741
00:37:32,800 --> 00:37:36,040
If you're talking about that maze,
the way I would be tempted

742
00:37:36,040 --> 00:37:40,020
to go with this is with the currently
available cloud-provided

743
00:37:41,000 --> 00:37:43,660
VMs, what's the most TPS I can
get?

744
00:37:43,660 --> 00:37:44,980
I don't care about cheating.

745
00:37:45,040 --> 00:37:47,460
I don't care if it's only select
semicolon.

746
00:37:48,340 --> 00:37:50,860
And I just would love to know,
what is that number?

747
00:37:50,860 --> 00:37:54,100
Is it, I'm guessing it's above
10 million, but is it 12?

748
00:37:54,140 --> 00:37:54,940
Is it 15?

749
00:37:55,240 --> 00:37:58,140
Like, how many could we, like,
that would be super interesting.

750
00:37:58,140 --> 00:37:58,260
How

751
00:37:58,260 --> 00:37:58,940
many millions?

752
00:37:59,340 --> 00:38:00,940
Nikolay: Actually, it's another good point.

753
00:38:00,940 --> 00:38:04,260
By the way, I forgot to mention
that the problem with pg_stat_statements

754
00:38:04,640 --> 00:38:05,580
we identified.

755
00:38:06,040 --> 00:38:09,760
Again, if you have a select-only
key-value approach and you need

756
00:38:09,760 --> 00:38:15,060
to squeeze more, pg_stat_statements
can drop performance 3 times

757
00:38:15,060 --> 00:38:16,160
here, basically.

758
00:38:16,320 --> 00:38:18,100
So it's worth removing it.

759
00:38:18,240 --> 00:38:20,040
And of course, there are already
discussions.

760
00:38:20,200 --> 00:38:24,840
Let's mention there is a new discussion
started recently, I think,

761
00:38:24,840 --> 00:38:26,540
by Michael Paquier.

762
00:38:26,920 --> 00:38:29,180
Sorry, I pronounce all this wrong.

763
00:38:29,440 --> 00:38:32,580
How to optimize, what to do with
pg_stat_statements in general,

764
00:38:33,220 --> 00:38:36,360
and he reacted on Twitter to my
benchmark.

765
00:38:37,040 --> 00:38:40,480
So definitely there is an opportunity
here to improve pg_stat_

766
00:38:40,480 --> 00:38:40,980
statements.

767
00:38:41,040 --> 00:38:44,660
1 idea was to start sampling at some
point.

768
00:38:45,060 --> 00:38:46,320
Michael: Sampling is the obvious
1.

769
00:38:46,320 --> 00:38:52,520
Nikolay: Also, like, deal with
high contention case differently,

770
00:38:52,720 --> 00:38:55,580
and then, like, update less.

771
00:38:57,200 --> 00:38:58,040
There are ideas.

772
00:38:58,140 --> 00:39:01,880
Michael: I would hate for the takeaway
here to be that there's

773
00:39:01,880 --> 00:39:06,900
a real overhead here of pg_stat_statements, mostly because this

774
00:39:06,900 --> 00:39:09,140
is so much the extreme version,
right?

775
00:39:09,140 --> 00:39:11,260
This is the same query being worked

776
00:39:11,260 --> 00:39:11,760
Nikolay: on.

777
00:39:12,260 --> 00:39:13,380
I put it everywhere.

778
00:39:13,380 --> 00:39:15,220
Disclaimer, this is edge case.

779
00:39:15,240 --> 00:39:17,520
On production, you probably don't
see it.

780
00:39:17,780 --> 00:39:19,400
With your workload, you won't see
it.

781
00:39:19,400 --> 00:39:23,900
With your workload, probably it's
like a few percent only overhead.

782
00:39:24,440 --> 00:39:28,920
But in some cases, when your workload
is just 1 or a few normalized

783
00:39:28,940 --> 00:39:31,880
queries and the frequency is super
high, you have a lot of vCPUs

784
00:39:31,880 --> 00:39:35,100
and they start competing, this
is a high contention situation.

785
00:39:35,660 --> 00:39:37,900
In this case, just check it.

786
00:39:39,600 --> 00:39:45,300
In terms of wait event analysis,
pg_stat_statements didn't show

787
00:39:45,300 --> 00:39:50,160
up until, I think, Postgres 16,
because

788
00:39:50,800 --> 00:39:55,620
basically it was not covered by
wait events, I think.

789
00:39:55,840 --> 00:39:56,340
Interesting.

790
00:39:56,820 --> 00:40:00,760
Maybe I remember incorrectly, but
definitely it was not covered

791
00:40:00,760 --> 00:40:03,340
until some version, and then it
started to be covered.

792
00:40:03,840 --> 00:40:06,760
So you can see it, if you have
a very fresh version of Postgres,

793
00:40:06,760 --> 00:40:10,940
you can see it in, for example,
RDS Performance Insights.

794
00:40:12,240 --> 00:40:17,000
In this case, it's worth just removing
it, but I think some guys

795
00:40:17,020 --> 00:40:18,460
have this kind of workload.

796
00:40:18,640 --> 00:40:19,740
It's not for everyone.

797
00:40:19,780 --> 00:40:21,560
Quite rare, let's say.

798
00:40:21,560 --> 00:40:22,620
It's quite rare.

799
00:40:23,100 --> 00:40:26,120
But it's a valid key-value workload, right?

800
00:40:26,120 --> 00:40:26,780
Just selects.

801
00:40:26,780 --> 00:40:30,240
You need to serve a lot of, like,
you need to respond very quickly

802
00:40:30,240 --> 00:40:35,440
using just a B-tree index search, just
single row, super fast query,

803
00:40:35,440 --> 00:40:36,700
and then you have a lot.

804
00:40:37,440 --> 00:40:40,100
So just removing pg_stat_statements,
you can get more.

805
00:40:40,680 --> 00:40:44,440
So this leads to the question of
how to optimize pg_stat_statements.

806
00:40:44,440 --> 00:40:46,620
There is an interesting discussion
on Hacker News.

807
00:40:47,580 --> 00:40:50,780
Let's just advertise it a little
bit for those who are interested.

808
00:40:51,420 --> 00:40:52,200
Michael: Yeah, for sure.

809
00:40:52,200 --> 00:40:55,260
Nikolay: And another point I wanted
to mention, you started this

810
00:40:55,260 --> 00:40:56,740
discussion on various clouds.

811
00:40:57,740 --> 00:41:01,160
Our bot already knows about price,
about cost, so we could potentially

812
00:41:01,400 --> 00:41:08,140
check different machines on GCP
and also bring in AWS and start

813
00:41:08,140 --> 00:41:12,260
discovering, like, I want 1 million
TPS, just select where it's

814
00:41:12,260 --> 00:41:17,540
cheaper, which machine, like ARM,
Intel, AMD, let's compare different

815
00:41:17,540 --> 00:41:19,340
clouds, different machines, where?

816
00:41:19,540 --> 00:41:20,900
It's interesting, right?

817
00:41:20,900 --> 00:41:22,960
I mean, performance per cost.

818
00:41:22,960 --> 00:41:23,600
Michael: I think so.

819
00:41:23,600 --> 00:41:26,176
I liked your transactions per second
per dollar.

820
00:41:26,176 --> 00:41:28,040
TPS per dollar, I reckon you should go

821
00:41:28,040 --> 00:41:28,520
Nikolay: with.

822
00:41:28,520 --> 00:41:31,560
Starting from a very simple, trivial
workload like these selects

823
00:41:31,560 --> 00:41:35,220
and then maybe extend it
closer to what you have, and then

824
00:41:35,220 --> 00:41:37,480
you can decide which machine is
better for you.

825
00:41:37,800 --> 00:41:40,080
Michael: This is something I've
actually been struggling with.

826
00:41:40,080 --> 00:41:44,860
I think what could be really useful
is how you let someone know

827
00:41:44,860 --> 00:41:48,520
that their workload could be done
on the next instance size down,

828
00:41:48,520 --> 00:41:50,580
you know, maybe through a
bit of optimization.

829
00:41:50,580 --> 00:41:53,540
But if people
have a rough idea of their workload,

830
00:41:54,180 --> 00:41:55,980
maybe they know it's all OLTP.

831
00:41:56,820 --> 00:42:00,560
And they know roughly like the
average transactions per second,

832
00:42:00,780 --> 00:42:03,120
what size instance should they
roughly be on?

833
00:42:03,120 --> 00:42:05,140
Like just as a rule of thumb.

834
00:42:05,460 --> 00:42:07,100
Nikolay: This is called capacity
planning.

835
00:42:07,900 --> 00:42:10,760
Michael: I know, but also I think so
many people are so over-provisioned

836
00:42:10,760 --> 00:42:12,700
because they're so under-tuned.

837
00:42:12,700 --> 00:42:14,440
Nikolay: Many people are
under-provisioned.

838
00:42:14,500 --> 00:42:15,480
Michael: For example, I have

839
00:42:15,480 --> 00:42:20,160
Nikolay: cases when, yeah, people
have already

840
00:42:20,300 --> 00:42:24,880
hit the ceiling for a specific kind
of platform, for

841
00:42:24,880 --> 00:42:28,180
example, Graviton on AWS or Yandex,
and you have

842
00:42:28,180 --> 00:42:29,560
already the biggest size.

843
00:42:30,640 --> 00:42:35,560
Maybe it's time to, I don't know,
like to scale somehow or to

844
00:42:35,560 --> 00:42:38,820
take different kinds of machines
because you see how big the Intel

845
00:42:39,020 --> 00:42:39,520
instances are.

846
00:42:39,520 --> 00:42:41,820
But it's not available in RDS yet.

847
00:42:43,080 --> 00:42:47,120
So I see cases both under-provisioned
and maybe you're right,

848
00:42:47,120 --> 00:42:48,280
it's more common.

849
00:42:48,480 --> 00:42:51,760
And over-provisioned is more common.

850
00:42:51,780 --> 00:42:56,680
When you have CPU, like, 5% maximum,
and you pay for everything

851
00:42:56,680 --> 00:42:57,660
else, like why?

852
00:42:57,980 --> 00:43:03,120
But I also see cases when it's
already hitting ceilings and it's

853
00:43:03,120 --> 00:43:05,840
time to do something about it,
right?

854
00:43:05,860 --> 00:43:06,480
Of course.

855
00:43:06,820 --> 00:43:09,880
Maybe I see them because people
go to us for consulting and help,

856
00:43:09,880 --> 00:43:10,380
right?

857
00:43:11,120 --> 00:43:16,120
So they don't go for, well, sometimes
we have this, like, let's

858
00:43:16,120 --> 00:43:18,040
optimize our spending, right?

859
00:43:18,040 --> 00:43:22,560
But it's like, we're mostly useful
when you have problems with

860
00:43:22,560 --> 00:43:25,940
performance and how to grow further.

861
00:43:26,720 --> 00:43:28,260
Michael: Yeah, well, I see the
opposite.

862
00:43:28,260 --> 00:43:32,880
I think not because they come to
me, or they come to us, it's

863
00:43:32,880 --> 00:43:37,380
more that I know friends, like
founder friends and things, quite

864
00:43:37,380 --> 00:43:41,880
often the cheapest solution in
the early days is double the instance

865
00:43:41,880 --> 00:43:42,180
size.

866
00:43:42,180 --> 00:43:46,960
You know, if you're going from
$30 to $60 to $120, there aren't
many engineering hours that can
867
00:43:46,960 --> 00:43:50,880
many engineering hours that can
go into fixing that.

868
00:43:51,680 --> 00:43:54,940
You might as well just upgrade
and then worry about it later.

869
00:43:55,920 --> 00:43:56,900
For example, you

870
00:43:57,720 --> 00:44:00,940
Nikolay: became a big fan of Graviton,
ARM, right?

871
00:44:01,260 --> 00:44:02,900
And you decide, oh, it's great.

872
00:44:02,900 --> 00:44:05,640
And, like, you checked performance
per cost, great.

873
00:44:06,100 --> 00:44:10,440
But maybe it's not: it was
good a year ago, but maybe not

874
00:44:10,440 --> 00:44:10,760
now.

875
00:44:10,760 --> 00:44:12,300
And things change very quickly.

876
00:44:12,540 --> 00:44:17,120
And different nodes, instances,
and types of instances are added

877
00:44:17,120 --> 00:44:18,380
in the cloud very quickly.

878
00:44:18,740 --> 00:44:23,700
So in an ideal world, we should
have some rule for how to find the

879
00:44:23,700 --> 00:44:29,180
optimal choice of instance
family and size.

880
00:44:29,440 --> 00:44:34,660
And also we should have great migration
automation to switch.

881
00:44:36,040 --> 00:44:37,060
0 downtime, right?

882
00:44:37,060 --> 00:44:38,040
Michael: That's the other thing.

883
00:44:38,940 --> 00:44:40,940
I think people are scared of going
back down.

884
00:44:40,940 --> 00:44:42,540
Nikolay: Test, test and then switch.

885
00:44:43,360 --> 00:44:46,320
Everything is possible, but requires
some effort, unfortunately.

886
00:44:47,520 --> 00:44:48,500
So, good.

887
00:44:49,940 --> 00:44:51,360
4 million TPS, almost.

888
00:44:51,660 --> 00:44:52,280
Michael: Almost, yeah.

889
00:44:52,280 --> 00:44:53,100
Or 6 million.

890
00:44:53,100 --> 00:44:54,780
I'm going with 6 million TPS.

891
00:44:55,920 --> 00:44:59,220
Nikolay: And maybe someone already
has this machine with almost

892
00:44:59,220 --> 00:45:03,220
900 vCPUs, and they can show us
10 million or something.

893
00:45:03,660 --> 00:45:04,860
Michael: I'd expect more.

894
00:45:06,200 --> 00:45:06,540
Nikolay: Maybe.

895
00:45:06,540 --> 00:45:10,180
10 million TPS is the next milestone
for Postgres on a single node.

896
00:45:10,920 --> 00:45:11,420
Great.

897
00:45:11,820 --> 00:45:15,040
By the way, final note, I tried
to find information about MongoDB.

898
00:45:15,360 --> 00:45:20,140
I asked guys, I found only slightly
outdated benchmarks, like

899
00:45:20,140 --> 00:45:24,860
3 years old or so, and I didn't
see anything above that, like, from

900
00:45:24,860 --> 00:45:25,820
a single node.

901
00:45:26,180 --> 00:45:30,200
I didn't even care whether they
have transaction mode enabled

902
00:45:30,200 --> 00:45:34,540
or not, because they can cheat
here, right?

903
00:45:34,760 --> 00:45:39,140
But I didn't see 4 million, I saw
only 2 million, 2.5 million

904
00:45:39,140 --> 00:45:40,080
TPS or something.

905
00:45:40,080 --> 00:45:40,820
That's it.

906
00:45:41,340 --> 00:45:43,520
So I think Postgres is very competitive.

907
00:45:43,520 --> 00:45:44,740
Michael: It's a nice area.

908
00:45:45,280 --> 00:45:45,540
Nikolay: Yeah.

909
00:45:45,540 --> 00:45:46,020
Okay.

910
00:45:46,020 --> 00:45:46,100
Michael: Good.

911
00:45:46,100 --> 00:45:46,600
Awesome.

912
00:45:47,220 --> 00:45:48,060
Cheers, Nikolay.