1
0:0:0,06 --> 0:0:2,2199998
Nikolay: Hello, hello, this is
PostgresFM.

2
0:0:2,44 --> 0:0:5,7799997
I don't remember the number of
episodes, Michael, do you remember?

3
0:0:6,04 --> 0:0:8,3
Michael: No, we must be getting
close to 170 though.

4
0:0:8,3 --> 0:0:10,62
Nikolay: Yeah, some really big
number.

5
0:0:10,9 --> 0:0:14,24
Not as big as your max_connections,
right?

6
0:0:16,6 --> 0:0:17,96
Michael: I like it, good segue.

7
0:0:18,14 --> 0:0:20,94
Nikolay: Yeah, my name is Nik,
PostgresAI, and as usual here

8
0:0:20,94 --> 0:0:22,18
is Michael, pgMustard.

9
0:0:22,54 --> 0:0:23,26
Hi Michael.

10
0:0:23,74 --> 0:0:24,64
Michael: Hello Nik.

11
0:0:25,32 --> 0:0:30,32
Nikolay: So we are going to discuss
really high max_connections

12
0:0:31,5 --> 0:0:34,739998
in general and maybe we will touch
a little bit connection pooling

13
0:0:34,739998 --> 0:0:40,7
systems a little bit, and maybe
we will touch per user connections,

14
0:0:40,88 --> 0:0:42,94
max_connections setting because
it's possible.

15
0:0:44,059998 --> 0:0:44,559998
Yeah.

16
0:0:44,559998 --> 0:0:50,02
So in general, like, is it a bad idea
to raise max_connections? And

17
0:0:50,02 --> 0:0:52,78
is it a good mitigation action
when you have problems?

18
0:0:54,72 --> 0:0:55,760002
This kind of action.

19
0:0:55,92 --> 0:0:57,84
What's the ideal max_connections?

20
0:0:59,059998 --> 0:1:2,14
Michael: And why do you, I think
you've seen some cases recently

21
0:1:2,16 --> 0:1:4,84
where people are raising it and
you think it's not a good idea.

22
0:1:4,84 --> 0:1:8,54
So is it worth kind of jumping
straight into some recent experiences?

23
0:1:8,56 --> 0:1:10,54
Like why is this on your mind at
the moment?

24
0:1:10,68 --> 0:1:14,74
Nikolay: Yeah, so I proposed this
topic after a tweet I saw where

25
0:1:15,06 --> 0:1:19,54
it was shared that there was a
pattern where deployments, I guess,

26
0:1:20,74 --> 0:1:25,94
schema migrations or DDL were failing,
but only in particular

27
0:1:26,18 --> 0:1:30,98
hours of every working day, or
maybe every day overall.

28
0:1:31,58 --> 0:1:36,1
And then they realized that at
the same time, during a couple

29
0:1:36,1 --> 0:1:39,26
of hours, some data pipelines were
running.

30
0:1:40,84 --> 0:1:44,8
And they just raised max_connections
and that's it.

31
0:1:44,8 --> 0:1:47,36
Actually, I'm not sure it's max_connections,
because it doesn't say

32
0:1:47,36 --> 0:1:49,1
which database system it is.

33
0:1:49,2 --> 0:1:52,68
I just assumed it's Postgres because
Postgres is the default choice

34
0:1:52,68 --> 0:1:53,42
right now.

35
0:1:53,42 --> 0:1:54,34
I might be wrong.

36
0:1:54,34 --> 0:1:55,7
Maybe it's different databases.

37
0:1:56,32 --> 0:2:1,62
Michael: I took a note and it said
the solution was, 1, separate

38
0:2:1,62 --> 0:2:5,44
connection pool for migrations,
and 2, better cross-team communication.

39
0:2:6,18 --> 0:2:7,38
Which I thought was interesting.

40
0:2:7,48 --> 0:2:10,64
But yeah, maybe separate connection
pool for migrations is where

41
0:2:10,64 --> 0:2:13,52
you're drawing that conclusion
from.

42
0:2:13,82 --> 0:2:17,2
Nikolay: Well, this doesn't sound
like a mitigation to me.

43
0:2:17,2 --> 0:2:20,78
It sounds like isolation of the problem.

44
0:2:21,46 --> 0:2:24,06
Yeah, well, I might be wrong.

45
0:2:24,06 --> 0:2:28,04
So again, like there are several
possible options here, but I

46
0:2:28,04 --> 0:2:33,04
bet what happened, These pipelines
produced long running transactions,

47
0:2:34,06 --> 0:2:34,56
right?

48
0:2:35,28 --> 0:2:39,64
And these long running transactions
blocked some DDLs, ALTER TABLE,

49
0:2:39,64 --> 0:2:41,7
add column, for example.

50
0:2:42,34 --> 0:2:48,1
And this DDL blocked any queries
to this table.

51
0:2:49,16 --> 0:2:52,78
And the number of active sessions
spiked and reached

52
0:2:53,04 --> 0:2:53,82
max_connections.

53
0:2:54,64 --> 0:2:59,92
Actually, if you start isolating,

54
0:3:0,56 --> 0:3:2,46
everyone will suffer anyway, right?

55
0:3:2,98 --> 0:3:8,84
So this idea that isolation somehow
helped, like, if this

56
0:3:8,84 --> 0:3:15,54
theory is right, then a separate
connection pooler for migrations

57
0:3:15,54 --> 0:3:17,78
and pipelines doesn't help at all.

58
0:3:19,46 --> 0:3:21,46
Because locks are global, right?

59
0:3:21,46 --> 0:3:26,24
So you don't lock a table
only in the context of some

60
0:3:26,24 --> 0:3:30,04
connection pool, you lock it globally
in this database, right,

61
0:3:30,04 --> 0:3:31,22
in this logical database.

62
0:3:31,72 --> 0:3:34,2
Michael: Yeah, so if it really
did mitigate the issue, either

63
0:3:34,2 --> 0:3:36,48
it was the other part of their
solution, which is the better

64
0:3:36,48 --> 0:3:41,58
cross-team communication, or the theory's
wrong, and actually it

65
0:3:41,58 --> 0:3:47,06
was a case of these migrations
taking up a few slots, a few connections

66
0:3:47,16 --> 0:3:50,86
that weren't, that the application
expected to be able

67
0:3:50,86 --> 0:3:51,42
Nikolay: to use.

68
0:3:51,42 --> 0:3:53,8
Migration needs only 1 connection,
right?

69
0:3:55,32 --> 0:3:56,18
Michael: Yeah, true.

70
0:3:56,82 --> 0:4:3,28
Nikolay: So how could pipelines,
data pipelines take a few connections?

71
0:4:3,84 --> 0:4:4,34
Yeah.

72
0:4:5,9 --> 0:4:10,6
I don't understand how they can
exhaust them, reach max_connections

73
0:4:10,6 --> 0:4:11,18
at all.

74
0:4:11,74 --> 0:4:13,94
Michael: Well, should we talk about
the general case a bit more

75
0:4:13,94 --> 0:4:14,44
then?

76
0:4:15,04 --> 0:4:17,32
Nikolay: Yeah, so again, like this
general case is described

77
0:4:17,32 --> 0:4:18,84
and we discussed it a few times.

78
0:4:18,84 --> 0:4:23,28
Everyone should understand that
any ALTER statement requires

79
0:4:23,96 --> 0:4:26,46
a lock, an exclusive lock on the table.

80
0:4:26,98 --> 0:4:30,04
Even, well, there are exceptions,
for example.

81
0:4:30,48 --> 0:4:31,66
No exceptions, actually.

82
0:4:31,82 --> 0:4:33,74
ALTER TABLE is a serious thing.

83
0:4:33,74 --> 0:4:36,82
Even if it's super fast, you still
need a lock.

84
0:4:37,36 --> 0:4:42,1
And with default settings, if your session

85
0:4:42,1 --> 0:4:45,22
fails to acquire this lock, it
will start waiting.

86
0:4:45,66 --> 0:4:48,34
It will wait forever with default
settings.

87
0:4:48,34 --> 0:4:52,28
Default settings means lock_timeout
0, statement_timeout 0,

88
0:4:52,28 --> 0:4:55,92
transaction_timeout 0,
idle_in_transaction_session_timeout 0.

89
0:4:55,92 --> 0:4:59,56
Everything 0 means forever, infinite,
right?

90
0:5:0,36 --> 0:5:1,96
Waiting infinitely.

91
0:5:2,78 --> 0:5:6,82
So this means it will be waiting,
waiting, waiting, but any query,

92
0:5:6,82 --> 0:5:11,38
including any SELECT to this table,
will need to wait behind

93
0:5:11,38 --> 0:5:12,54
us in the line.

94
0:5:12,72 --> 0:5:17,02
So there's a queue forming there,
waiting queue, lock queue,

95
0:5:17,02 --> 0:5:18,02
blocking queue.
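
The queueing behavior described here can be sketched as a toy model. This is my illustration, not Postgres internals: real Postgres has eight table lock modes and deadlock detection, but the FIFO rule below is the key point — a waiting ALTER TABLE makes every later SELECT wait behind it.

```python
# Toy model of a Postgres-style lock queue on one table (simplified to
# SHARE vs EXCLUSIVE modes; real Postgres is richer, but conflicting
# requests are granted in FIFO order, which is what matters here).

GRANT_COMPAT = {
    ("SHARE", "SHARE"): True,      # SELECTs don't block each other
    ("SHARE", "EXCLUSIVE"): False, # ALTER conflicts with everything
    ("EXCLUSIVE", "SHARE"): False,
    ("EXCLUSIVE", "EXCLUSIVE"): False,
}

class TableLock:
    def __init__(self):
        self.granted = []  # list of (session, mode)
        self.queue = []    # FIFO of (session, mode)

    def request(self, session, mode):
        # A request is granted only if it is compatible with all granted
        # locks AND nothing is already waiting ahead of it -- the FIFO
        # rule is why a waiting ALTER blocks later SELECTs too.
        if not self.queue and all(
            GRANT_COMPAT[(g_mode, mode)] for _, g_mode in self.granted
        ):
            self.granted.append((session, mode))
            return "granted"
        self.queue.append((session, mode))
        return "waiting"

lock = TableLock()
print(lock.request("data_pipeline", "SHARE"))   # long transaction: granted
print(lock.request("migration", "EXCLUSIVE"))   # ALTER TABLE: waiting
print(lock.request("app_select_1", "SHARE"))    # queues behind the ALTER
print(lock.request("app_select_2", "SHARE"))    # ...and so does everyone else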

96
0:5:18,96 --> 0:5:22,34
And actually, you might see multiple
queues, and it forms trees,

97
0:5:22,34 --> 0:5:25,36
and actually not just trees, it's
forests, so it's beautiful.

98
0:5:26,16 --> 0:5:30,32
Like, it can look really great,
like forest of blocking or locking

99
0:5:30,32 --> 0:5:35,92
trees, but this is eventually what
accumulates a lot of active

100
0:5:35,92 --> 0:5:39,66
sessions and can overwhelm your server,
basically.

101
0:5:40,32 --> 0:5:44,44
Even if it doesn't reach
max_connections, it can lead to high

102
0:5:44,44 --> 0:5:46,8
CPU consumption of course and so
on.

103
0:5:46,8 --> 0:5:52,12
But it's very common to see that
max_connections is reached because

104
0:5:52,12 --> 0:5:53,08
of too many active sessions.

105
0:5:55,6 --> 0:5:57,62
And most of them are just waiting.

106
0:5:59,34 --> 0:6:3,64
So when you reach max_connections,
any additional attempt to

107
0:6:3,64 --> 0:6:7,04
connect will see an error,
right?

108
0:6:7,08 --> 0:6:8,04
Too many connections.

109
0:6:8,64 --> 0:6:9,1
Yeah.

110
0:6:9,1 --> 0:6:11,32
Too many clients or how's it?

111
0:6:11,32 --> 0:6:12,26
I don't remember.

112
0:6:12,6 --> 0:6:13,98
Too many clients I think.

113
0:6:16,1 --> 0:6:16,3
Michael: Yeah.

114
0:6:16,3 --> 0:6:17,58
I can't remember either.

115
0:6:17,78 --> 0:6:22,0
I remember back in the day when
a lot of people used Heroku Postgres.

116
0:6:22,4 --> 0:6:26,08
This used to come up all the time
because they were much stricter

117
0:6:26,2 --> 0:6:28,76
on how many connections they would
let you have.

118
0:6:28,94 --> 0:6:33,08
And these days, more people on
things like RDS who have much

119
0:6:33,08 --> 0:6:37,24
much higher default max_connections
than Heroku ever had.

120
0:6:37,24 --> 0:6:39,72
I see people hitting this much,
much less.

121
0:6:39,72 --> 0:6:42,62
I think probably causing other
issues, I'm sure we'll get to

122
0:6:42,62 --> 0:6:46,86
that, but it's interesting that
that used to come up more in

123
0:6:46,92 --> 0:6:48,26
various online forums.

124
0:6:49,0 --> 0:6:53,14
Nikolay: Yeah, so in the past,
so we need to distinguish the

125
0:6:53,14 --> 0:6:55,58
states of sessions or connections.

126
0:6:55,58 --> 0:7:0,36
Connections, sessions, and backends
are all synonyms in this

127
0:7:0,36 --> 0:7:2,06
context of Postgres.

128
0:7:2,32 --> 0:7:6,36
So we need to distinguish active,
idle, and idle in transaction,

129
0:7:6,5 --> 0:7:8,04
3 main states.

130
0:7:8,04 --> 0:7:12,22
And you can see these states by
selecting from pg_stat_activity.
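
For example, `SELECT state, count(*) FROM pg_stat_activity GROUP BY 1;` tallies backends per state. A rough sketch of what monitoring tools do with that output (the sample rows below are made up for illustration):

```python
from collections import Counter

# Hypothetical rows, as if returned by:
#   SELECT state FROM pg_stat_activity WHERE backend_type = 'client backend';
# Real state values include 'active', 'idle', 'idle in transaction'.
sample_states = [
    "active", "idle", "idle", "idle in transaction",
    "idle", "active", "idle", "idle",
]

# Tally backends per state -- the number to watch is "active":
# sustained active counts far above your vCPU count spell trouble.
by_state = Counter(sample_states)
for state, n in by_state.most_common():
    print(f"{state:22s} {n}")
```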

131
0:7:14,54 --> 0:7:19,52
So before Postgres, I think 12,
13, 14, so

132
0:7:20,44 --> 0:7:21,42
Michael: 5 years ago.

133
0:7:21,68 --> 0:7:23,8
Yeah, is this the Andres Freund
work?

134
0:7:23,86 --> 0:7:24,64
Nikolay: Right, right.

135
0:7:24,64 --> 0:7:24,91602
Yeah.

136
0:7:24,91602 --> 0:7:29,1
A couple of articles were published,
originally on the Citus blog, then

137
0:7:29,1 --> 0:7:35,44
on the Microsoft blog, Microsoft Azure,
Citus is there, right?

138
0:7:35,66 --> 0:7:38,86
So a couple of good articles, excellent
articles.

139
0:7:39,06 --> 0:7:43,02
First researching and then discussing
optimization.

140
0:7:44,24 --> 0:7:50,2
So the usual fear was that idle connections
consume memory, but the articles

141
0:7:50,28 --> 0:7:53,86
proved that the main problem is
how the work with snapshots

142
0:7:54,1 --> 0:7:58,04
is organized, and idle connections,
for example, if you just add

143
0:7:58,04 --> 0:8:1,1
1000 idle connections, it can hurt
your performance.

144
0:8:1,12 --> 0:8:4,86
And I demonstrated it with benchmarks
easily as well.

145
0:8:5,78 --> 0:8:12,04
So it was quite easy to see.
But after, so if you have, I think,

146
0:8:12,04 --> 0:8:17,3
14 plus, maybe 13 plus, if you
have this Postgres, basically

147
0:8:17,36 --> 0:8:18,62
everyone right now.

148
0:8:19,0 --> 0:8:22,4
Michael: Oh true, all supported
Postgres versions now, that's

149
0:8:22,4 --> 0:8:22,9
incredible.

150
0:8:23,2 --> 0:8:26,76
Nikolay: Because 13, we have a
couple of cases where 13 is still

151
0:8:26,76 --> 0:8:30,98
in production, and there are talks to
upgrade it, but basically right

152
0:8:30,98 --> 0:8:35,02
now, if you are on a supported version,
idle connections don't

153
0:8:35,02 --> 0:8:37,24
hurt performance anymore.

154
0:8:37,3 --> 0:8:37,74
Too much.

155
0:8:37,74 --> 0:8:39,12
Michael: Or anywhere near as much.

156
0:8:39,12 --> 0:8:39,78
Nikolay: Yeah, exactly.

157
0:8:39,78 --> 0:8:40,28
Yeah.

158
0:8:40,58 --> 0:8:45,8
So having 1000-2000 additional connections,
it's not a big deal

159
0:8:45,8 --> 0:8:46,42
these days.

160
0:8:46,48 --> 0:8:48,22
And also processors improved.

161
0:8:48,76 --> 0:8:50,46
I really like Graviton4.

162
0:8:50,9 --> 0:8:52,18
It's like, it's amazing.

163
0:8:52,9 --> 0:8:57,72
And it handles a lot of connections,
even active very well.

164
0:8:57,72 --> 0:9:4,0
It's like I have my neural network
trained on what to expect from

165
0:9:4,0 --> 0:9:9,52
like 500 connections, for example
on a machine with fewer than

166
0:9:9,52 --> 0:9:11,26
a hundred vCPUs, right?

167
0:9:11,68 --> 0:9:16,42
I can imagine what should happen,
but on new hardware, I see

168
0:9:16,42 --> 0:9:17,92
it much better.

169
0:9:17,92 --> 0:9:21,42
It handles more active sessions
much better.

170
0:9:21,58 --> 0:9:24,64
Michael: When you say your neural
network, do you mean your brain?

171
0:9:25,08 --> 0:9:29,5
Nikolay: Yeah, yeah, yeah, my LLM
internally, yeah.

172
0:9:29,5 --> 0:9:34,4
So I just, well, I expected this,
but wow, guys, you are fine actually.

173
0:9:34,4 --> 0:9:36,38
You are not down.

174
0:9:36,38 --> 0:9:38,8
I would expect you to be down by
this point, right?

175
0:9:38,8 --> 0:9:42,18
So we just see it in our monitoring
what people experience, what

176
0:9:42,18 --> 0:9:43,38
their cluster experiences.

177
0:9:43,98 --> 0:9:47,84
And like, wow, this is when they
check what hardware it is, oh

178
0:9:47,84 --> 0:9:48,54
it's actually

179
0:9:48,54 --> 0:9:51,38
Graviton4 on Amazon, which is great.

180
0:9:53,0 --> 0:9:53,13
I'm impressed.

181
0:9:53,13 --> 0:9:53,26
And Postgres.

182
0:9:53,26 --> 0:9:53,45996
And

183
0:9:53,45996 --> 0:9:54,16
Michael: newer Postgres.

184
0:9:56,84 --> 0:9:57,76
Yeah, okay.

185
0:9:58,08 --> 0:10:1,4
But you're talking about lots and
lots of active sessions rather

186
0:10:1,4 --> 0:10:2,58
than idle.

187
0:10:3,16 --> 0:10:6,72
Nikolay: Additionally, on larger
machines, RDS and Aurora RDS,

188
0:10:6,74 --> 0:10:9,64
they just set really high max_connections.

189
0:10:9,8 --> 0:10:11,62
At some point, this decision was made.

190
0:10:11,68 --> 0:10:14,16
I'm really curious how this decision
was made.

191
0:10:14,68 --> 0:10:18,36
And it's like, we see 2500, 5000...

192
0:10:21,9 --> 0:10:22,9
It's very common.

193
0:10:23,86 --> 0:10:27,7
I can already, like looking at
settings, I can tell you, oh,

194
0:10:27,7 --> 0:10:32,06
this is RDS, without even seeing the settings
which start with "rds.".

195
0:10:32,72 --> 0:10:36,06
Just looking at this profile of
settings, I already realize,

196
0:10:36,06 --> 0:10:39,06
oh, this is RDS, or this is like
CloudSQL.

197
0:10:39,52 --> 0:10:41,74
Like there are patterns of some
decisions.

198
0:10:42,26 --> 0:10:45,04
You know, random_page_cost 4 and
max_connections 5000.

199
0:10:45,06 --> 0:10:47,0
Oh, very likely it's RDS.

200
0:10:47,16 --> 0:10:48,14
Michael: RDS, yeah.

201
0:10:48,48 --> 0:10:49,3
That's funny.

202
0:10:49,3 --> 0:10:53,1
Nikolay: So this gives us the opportunity
to observe how clusters

203
0:10:53,44 --> 0:10:56,5
behave with high max_connections.

204
0:10:57,7 --> 0:11:0,36
And old arguments don't work anymore.

205
0:11:0,48 --> 0:11:4,56
So old arguments were like, guys,
this is bad.

206
0:11:5,02 --> 0:11:11,82
You just have 2000 connections,
1900 of which are idle.

207
0:11:12,98 --> 0:11:14,04
Why do you do this?

208
0:11:14,04 --> 0:11:20,64
Because you just basically pay
a big tax performance-wise.

209
0:11:21,48 --> 0:11:25,28
Because all the latencies, it hurts
all the latencies, but now

210
0:11:25,28 --> 0:11:26,34
it's not so.

211
0:11:26,82 --> 0:11:33,42
And the only argument left in my
mind is that still, if the number

212
0:11:33,42 --> 0:11:37,32
of active sessions exceeds the
number of vCPUs significantly,

213
0:11:38,46 --> 0:11:44,24
then it hurts performance globally
and all queries are processed

214
0:11:44,24 --> 0:11:50,28
for much longer time, latencies
increase, and none of them have

215
0:11:50,28 --> 0:11:55,04
chances to finish in time,
within statement_timeout, which is, for

216
0:11:55,04 --> 0:11:56,18
example, 30 seconds.

217
0:11:56,6 --> 0:12:1,56
So instead it would be better if
we reach max_connections sooner

218
0:12:1,56 --> 0:12:6,64
and just tell some clients to retry
later, like too many clients,

219
0:12:6,96 --> 0:12:14,4
but these guys would have much
higher chances to complete execution

220
0:12:14,58 --> 0:12:17,48
within statement_timeout.

221
0:12:19,2 --> 0:12:23,06
So we have limited resources, CPU
and disk I/O.

222
0:12:23,6 --> 0:12:24,4
Yeah, to make

223
0:12:24,4 --> 0:12:28,48
Michael: sure I understand, are
we talking about if you allow

224
0:12:28,48 --> 0:12:32,08
too many connections, even if most
of them are going to be idle

225
0:12:32,08 --> 0:12:35,64
most of the time, you run the risk
that a bunch of them will

226
0:12:35,64 --> 0:12:37,06
become active at the same time.

227
0:12:37,06 --> 0:12:37,64
Yeah, OK.

228
0:12:37,64 --> 0:12:38,1
Nikolay: Exactly.

229
0:12:38,1 --> 0:12:43,14
So if some wave of requests, unexpected
or expected, but poorly

230
0:12:43,14 --> 0:12:48,38
planned, happens, if we have low
max_connections, we will have

231
0:12:48,6 --> 0:12:52,44
a limited number of backends which
will do a job inside the statement

232
0:12:52,44 --> 0:12:54,78
timeout, if it's limited.

233
0:12:54,84 --> 0:12:57,24
If the statement timeout is limited,
it's a different story.

234
0:12:57,7 --> 0:13:0,36
It's also a very wrong state for
OLTP.

235
0:13:0,94 --> 0:13:2,7
We also discussed it many times.

236
0:13:3,1 --> 0:13:9,24
But if we have 10, 20, 50x compared
to the number of vCPUs, we

237
0:13:9,24 --> 0:13:12,04
actually have as max_connections, and
this wave comes.

238
0:13:12,04 --> 0:13:14,96
All of them try to execute, but
none of them will succeed because

239
0:13:14,96 --> 0:13:17,06
they will all bump into the statement
timeout.

240
0:13:17,14 --> 0:13:21,4
Or if the statement timeout is
not set, they will bump into some

241
0:13:21,4 --> 0:13:24,44
different timeout on HTTP level,
application level, or something

242
0:13:24,44 --> 0:13:25,58
like 1 minute or something.

243
0:13:25,58 --> 0:13:32,5
They will still fail, like,
we took all the chances from

244
0:13:32,5 --> 0:13:33,0
everyone.

245
0:13:33,5 --> 0:13:34,44
That's a problem.

246
0:13:34,54 --> 0:13:36,92
If we limit it, some of them will
succeed.

247
0:13:37,08 --> 0:13:40,04
Some of them will need to retry
because of too many clients, and

248
0:13:40,04 --> 0:13:41,6
we will need to fix the problem.

249
0:13:41,88 --> 0:13:45,84
And of course, if there is a connection
pooler, it's great, because

250
0:13:46,32 --> 0:13:51,98
connection pooler will take a lot
of headache from Postgres backends.

251
0:13:52,96 --> 0:13:55,52
Michael: Yeah well I wanted to
ask you about this actually because

252
0:13:55,68 --> 0:14:0,18
I think sometimes when one of us
says connection pool, people

253
0:14:0,72 --> 0:14:4,18
can assume either direction,
right? They can assume

254
0:14:4,24 --> 0:14:6,84
database side or application side.

255
0:14:7,34 --> 0:14:11,6
And a lot of the time I see people
scaling their app and having

256
0:14:11,6 --> 0:14:13,2
application side pooling.

257
0:14:14,68 --> 0:14:18,34
And that's how, I think,
a lot of the Heroku

258
0:14:18,34 --> 0:14:19,12
issues happened.

259
0:14:19,12 --> 0:14:19,3
Nikolay: A lot

260
0:14:19,3 --> 0:14:21,48
Michael: of the people running
out of connection on Heroku because

261
0:14:21,58 --> 0:14:25,92
it's so easy to spin up new versions
of your app and scale the

262
0:14:25,92 --> 0:14:26,7
app horizontally.

263
0:14:27,34 --> 0:14:30,52
But each one comes with another 10
connections.

264
0:14:30,7 --> 0:14:31,6
Another 10 connections.

265
0:14:32,72 --> 0:14:35,94
They're using poolers, but it's
application side, so there's

266
0:14:35,94 --> 0:14:37,80005
so they don't use anything on
the database side.

267
0:14:37,80005 --> 0:14:39,82
Nikolay: Those poolers are too
far from the database.

268
0:14:39,96 --> 0:14:42,94
And exactly this is this pattern
I saw so many times.

269
0:14:43,7 --> 0:14:46,3
Especially, imagine e-commerce.

270
0:14:47,3 --> 0:14:52,44
And they have microservices, many
clusters, some clusters, we

271
0:14:52,44 --> 0:14:53,16
see it.

272
0:14:53,68 --> 0:14:57,08
Okay, max_connections is high because
they don't use PgBouncer.

273
0:14:57,18 --> 0:15:0,06
They say, okay, we have Java application
and all the connection

274
0:15:0,06 --> 0:15:1,78
pooling is on our side.

275
0:15:1,96 --> 0:15:4,16
HikariCP for example, right?

276
0:15:4,16 --> 0:15:5,66
It's such a connection pooler.

277
0:15:6,68 --> 0:15:9,74
And it's all great, but I cannot convince
them to start using PgBouncer.

278
0:15:10,76 --> 0:15:14,76
Not bad, okay, max_connections
we keep somehow high because we

279
0:15:14,76 --> 0:15:16,0
have a lot of idle connections.

280
0:15:16,0 --> 0:15:16,5
Why?

281
0:15:16,68 --> 0:15:18,26
Because there are many application
nodes.

282
0:15:18,4 --> 0:15:22,42
And then before Black Friday, which
happened last week, this

283
0:15:22,42 --> 0:15:24,38
story was many, many years ago.

284
0:15:24,38 --> 0:15:28,68
Before Black Friday, guys, infrastructure
guys who are responsible

285
0:15:28,7 --> 0:15:32,92
for application nodes decided to
double capacity.

286
0:15:33,58 --> 0:15:37,12
This is in their hands, their decision,
okay, we need to be better

287
0:15:37,12 --> 0:15:39,8
prepared, let's just double number
of application nodes from

288
0:15:39,8 --> 0:15:41,92
100 to 200, for example, or something.

289
0:15:43,6 --> 0:15:47,64
And nobody thought about the configuration
of those poolers.

290
0:15:47,68 --> 0:15:50,58
So they were like 40, right?

291
0:15:50,58 --> 0:15:51,42
40 connections.

292
0:15:52,6 --> 0:15:57,74
So it was, for example, I don't
know, like 20 nodes times 40,

293
0:15:57,74 --> 0:15:59,44
it's 800, for example, right?

294
0:15:59,44 --> 0:16:1,26
Maximum number of connections,
800.

295
0:16:1,56 --> 0:16:3,58
But they doubled it, it became
1,600.

296
0:16:4,74 --> 0:16:7,06
I think numbers were higher in
that case.
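
The arithmetic behind this story is worth spelling out; a sketch (the node and pool numbers are the illustrative ones from this example, not from any real deployment):

```python
# Total Postgres connections opened by application-side pools:
# every app node keeps its own pool, so the database sees the product.
def total_db_connections(app_nodes: int, pool_size_per_node: int) -> int:
    return app_nodes * pool_size_per_node

before = total_db_connections(app_nodes=20, pool_size_per_node=40)
after = total_db_connections(app_nodes=40, pool_size_per_node=40)  # fleet doubled
print(before, after)  # 800 1600 -- with zero change in actual query load

# To keep the database-side total constant, per-node pools must shrink
# every time the fleet grows -- which is exactly what nobody remembers to do.
same_total = total_db_connections(app_nodes=40, pool_size_per_node=20)
print(same_total)  # 800
```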

297
0:16:7,36 --> 0:16:11,14
And nobody talked to database team,
like, it's just application

298
0:16:11,2 --> 0:16:14,82
nodes, let's have, they're stateless,
let's add them.

299
0:16:15,06 --> 0:16:18,58
And this is how you suddenly see,
oh, many more idle connections.

300
0:16:18,86 --> 0:16:20,52
The actual load didn't increase.

301
0:16:21,58 --> 0:16:24,38
We don't need, like, a number of
active sessions didn't increase.

302
0:16:24,38 --> 0:16:28,08
But since capacity was doubled,
the number of idle connections

303
0:16:28,08 --> 0:16:28,92
doubled, obviously.

304
0:16:29,34 --> 0:16:32,44
And you cannot solve this problem
having connection pooler on

305
0:16:32,44 --> 0:16:33,92
the application side at all.

306
0:16:34,02 --> 0:16:38,4
Well, you can, but you need to
somehow remember: every time you

307
0:16:38,4 --> 0:16:43,92
add or remove an application node,
you need to rethink the pool configuration

308
0:16:44,04 --> 0:16:47,86
for all of them, to maintain the
same overall number.

309
0:16:47,9 --> 0:16:51,48
The sum of connections should be maintained,
right?

310
0:16:51,88 --> 0:16:54,56
Michael: So you mean if we double
the number of pools, we can

311
0:16:54,56 --> 0:16:57,22
halve how many connections each
pool has?

312
0:16:57,34 --> 0:16:57,84
Yes.

313
0:16:58,32 --> 0:17:1,78
Yeah, but then it seems brittle
to me to then need to remember

314
0:17:1,78 --> 0:17:2,28
that.

315
0:17:2,38 --> 0:17:3,42
Nikolay: Yeah, there are 2 numbers.

316
0:17:3,42 --> 0:17:5,42
There is like minimum and maximum
number.

317
0:17:5,86 --> 0:17:8,28
I'm talking mostly about like minimum
number.

318
0:17:8,44 --> 0:17:10,74
These connections are open and
they're maintained.

319
0:17:11,44 --> 0:17:15,2
It doesn't go down below that,
right?

320
0:17:15,54 --> 0:17:19,84
So if we doubled capacity, but
the actual workload didn't increase

321
0:17:19,84 --> 0:17:24,06
because it's not Black Friday yet,
why do we open more connections

322
0:17:24,06 --> 0:17:24,64
to Postgres?

323
0:17:24,64 --> 0:17:25,58
It's not right.

324
0:17:25,84 --> 0:17:31,1
And a good solution here is to
have an additional pooler closer

325
0:17:31,1 --> 0:17:35,52
to the database, where it will
be configured according to database

326
0:17:36,26 --> 0:17:40,04
situation, not situation with application
nodes and the needs

327
0:17:40,04 --> 0:17:41,68
to increase the fleet.

328
0:17:42,44 --> 0:17:47,94
Yeah, so connection pooler helps
and usually we say like, okay,

329
0:17:47,94 --> 0:17:52,68
if you have PgBouncer, each PgBouncer
can handle 10,000 connections

330
0:17:52,74 --> 0:17:56,08
incoming, something like this,
thousands of them.

331
0:17:56,5 --> 0:18:0,26
And it should open only 100 connections
to the database.

332
0:18:0,3 --> 0:18:2,14
This depends on the number of vCPUs.

333
0:18:2,16 --> 0:18:7,12
We still say take the number of vCPUs
times some constant.

334
0:18:7,36 --> 0:18:9,52
Usually it was 2, 3, 4.

335
0:18:9,52 --> 0:18:10,82
Now we say 10.

336
0:18:11,38 --> 0:18:14,06
Because we see Postgres behavior
improved, hardware improved.

337
0:18:14,06 --> 0:18:14,76
Okay, 10.

338
0:18:15,06 --> 0:18:15,72
But not more.

339
0:18:15,72 --> 0:18:20,44
If you have 96 vCPUs,
OK, 1000.

340
0:18:21,42 --> 0:18:27,52
If you have 32 vCPUs, I mean, the number
of max_connections should

341
0:18:27,52 --> 0:18:29,58
be, OK, 320.

342
0:18:30,42 --> 0:18:31,66
It should not be 5000.
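
This rule of thumb can be written down as a tiny helper (the 10x multiplier is the heuristic from this discussion, not an official Postgres recommendation):

```python
# Heuristic from the discussion: cap max_connections at roughly
# 10x the vCPU count; older advice used 2-4x. Not an official formula.
def suggested_max_connections(vcpus: int, multiplier: int = 10) -> int:
    return vcpus * multiplier

print(suggested_max_connections(32))  # 320 -- not 5000
print(suggested_max_connections(96))  # 960
```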

343
0:18:32,18 --> 0:18:35,28
I'm very curious how the RDS team made
the decision.

344
0:18:36,0 --> 0:18:37,18
Because I guess they...

345
0:18:38,94 --> 0:18:40,64
I have several theories here.

346
0:18:40,64 --> 0:18:45,02
One theory is there are so many clients
without pooling and this

347
0:18:45,02 --> 0:18:46,02
was just a need.

348
0:18:46,02 --> 0:18:47,94
And I'm like, okay, let's do it.

349
0:18:48,52 --> 0:18:50,32
Michael: It's just moving the problem,
isn't it?

350
0:18:50,32 --> 0:18:53,68
Instead of people thinking, oh, what,

351
0:18:53,68 --> 0:18:55,6
I don't have enough connections,
they're going to hit

352
0:18:55,6 --> 0:18:56,68
Nikolay: performance issues.

353
0:18:57,88 --> 0:19:1,66
Yeah, maybe they have some additional
reasoning, which I don't

354
0:19:1,66 --> 0:19:2,16
get.

355
0:19:2,74 --> 0:19:3,72
I'm very curious.

356
0:19:3,94 --> 0:19:7,2
And another thing, maybe RDS proxy
was not so good, and still

357
0:19:7,2 --> 0:19:8,08
not so good.

358
0:19:8,3 --> 0:19:12,88
We saw it in action, RDS proxy,
it's a very weird proxy compared

359
0:19:12,88 --> 0:19:14,18
to PgBouncer behavior.

360
0:19:17,38 --> 0:19:19,14
You see idle connections more.

361
0:19:19,9 --> 0:19:22,54
I don't remember all the details,
but every time we touch it,

362
0:19:22,54 --> 0:19:24,88
I always say, like, why is it behaving
so?

363
0:19:25,52 --> 0:19:27,04
So what is this?

364
0:19:27,04 --> 0:19:27,94
It's very strange.

365
0:19:28,14 --> 0:19:29,12
It's very different.

366
0:19:29,2 --> 0:19:32,52
There's no, like, very good multiplexing
capabilities like in

367
0:19:32,52 --> 0:19:33,02
PgBouncer.

368
0:19:33,74 --> 0:19:38,68
You have 10,000 connections incoming,
maximum, and you transform

369
0:19:38,72 --> 0:19:41,08
them to maximum 100 backends.

370
0:19:41,94 --> 0:19:46,4
PgBouncer is very reliable and
stable to solve this task.

371
0:19:47,08 --> 0:19:48,74
RDS proxy, not so much.

372
0:19:48,74 --> 0:19:51,94
So maybe they had issues with RDS
proxy and they needed to raise

373
0:19:51,94 --> 0:19:56,78
max_connections because it was
not solving this task somehow.

374
0:19:58,78 --> 0:19:59,32
Michael: Interesting theory.

375
0:19:59,32 --> 0:20:0,54
If anybody knows more, let

376
0:20:0,54 --> 0:20:0,72
Nikolay: us know.

377
0:20:0,72 --> 0:20:2,34
Yeah, I'm very curious.

378
0:20:3,48 --> 0:20:6,8
So this is my approach right now
to max_connections as maximum

379
0:20:6,82 --> 0:20:13,78
10x for vCPUs, ideally lower so
database can breathe under heavy

380
0:20:13,78 --> 0:20:14,28
load.

381
0:20:14,28 --> 0:20:16,66
Michael: And lowers the risk, I
think, is that yeah.

382
0:20:16,94 --> 0:20:21,3
Nikolay: Yeah, so just to be honest
with you, how many resources

383
0:20:21,3 --> 0:20:22,08
do you have?

384
0:20:23,2 --> 0:20:24,72
Do you have 1000 cores?

385
0:20:24,72 --> 0:20:25,44
Maybe yes.

386
0:20:25,44 --> 0:20:26,88
In this case, no problem.

387
0:20:27,56 --> 0:20:28,68
Go for it.

388
0:20:28,68 --> 0:20:33,28
Because it sits there all the time, like,
no problem, no problem, but

389
0:20:33,28 --> 0:20:36,72
then we have a problem, we have a problem,
and we have a huge spike and

390
0:20:36,9 --> 0:20:41,16
an unresponsive database, right?
System, I mean, database system.

391
0:20:41,74 --> 0:20:45,2
Michael: So it's picking, like, what
do you want your failure

392
0:20:45,2 --> 0:20:49,02
case to be, right? Do you want your
failure case to be some people

393
0:20:49,02 --> 0:20:49,4838
can't access it,

395
0:20:49,7725 --> 0:20:50,98
or do you want your failure

396
0:20:50,98 --> 0:20:53,2
case to be nobody can do anything?

397
0:20:53,94 --> 0:20:58,44
Nikolay: Yeah and also if some
attempt to connect and execute

398
0:20:58,44 --> 0:21:3,78
a query from the backend failed, the backend
should have retry logic.

399
0:21:5,38 --> 0:21:5,86
Michael: Yes, it

400
0:21:5,86 --> 0:21:6,22
Nikolay: should be.

401
0:21:6,22 --> 0:21:9,44
Connections and query execution, both.

402
0:21:9,76 --> 0:21:13,18
So it should be handled there properly. Additionally, if users

403
0:21:13,18 --> 0:21:16,42
immediately see there are too many clients, okay, maybe you should

404
0:21:16,42 --> 0:21:18,88
additionally adjust your application code.

405
0:21:19,0 --> 0:21:23,9
So, the ideal state is retry logic with exponential

406
0:21:24,16 --> 0:21:25,26
backoff and jitter.

407
0:21:26,18 --> 0:21:29,16
So, this is like a more scientific approach.

408
0:21:29,16 --> 0:21:32,14
There is a good article, an Amazon article about this.

409
0:21:32,54 --> 0:21:34,34
It's not rocket science, but it's good.

410
0:21:34,34 --> 0:21:35,9299
There is some science behind it.

411
0:21:35,9299 --> 0:21:37,28
Exponential backoff and jitter.
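The retry-with-exponential-backoff-and-jitter pattern mentioned here (the approach the AWS article describes) can be sketched in a few lines. This is an illustrative sketch, not from the episode; the `connect` callable and the delay parameters are placeholder assumptions.

```python
import random
import time

def backoff_delay(attempt, base=0.1, cap=10.0):
    """Exponential backoff with "full jitter": pick a delay uniformly
    from [0, min(cap, base * 2**attempt)] so retrying clients don't
    all hammer the server at the same moment."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))

def with_retries(connect, max_attempts=6):
    """Call `connect` (any callable, e.g. opening a DB connection);
    on failure, sleep a jittered backoff delay and try again."""
    for attempt in range(max_attempts):
        try:
            return connect()
        except Exception:  # e.g. "FATAL: sorry, too many clients already"
            if attempt == max_attempts - 1:
                raise
            time.sleep(backoff_delay(attempt))
```

With full jitter the early delays stay short, so a brief connection spike resolves quickly instead of all clients retrying in lockstep.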

412
0:21:37,72 --> 0:21:41,98
So it can happen quite fast, and this adds some resiliency to

413
0:21:42,5 --> 0:21:45,14
this system a little bit.

414
0:21:45,14 --> 0:21:49,04
Of course, if you have 500 max_connections and all the time all

415
0:21:49,04 --> 0:21:55,24
of them are busy, yeah, that will be, the users will notice this,

416
0:21:55,24 --> 0:21:55,74
right?

417
0:21:56,6 --> 0:22:0,78
But still those who are still working, they will have more chances

418
0:22:0,78 --> 0:22:1,48
to complete.

419
0:22:1,64 --> 0:22:3,54
And this is, I think, super important.

420
0:22:3,9 --> 0:22:8,22
Then you sit there, welcoming the performance cliff, basically.

421
0:22:9,34 --> 0:22:9,84
Yeah.

422
0:22:10,44 --> 0:22:16,36
But another thing which I'm curious in is how to demonstrate

423
0:22:16,48 --> 0:22:17,12
the problem.

424
0:22:17,78 --> 0:22:20,82
Naive attempts failed so far, so I don't have a good article

425
0:22:20,82 --> 0:22:23,72
or demonstration how this performance cliff behaves.

426
0:22:24,88 --> 0:22:27,12
Naming it a performance cliff may be wrong as well.

427
0:22:27,12 --> 0:22:31,56
But I feel it's somehow like, maybe not a cliff, but somehow too

428
0:22:31,56 --> 0:22:36,96
high max_connections increases the risk of being down for everyone.

429
0:22:37,06 --> 0:22:39,64
And how to demonstrate it, I'm not sure.

430
0:22:41,14 --> 0:22:43,4
So these considerations are more theoretical.

431
0:22:43,74 --> 0:22:46,84
So we have many clients who listen to us, agreed.

432
0:22:46,84 --> 0:22:49,7
But max_connections, to change it, you need a restart.

433
0:22:50,08 --> 0:22:51,92
So they still sit with 5000.

434
0:22:53,64 --> 0:22:54,5
That's it.

435
0:22:54,86 --> 0:22:57,18
At the same time, I'm not super worried.

436
0:22:57,18 --> 0:23:0,98
Well, I still believe we need to go down with max_connections

437
0:23:1,0 --> 0:23:2,78
and plan a restart and do it.

438
0:23:3,16 --> 0:23:7,24
So unfortunately, some companies are afraid of restarts and try

439
0:23:7,24 --> 0:23:8,4
to minimize them.

440
0:23:8,42 --> 0:23:9,64
It's another question, right?

441
0:23:9,92 --> 0:23:14,06
How to stop fearing restarts and do minor upgrades more often

442
0:23:14,06 --> 0:23:14,84
and so on.

443
0:23:15,02 --> 0:23:19,34
But, in fact, okay, if you have too high max_connections, it's also

444
0:23:19,34 --> 0:23:24,28
good because in our monitoring, we put active session history

445
0:23:24,28 --> 0:23:28,06
analysis, wait event analysis, in the center right now.

446
0:23:28,2 --> 0:23:32,78
And it's quite great to see very colorful, beautiful graphs if

447
0:23:32,78 --> 0:23:36,1
we have a lot of max_connections, a lot of areas with different

448
0:23:36,1 --> 0:23:42,1
colors. It's so good. It was a joke, right? So, okay, I mean, if

449
0:23:42,1 --> 0:23:45,48
you have max_connections high, your graphs look better, you

450
0:23:45,48 --> 0:23:45,98
know.

451
0:23:46,02 --> 0:23:47,52
Michael: Oh, I see what you mean,
yeah.

452
0:23:47,52 --> 0:23:50,8
Nikolay: The system probably is not responsive,
right? But graphs look

453
0:23:50,8 --> 0:23:51,58
really great

454
0:23:52,26 --> 0:23:56,98
Michael: Okay, well, I'm interested
in the pooler. Like, I do think

455
0:23:56,98 --> 0:24:1,78
we do need to justify, like, why
should you have an extra

456
0:24:1,78 --> 0:24:6,64
thing running, like it's more complexity
for people to justify,

457
0:24:6,66 --> 0:24:7,16
right?

458
0:24:7,3 --> 0:24:10,88
It's another layer if we're saying
you should use PgBouncer.

459
0:24:11,76 --> 0:24:16,58
And added latency, like maybe not
much, but like there's a bit

460
0:24:16,58 --> 0:24:17,5
of extra, right?

461
0:24:17,5 --> 0:24:19,02
There's 1 extra hop.

462
0:24:21,04 --> 0:24:25,38
So I do think it is sensible to
like have to demonstrate what

463
0:24:25,38 --> 0:24:28,04
are, like what is the advantage.

464
0:24:28,62 --> 0:24:33,04
Nikolay: Yeah, yeah, well, It's
quite easy to demonstrate the

465
0:24:33,04 --> 0:24:35,88
benefit of having a PgBouncer in
between.

466
0:24:36,18 --> 0:24:39,24
There are certain types of workload
where it will be very noticeable

467
0:24:39,34 --> 0:24:41,0
that the PgBouncer improves.

468
0:24:41,0 --> 0:24:44,82
And also if you have slow clients,
it improves things, because

469
0:24:44,82 --> 0:24:50,16
backends are not working on transferring
data and like it's offloaded

470
0:24:50,2 --> 0:24:50,88
to a PgBouncer.

471
0:24:52,12 --> 0:24:54,04
Michael: Yeah, okay, that's a good
1.

472
0:24:54,4 --> 0:24:55,06
Nikolay: Yeah, yeah.

473
0:24:55,52 --> 0:25:5,06
But also if you take a lot of fast
selects and with PgBouncer,

474
0:25:5,28 --> 0:25:9,06
it will be much better than without
PgBouncer.

475
0:25:9,84 --> 0:25:14,22
Because, for example, you achieve
1 million TPS with select-only

476
0:25:14,22 --> 0:25:15,26
pgbench, right?

477
0:25:16,1 --> 0:25:23,2
And PgBouncer helps Postgres to
communicate and start working

478
0:25:23,2 --> 0:25:24,56
on next execution.

479
0:25:25,96 --> 0:25:29,78
So you can feel it and you can
raise number of clients easier

480
0:25:31,16 --> 0:25:32,14
and scale better.

481
0:25:32,38 --> 0:25:34,26
This is a simple benchmark, actually.

482
0:25:35,54 --> 0:25:41,38
Usually, in this stress load testing,
we start with 1 client,

483
0:25:41,38 --> 0:25:46,88
2 clients, 4 clients, and so on,
a pair of parameters, hyphen

484
0:25:46,96 --> 0:25:52,0
c and hyphen j, lowercase, in pgbench; we match them because
we match them because

485
0:25:52,0 --> 0:25:56,02
there is interesting logic, which not
everyone understands from the beginning,

486
0:25:56,04 --> 0:25:59,72
how these parameters are connected
to each other.

487
0:25:59,76 --> 0:26:0,92
And then we just grow.
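The -c/-j sweep described here can be scripted. A sketch: the doubling schedule, the duration, and the assumption that a pgbench-initialized database is reachable via the usual environment variables (PGHOST and friends) are mine, but -c (clients), -j (threads), -S (built-in select-only script), and -T (duration in seconds) are real pgbench flags.

```python
import subprocess

def pgbench_cmd(clients, duration=30, select_only=True):
    """Build one pgbench run, matching -j (threads) to -c (clients)
    as discussed in the episode."""
    cmd = ["pgbench", "-c", str(clients), "-j", str(clients),
           "-T", str(duration)]
    if select_only:
        cmd.append("-S")  # built-in select-only script
    return cmd

def sweep(max_clients=64):
    """Double the client count each step; for direct connections,
    TPS usually peaks around the number of vCPUs."""
    clients = 1
    while clients <= max_clients:
        subprocess.run(pgbench_cmd(clients), check=True)
        clients *= 2
```

Plotting TPS and latency per step shows the peak; with PgBouncer in between, the expectation discussed later is that the peak shifts right and gets higher.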

488
0:26:1,0 --> 0:26:5,56
But usually when we grow to the
number of vCPUs, we go beyond

489
0:26:5,8 --> 0:26:9,44
that and TPS numbers go down, latency
goes up.

490
0:26:10,32 --> 0:26:15,08
Because... and this is the ideal state
for a direct connection benchmark.

491
0:26:15,74 --> 0:26:21,9
We just scale until the peak, which
matches the number of vCPUs.

492
0:26:23,0 --> 0:26:26,16
If peak starts earlier, there is
some interesting bottleneck,

493
0:26:26,16 --> 0:26:26,82
I bet.

494
0:26:27,04 --> 0:26:28,58
Some lightweight lock or something.

495
0:26:30,12 --> 0:26:33,58
And then if you introduce PgBouncer, this peak should shift

496
0:26:33,82 --> 0:26:36,0
to the right and become higher.

497
0:26:37,76 --> 0:26:40,58
Yeah, that's how you can see it.

498
0:26:40,96 --> 0:26:43,14
Michael: Well, is that an easier
way to sell it then?

499
0:26:43,14 --> 0:26:48,28
Instead of like trying to convince
people to use it because it

500
0:26:48,28 --> 0:26:51,24
reduces the risk around exhaustion
of resources.

501
0:26:51,9199 --> 0:26:55,9
Actually just try and sell it on
the increased performance.

502
0:26:55,9 --> 0:26:56,28
There's no

503
0:26:56,28 --> 0:26:57,84
Nikolay: problem to sell PgBouncer.

504
0:26:58,52 --> 0:26:59,48
Everyone wants it.

505
0:26:59,54 --> 0:27:0,04
Almost everyone.

506
0:27:0,04 --> 0:27:1,88
Michael: Well, I don't, I'm interested.

507
0:27:1,88 --> 0:27:2,68
Nikolay: Yeah, Yeah.

508
0:27:2,68 --> 0:27:2,96
Yeah.

509
0:27:2,96 --> 0:27:4,78
Well, I talked about that company.

510
0:27:6,1 --> 0:27:7,42
There I couldn't sell it.

511
0:27:7,42 --> 0:27:11,2
These days I see everyone understands
quite easily that connection

512
0:27:11,2 --> 0:27:12,34
pooling is needed.

513
0:27:12,98 --> 0:27:18,22
And also keep in mind that I just
said a lot of very fast SELECTs.

514
0:27:18,96 --> 0:27:22,92
If you have long running statements,
quite long, for example,

515
0:27:23,8 --> 0:27:27,94
10 milliseconds or 100 milliseconds,
it's quite long.

516
0:27:28,04 --> 0:27:31,16
It means, if 100 milliseconds, it
means execution takes 100 milliseconds,

517
0:27:31,16 --> 0:27:32,58
including planning time.

518
0:27:32,7 --> 0:27:38,08
It means that 1 backend can do
only 10 QPS, TPS.

519
0:27:38,74 --> 0:27:39,96
It's not a lot at all.
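The arithmetic here is simply throughput = backends / latency; a 100 ms statement caps one backend at 10 per second. A tiny illustration (the function name is mine):

```python
def max_tps(backends, latency_ms):
    """Upper bound on throughput: each backend finishes at most
    1000 / latency_ms statements per second."""
    return backends * 1000.0 / latency_ms

# 1 backend at 100 ms per statement: only 10 TPS, as noted above.
print(max_tps(1, 100))   # 10.0
```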

520
0:27:39,96 --> 0:27:41,7
And that's it.

521
0:27:41,88 --> 0:27:46,8
And communication overhead, which
PgBouncer will take on its

522
0:27:46,8 --> 0:27:49,46
shoulders, will not help a lot
in this case.

523
0:27:49,54 --> 0:27:54,16
It will help only if you have tons
of fast SELECTs.

524
0:27:54,16 --> 0:27:57,98
In this case, this overhead will
be comparable to execution time,

525
0:27:57,98 --> 0:27:58,14
right?

526
0:27:58,14 --> 0:27:59,94
Or maybe even higher somehow.

527
0:28:0,42 --> 0:28:3,06
Sub-millisecond SELECTs, this
is what we need.

528
0:28:3,26 --> 0:28:5,14
And guys can tell you, you know
what?

529
0:28:5,14 --> 0:28:6,68
We don't have so many.

530
0:28:7,28 --> 0:28:11,8
Depending on the application, but sometimes
we see average Query

531
0:28:11,8 --> 0:28:14,88
latency is exceeding 1 millisecond
significantly.

532
0:28:15,1 --> 0:28:18,4
In this case, PgBouncer will help,
but not so much.

533
0:28:18,84 --> 0:28:22,44
Well, again, depending on the client,
sometimes clients are slow,

534
0:28:22,44 --> 0:28:25,14
sometimes they're under control
and also slow.

535
0:28:26,6 --> 0:28:31,34
But anyway, I just, from my experience
lately, dealing with a

536
0:28:31,34 --> 0:28:34,5
lot of startups, including AI startups,
which grow really fast.

537
0:28:34,84 --> 0:28:37,32
Everyone understands connection
pooling is needed.

538
0:28:37,54 --> 0:28:41,64
Usually it's a trade-off, like
if it's RDS, what to do, because

539
0:28:41,64 --> 0:28:47,26
RDS proxy is managed, no-brainer,
but they lack good behavior,

540
0:28:47,26 --> 0:28:51,72
as I said, and also they lack pause/resume,
which is weird because

541
0:28:52,12 --> 0:28:57,0
there's blue-green deployments,
but RDS proxy doesn't have pause/resume.

542
0:28:57,32 --> 0:29:2,54
It means that there is no 0 downtime
possibility here to maintain

543
0:29:2,54 --> 0:29:3,54
connections, right?

544
0:29:4,2 --> 0:29:8,5
And PgBouncer has it, but you
need to deal with it yourself.

545
0:29:9,28 --> 0:29:14,28
And if you go to global database
in Aurora, it's like,

546
0:29:14,54 --> 0:29:18,98
it sticks you to RDS Proxy more.

547
0:29:19,06 --> 0:29:21,3
Michael: So what's their blue-green
deployments doing behind

548
0:29:21,3 --> 0:29:22,72
the scenes if it's not?

549
0:29:23,14 --> 0:29:25,58
Nikolay: I don't know about blue-green
deployments.

550
0:29:25,9 --> 0:29:28,02
They do logical replication and
everything.

551
0:29:28,18 --> 0:29:28,78
This is it.

552
0:29:28,78 --> 0:29:29,12
Yeah.

553
0:29:29,12 --> 0:29:34,96
But this switch if it's for RDS
proxy, RDS proxy doesn't have

554
0:29:34,96 --> 0:29:36,54
pause/resume, I know it, right?

555
0:29:36,82 --> 0:29:37,62
Michael: Yeah, interesting.

556
0:29:37,8 --> 0:29:41,14
Nikolay: Well, maybe. I haven't
checked in a couple of months, maybe

557
0:29:41,38 --> 0:29:45,6
I'm outdated, but I didn't see
the news about pause/resume.

558
0:29:45,9 --> 0:29:51,08
I think the switch is still, like,
interrupting current execution,

559
0:29:51,14 --> 0:29:53,82
so it's near 0 downtime, not fully
0 downtime.

560
0:29:54,52 --> 0:29:57,44
Michael: So when you were talking,
going back a bit, I'm still

561
0:29:57,44 --> 0:30:3,74
a bit confused, you were talking
about folks not being able to

562
0:30:3,74 --> 0:30:7,44
demonstrate a kind of a theoretical
limit of what's the added

563
0:30:7,44 --> 0:30:10,6
risk, why would you need to demonstrate
it?

564
0:30:10,6 --> 0:30:13,82
Like if you're seeing everybody
using a database side pool like

565
0:30:13,82 --> 0:30:17,58
PgBouncer already, who's it for
that demonstration?

566
0:30:18,9 --> 0:30:21,3
Nikolay: So let's distinguish 2
things here.

567
0:30:21,46 --> 0:30:24,94
First is how to demonstrate the
need in connection pooling.

568
0:30:25,46 --> 0:30:29,08
This is a slightly different topic
than reconfiguration of

569
0:30:29,08 --> 0:30:29,58
max_connections.

570
0:30:30,66 --> 0:30:33,14
Michael: Yeah, but very related,
no?

571
0:30:35,54 --> 0:30:42,44
Nikolay: Related, but I find it
easier to convince that connection

572
0:30:42,44 --> 0:30:46,6
pooling is needed compared to let's
reduce max_connections.

573
0:30:47,2 --> 0:30:50,16
Michael: Okay, so people are like,
we'll have a pooler, but we'll

574
0:30:50,16 --> 0:30:52,5
also still have 10,000 max_connections.

575
0:30:52,94 --> 0:30:53,44
Interesting.

576
0:30:53,72 --> 0:30:56,26
Nikolay: Well, it might be just
because it's painful to restart

577
0:30:56,26 --> 0:30:56,64
for them.

578
0:30:56,64 --> 0:30:57,76
That's it, honestly.

579
0:30:58,26 --> 0:30:58,76
Michael: Yeah.

580
0:30:59,28 --> 0:30:59,68
Yeah.

581
0:30:59,68 --> 0:31:2,52
And if most things are going through
the pooler and the pooler

582
0:31:2,52 --> 0:31:6,36
has a lower limit, then actually
they might not get near their

583
0:31:7,28 --> 0:31:8,08
max_connections.

584
0:31:8,94 --> 0:31:12,52
It's just a number, not actually
how many connections they have.

585
0:31:12,52 --> 0:31:14,8
Nikolay: Yeah, I'm checking blue/green
deployments once again,

586
0:31:14,8 --> 0:31:22,92
and I don't see pause/resume, and
I think switchover allows connections

587
0:31:22,92 --> 0:31:26,04
in both environments, allows write
operations on 1 cluster and

588
0:31:26,04 --> 0:31:27,94
only read only on another cluster.

589
0:31:28,5 --> 0:31:32,54
And then, basically, you need
to take care of everything yourself.

590
0:31:33,16 --> 0:31:35,4
This is an undercooked situation, I think.

591
0:31:36,5 --> 0:31:41,98
They should have pause/resume
in RDS proxy or just ship PgBouncer

592
0:31:42,1 --> 0:31:45,98
in a managed form as some other
managed platforms do, right?

593
0:31:46,1 --> 0:31:46,26
Yeah, true.

594
0:31:46,26 --> 0:31:47,06
That's it.

595
0:31:47,22 --> 0:31:52,1
In this case, it would be possible
to achieve fully 0 downtime.

596
0:31:52,58 --> 0:31:57,22
And we know, like, our standard
criticism of blue-green deployments,

597
0:31:57,56 --> 0:32:1,26
is that it's not blue-green
deployments, because you

598
0:32:1,26 --> 0:32:5,64
lose 1 environment after switchover,
which is super strange.

599
0:32:5,74 --> 0:32:6,76
So it's not reversible.

600
0:32:7,2 --> 0:32:8,5
Michael: It's not reversible, yeah.

601
0:32:8,76 --> 0:32:11,42
I don't think you lose it, but
it's not reversible, yeah.

602
0:32:11,52 --> 0:32:16,86
Nikolay: Yeah, so again, I don't
find problems convincing people about

603
0:32:16,86 --> 0:32:23,14
the need for a pooler. But how to...
and again, it's easy to demonstrate

604
0:32:23,16 --> 0:32:27,32
that a pooler brings significant benefits
to certain types of workload,

605
0:32:27,9 --> 0:32:34,06
but how to demonstrate... and, by
the way, we could emulate slow

606
0:32:34,06 --> 0:32:34,56
clients.

607
0:32:34,74 --> 0:32:39,96
In this case it will be even more
interesting to see how with

608
0:32:39,96 --> 0:32:44,06
PgBouncer Postgres behaves much
better, higher TPS, healthier

609
0:32:44,06 --> 0:32:47,06
state, lower active sessions and
so on.

610
0:32:47,36 --> 0:32:51,26
But when we move to the topic,
OK, max_connections, forget about

611
0:32:51,26 --> 0:32:51,76
pooler.

612
0:32:53,08 --> 0:32:57,94
How can I see that lower max_connections
is somehow beneficial?

613
0:32:59,18 --> 0:33:1,06
Here I have a hard time right now.

614
0:33:1,12 --> 0:33:4,7
If anyone has ideas, let's work
together and create some benchmarks,

615
0:33:5,34 --> 0:33:8,54
demonstration tests, experiments.

616
0:33:10,58 --> 0:33:13,52
Michael: So maybe coming full circle,
the things that triggered

617
0:33:13,52 --> 0:33:19,48
this for you was the discussion
of it in the context of migrations

618
0:33:19,54 --> 0:33:22,88
and I mean specifically, like, schema
changes and data pipelines,

619
0:33:23,26 --> 0:33:27,54
like maybe long-running transactions,
maybe things that take

620
0:33:27,54 --> 0:33:28,4
heavy locks.

621
0:33:30,3 --> 0:33:32,32
Nikolay: So, long-running transactions,
because if you acquire

622
0:33:32,32 --> 0:33:35,5
a lock, it's released only at the end,
so it's about long-running transactions.

623
0:33:36,74 --> 0:33:40,12
Michael: Yeah, I was thinking on
the read-only ones, at least

624
0:33:40,12 --> 0:33:42,76
there's, I mean there are still
massive implications, but it's

625
0:33:42,76 --> 0:33:46,78
not as devastating unless they're
combined, right, unless you

626
0:33:46,78 --> 0:33:47,52
have both.

627
0:33:48,22 --> 0:33:50,66
Nikolay: Read-only transactions
also acquire locks.

628
0:33:51,58 --> 0:33:52,7
Michael: Yes, yeah, of course.

629
0:33:54,08 --> 0:33:56,02
But they're fine until they block

630
0:33:56,54 --> 0:33:57,04
Nikolay: a...

631
0:33:58,26 --> 0:33:58,76
Modification.

632
0:33:59,54 --> 0:34:4,32
An exclusive lock coming from ALTER TABLE
cannot be acquired because

633
0:34:4,76 --> 0:34:8,42
of an access share lock from some
ongoing SELECT, which probably

634
0:34:8,42 --> 0:34:13,76
isn't finished. But there are data
pipelines, they love complex

635
0:34:14,22 --> 0:34:18,7
long transactions, and some brief
SELECT, which you executed at

636
0:34:18,7 --> 0:34:22,76
the beginning of the transaction, holds
this access share lock till the

637
0:34:22,76 --> 0:34:23,82
very end of transaction.

638
0:34:24,96 --> 0:34:27,6
And it blocks ALTER TABLE.

639
0:34:27,74 --> 0:34:31,58
It's just that ALTER TABLE should have
a low lock_timeout and retries.
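The low-lock_timeout-plus-retries pattern for DDL can be sketched driver-agnostically. A hypothetical sketch: `execute` stands in for whatever runs SQL on your connection, and `LockTimeout` for the driver's error for SQLSTATE 55P03 (in psycopg2, `errors.LockNotAvailable`); the delay parameters are assumptions.

```python
import random
import time

class LockTimeout(Exception):
    """Stand-in for the driver's lock-timeout error (SQLSTATE 55P03)."""

def ddl_with_retries(execute, ddl, lock_timeout_ms=100,
                     max_attempts=10, base_delay=0.5):
    """Attempt the DDL under a low lock_timeout so it never queues for
    long behind other lock holders; on timeout, back off with jitter
    and try again."""
    for attempt in range(max_attempts):
        try:
            execute(f"SET lock_timeout = '{lock_timeout_ms}ms'")
            execute(ddl)
            return
        except LockTimeout:
            if attempt == max_attempts - 1:
                raise
            time.sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Failing fast matters because an ALTER TABLE waiting in the lock queue also blocks every session queued behind it.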

640
0:34:32,62 --> 0:34:34,44
Michael: Yes, so that's the...

641
0:34:34,86 --> 0:34:35,64
Nikolay: That's the approach.

642
0:34:36,96 --> 0:34:38,68
This is good mitigation to...

643
0:34:40,9 --> 0:34:46,16
Better than trying to properly schedule something,
because if not, the data

644
0:34:46,16 --> 0:34:47,86
pipeline, it can be something else.

645
0:34:47,9 --> 0:34:52,62
At any time, Postgres can start an
aggressive autovacuum process,

646
0:34:53,08 --> 0:34:56,42
which will be freezing your tuples
to prevent transaction ID

647
0:34:56,42 --> 0:34:59,36
wraparound, and this will block
your DDL.

648
0:34:59,48 --> 0:35:4,36
If it hasn't happened to you yet...
or it happened, but you didn't

649
0:35:4,36 --> 0:35:7,68
notice it yet, which is very common,
I think.

650
0:35:10,68 --> 0:35:12,94
Some outage: "we tried everything",
good.

651
0:35:12,94 --> 0:35:14,28
Okay, Postgres is weird.

652
0:35:15,04 --> 0:35:17,98
Michael: Or even not even an outage,
maybe just some customers

653
0:35:17,98 --> 0:35:21,14
getting some slow, like things
running slowly.

654
0:35:21,5 --> 0:35:25,34
Nikolay: Yeah, like 5 seconds or
10 seconds latency spike and

655
0:35:25,34 --> 0:35:26,7
we don't understand why.

656
0:35:27,22 --> 0:35:32,06
Or you just got lucky, and it awaits
you next week.

657
0:35:33,06 --> 0:35:36,86
So everyone must understand this
topic and implement a low lock

658
0:35:36,86 --> 0:35:41,54
timeout and retries, until Postgres
has CONCURRENTLY

659
0:35:41,8 --> 0:35:42,74
in all operations.

660
0:35:43,82 --> 0:35:47,92
Which will make things interesting
because concurrently means

661
0:35:48,06 --> 0:35:51,84
you lose the transactional behavior
of DDL.

662
0:35:52,44 --> 0:35:53,74
So it's very interesting.

663
0:35:54,12 --> 0:35:57,16
The future is interesting because
I think ALTER TABLE should

664
0:35:57,16 --> 0:36:1,02
have something like CONCURRENTLY,
or with retries, but it cannot

665
0:36:1,02 --> 0:36:3,42
be inside a transaction block in
this case.

666
0:36:3,74 --> 0:36:6,3
So it's so interesting topic, right?

667
0:36:6,42 --> 0:36:10,3
Michael: Yeah, you always combine
these, lock timeout and retries,

668
0:36:10,44 --> 0:36:13,98
and it's only just struck me that
that sounds like a lot of

669
0:36:13,98 --> 0:36:17,78
work to people, and then retries
can feel like a lot of work.

670
0:36:17,78 --> 0:36:21,1
Actually, the important part is
the lock timeout.

671
0:36:21,4 --> 0:36:22,2
Retries, it's

672
0:36:22,2 --> 0:36:22,36
Nikolay: like, whoa.

673
0:36:22,36 --> 0:36:23,5
That's a good point.

674
0:36:24,52 --> 0:36:29,24
Yesterday, we opened the gates for
the first clients of our copilot

675
0:36:29,38 --> 0:36:30,94
product, PostgreSQL Copilot.

676
0:36:31,16 --> 0:36:35,04
And we had a new client, who
actually originally came

677
0:36:35,2 --> 0:36:41,34
as a consulting client but uses
our product inside consulting,

678
0:36:41,98 --> 0:36:46,22
and, like, we found a lot of issues,
as usual. It's great.

679
0:36:46,64 --> 0:36:51,34
And observing their situation,
I think it's RDS or Aurora, I

680
0:36:51,34 --> 0:36:54,52
don't remember exactly, but
I noticed lock_timeout was very

681
0:36:54,52 --> 0:36:55,02
low.

682
0:36:55,44 --> 0:36:56,82
It was very, very low.

683
0:36:56,82 --> 0:36:58,48
It's so interesting, 100 milliseconds.

684
0:36:58,94 --> 0:37:0,28
I haven't talked to them yet.

685
0:37:0,28 --> 0:37:1,16
I will soon.

686
0:37:1,36 --> 0:37:4,94
But it's an interesting situation.

687
0:37:5,28 --> 0:37:6,98
Global lock timeout is very low.

688
0:37:8,1 --> 0:37:11,64
So I even started to think, okay,
lock timeout, does it affect

689
0:37:11,64 --> 0:37:14,48
only DDL, or row-level locks as
well?

690
0:37:15,36 --> 0:37:16,26
What do you think?

691
0:37:16,26 --> 0:37:17,0
Michael: Good question.

692
0:37:17,04 --> 0:37:17,94
I don't know.

693
0:37:18,08 --> 0:37:19,28
Nikolay: Yeah, it affects all.

694
0:37:19,28 --> 0:37:19,54
Yeah.

695
0:37:19,54 --> 0:37:20,74
Row level lock as well.

696
0:37:20,74 --> 0:37:25,66
So if you update a row but didn't
say COMMIT, so it's like a multi-statement

697
0:37:25,68 --> 0:37:26,3
transaction.

698
0:37:26,38 --> 0:37:28,64
You updated the row and sit inside
the transaction.

699
0:37:29,18 --> 0:37:32,06
Another transaction just fails
after 1 second.

700
0:37:32,9 --> 0:37:35,58
Oh, not after 1 second, after 100
milliseconds.

701
0:37:36,56 --> 0:37:39,86
And this makes me think, okay,
something is not right here.

702
0:37:40,16 --> 0:37:43,08
You know, deadlock timeout is 1
second by default.

703
0:37:43,44 --> 0:37:43,94
Yeah.

704
0:37:45,3 --> 0:37:47,34
But lock timeout is 100 milliseconds.

705
0:37:47,9 --> 0:37:50,78
So deadlock detection never happens.

706
0:37:51,26 --> 0:37:52,44
Michael: Yeah, in that case.

707
0:37:52,54 --> 0:37:53,04
Nikolay: Yeah.

708
0:37:53,48 --> 0:37:54,98
How is it possible?

709
0:37:55,68 --> 0:38:0,92
I'm very curious how this application
feels, right?

710
0:38:1,02 --> 0:38:3,34
And I will be talking to them soon,
I hope.

711
0:38:3,34 --> 0:38:8,68
And I think maybe I'm just overlooking
something, maybe there are some

712
0:38:8,68 --> 0:38:12,9
settings at session level, user
level.

713
0:38:13,46 --> 0:38:15,3
So this global setting maybe is overridden.

714
0:38:15,72 --> 0:38:20,32
Because it's a very interesting
situation to keep it so low and

715
0:38:20,32 --> 0:38:25,12
global. Yeah, but what you say, basically:
okay, forget about read-write,

716
0:38:25,12 --> 0:38:29,96
let's just set lock_timeout to a second
or 2 seconds, and maybe only

717
0:38:29,96 --> 0:38:31,12
for DDL, right?

718
0:38:31,8 --> 0:38:35,08
Michael: Yeah, that's all I'm thinking,
is to start with, like,

719
0:38:35,08 --> 0:38:38,5
add lock timeouts just to your migrations.

720
0:38:39,44 --> 0:38:42,34
And then if they start failing,
because they won't always like

721
0:38:42,34 --> 0:38:45,72
depending on your lock timeout
and depending on your like, well,

722
0:38:45,72 --> 0:38:49,33
Especially if you design them in
a way that is not super...

723
0:38:49,33 --> 0:38:52,08
You know, if you're not doing full
table reworks and things.

724
0:38:52,44 --> 0:38:55,08
Nikolay: You are right, but I can
tell you what happens next.

725
0:38:55,44 --> 0:39:0,84
Next pipeline runs for 10 minutes,
and then it fails.

726
0:39:0,84 --> 0:39:4,18
And, like, fuck, damn, we need to
retry the whole thing.

727
0:39:4,74 --> 0:39:6,54
So retries will be long, big loops.

728
0:39:7,66 --> 0:39:9,8
Michael: Within the pipeline, and
within hours.

729
0:39:9,8 --> 0:39:10,64
Nikolay: Yeah, yeah, yeah.

730
0:39:10,64 --> 0:39:12,9
That's why I say do retries right
on.

731
0:39:12,9 --> 0:39:18,34
The most radical way, extreme way
to do retries is inside transactions,

732
0:39:18,48 --> 0:39:19,18
have sub-transactions.

733
0:39:19,4 --> 0:39:22,9
This is the only case when I can
say sub-transactions are reasonable.

734
0:39:23,76 --> 0:39:27,24
Well I think some financial applications
might need it as well,

735
0:39:27,24 --> 0:39:31,16
but in the general case I tend to say:
avoid sub-transactions.

736
0:39:31,42 --> 0:39:36,58
But if you have a complex DDL transaction,
not DDL, it can be

737
0:39:36,58 --> 0:39:40,42
DDL, DML or something, and then
inside it you want to have retries,

738
0:39:41,06 --> 0:39:42,42
and you cannot lose...

739
0:39:42,9 --> 0:39:46,0
You can, but it's too costly, you
will need to retry a lot of

740
0:39:46,0 --> 0:39:46,5
stuff.

741
0:39:46,56 --> 0:39:49,92
You don't want to lose what you
have so far in that transaction.

742
0:39:49,92 --> 0:39:52,92
OK, you can use savepoints there,
but you need to double-check

743
0:39:52,92 --> 0:39:59,7
that savepoints don't go too deep
in terms of nesting, 64, right?

744
0:40:0,04 --> 0:40:2,34
And also you need to check there
are no long-running transactions

745
0:40:3,52 --> 0:40:6,68
that will affect the health, or
you will see subtransaction waits among

746
0:40:6,68 --> 0:40:7,58
your wait events.

747
0:40:7,8 --> 0:40:11,26
So in this case, you just retry,
and if retry fails, you retry.

748
0:40:11,68 --> 0:40:16,76
If you have a lock timeout, if
your DDL fails on the lock, you don't

749
0:40:16,76 --> 0:40:17,36
lose everything.

750
0:40:17,36 --> 0:40:21,0
You lose only back to the latest
savepoint, and then retry again.

751
0:40:21,0 --> 0:40:25,24
Usually it's done using PL/pgSQL,
with BEGIN, EXCEPTION WHEN,

752
0:40:25,24 --> 0:40:27,3
blah, blah, END blocks.

753
0:40:29,08 --> 0:40:33,16
It will create sub-transactions
implicitly for you.

754
0:40:33,84 --> 0:40:36,9
And then you can retry there, in
this exception

755
0:40:36,9 --> 0:40:37,4
block.
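The savepoint-local retry described here (what a PL/pgSQL BEGIN ... EXCEPTION block does implicitly) can also be done client-side with explicit SAVEPOINT / ROLLBACK TO SAVEPOINT, so a failed step loses only its own work, not the whole pipeline transaction. A sketch: `execute` is a placeholder for anything that runs SQL on an open connection; the savepoint name is mine.

```python
def step_with_savepoint(execute, statement, retries=3):
    """Run one statement inside its own savepoint. On failure, roll back
    only to the savepoint (earlier work in the surrounding transaction
    survives) and retry just this step. ROLLBACK TO does not destroy
    the savepoint, so it can be reused; as mentioned in the episode,
    keep savepoint nesting shallow (well under 64 levels deep)."""
    execute("SAVEPOINT step")
    for attempt in range(retries):
        try:
            execute(statement)
            execute("RELEASE SAVEPOINT step")
            return
        except Exception:
            execute("ROLLBACK TO SAVEPOINT step")
            if attempt == retries - 1:
                raise
```

With this shape, a lock timeout on one DDL step costs one short retry instead of re-running a 10-minute pipeline.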

756
0:40:38,1 --> 0:40:39,28
And it's good, it's good.

757
0:40:39,28 --> 0:40:41,18
I mean, this is a good thing to
have.

758
0:40:41,28 --> 0:40:44,84
And in this case, retries are super
local, and the whole pipeline

759
0:40:45,04 --> 0:40:46,6
of deployment won't fail.

760
0:40:47,32 --> 0:40:49,52
If everything is all right.

761
0:40:49,62 --> 0:40:54,68
Of course, if there is a long transaction
running 2 hours, those

762
0:40:54,68 --> 0:40:56,68
retries probably won't last so
much.

763
0:40:57,1 --> 0:40:59,14
Michael: Yeah, it fails anyway.

764
0:40:59,24 --> 0:41:2,08
Nikolay: So that's why I connected
these 2 words.

765
0:41:2,08 --> 0:41:6,42
Michael: Oh, I definitely think
lock timeout and retries

766
0:41:6,42 --> 0:41:10,02
is great. I'm just thinking
there is an intermediate

767
0:41:10,2 --> 0:41:13,62
step if people want to get a lot
of the benefit without all of

768
0:41:13,62 --> 0:41:14,28
the work.

769
0:41:14,38 --> 0:41:16,94
Nikolay: Yeah, well, I agree, at least
lock_timeout. Yeah, this is

770
0:41:16,94 --> 0:41:21,22
already a safeguard from
downtime, basically, at

771
0:41:21,22 --> 0:41:25,06
least partial downtime. And here it's
interesting that we want, like,

772
0:41:25,06 --> 0:41:28,32
you know, shift-left testing.
We want to test and find

773
0:41:28,32 --> 0:41:29,1
bugs earlier.

774
0:41:29,1 --> 0:41:32,98
Ideally, the developer who developed
the code finds bugs immediately,

775
0:41:33,08 --> 0:41:33,4
right?

776
0:41:33,4 --> 0:41:35,14
This is shifting to the left.

777
0:41:35,14 --> 0:41:37,32
Here, with retries, we want to shift
to the right.

778
0:41:38,4 --> 0:41:42,62
Because if we shift it to, like, testing
pipelines, yeah, it's

779
0:41:43,44 --> 0:41:47,36
too, it makes retries heavier, right?

780
0:41:47,7 --> 0:41:50,82
Michael: Yeah, these things are
also, I think, on the testing

781
0:41:50,82 --> 0:41:53,46
front, these are the hardest things
to test as well, right?

782
0:41:53,46 --> 0:41:56,82
Because they're dependent on other
activity.

783
0:41:56,82 --> 0:41:57,54
Nikolay: On situation.

784
0:41:58,52 --> 0:42:2,16
Michael: And very few people have
good test setups that have

785
0:42:2,16 --> 0:42:3,18
concurrent activity.

786
0:42:3,42 --> 0:42:5,18
You know, it's not common.

787
0:42:5,64 --> 0:42:7,18
Nikolay: Yeah, this is on 1 hand.

788
0:42:7,18 --> 0:42:10,64
On the other hand, it's not rocket
science at all, and you just

789
0:42:10,64 --> 0:42:15,12
need to do 2 things.

790
0:42:15,66 --> 0:42:19,24
I thought about this, and you know,
like in DBLab, for example,

791
0:42:19,24 --> 0:42:24,98
in our tool for database branching,
we eventually developed a methodology

792
0:42:25,24 --> 0:42:29,94
where we don't need background
workload because it's very uncertain.

793
0:42:30,78 --> 0:42:33,5
1 day it has a long-running transaction,
another day something else, we should

794
0:42:33,5 --> 0:42:34,74
be prepared for everything.

795
0:42:34,74 --> 0:42:35,78
This is our approach.

796
0:42:36,04 --> 0:42:38,42
And we just decided to do 2 things.

797
0:42:38,62 --> 0:42:41,46
1 thing is that we need lock timeout
and retries.

798
0:42:41,78 --> 0:42:42,82
This is 1 thing.

799
0:42:42,84 --> 0:42:45,4
Just to be prepared that lock cannot
be acquired.

800
0:42:45,64 --> 0:42:49,62
Again, lock cannot be acquired
sometimes because of autovacuum.

801
0:42:51,18 --> 0:42:53,64
Michael: Oh yeah, in anti-wraparound mode you mean?

802
0:42:53,68 --> 0:42:56,32
Nikolay: Yeah, and it can run, if it's a huge table, it can

803
0:42:56,32 --> 0:42:57,42
run hours sometimes.

804
0:42:57,88 --> 0:42:58,26
Michael: If it's

805
0:42:58,26 --> 0:43:0,0
Nikolay: throttled, it can run hours.

806
0:43:0,66 --> 0:43:6,22
And another thing is that we should not keep exclusive locks

807
0:43:6,22 --> 0:43:6,98
for long.

808
0:43:8,16 --> 0:43:8,76
That's it.

809
0:43:8,76 --> 0:43:10,78
For long means like for many seconds.

810
0:43:11,28 --> 0:43:12,48
They should be brief.

811
0:43:12,74 --> 0:43:20,38
If you know how to acquire locks gracefully, so not waiting for

812
0:43:20,38 --> 0:43:22,84
long and blocking others, and with retries.

813
0:43:22,84 --> 0:43:26,98
And also, if you acquired the lock, you don't hold it for too

814
0:43:26,98 --> 0:43:28,3
long, that's it.
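
The retry pattern Nikolay describes here can be sketched roughly like this. It's a minimal illustration, not anything from the episode: `run_ddl` is a hypothetical caller-supplied function that is assumed to set a short `lock_timeout` (e.g. `SET lock_timeout = '2s'`) before running the DDL, and to raise `LockNotAvailable` when Postgres reports SQLSTATE 55P03.

```python
import time


class LockNotAvailable(Exception):
    """Stands in for Postgres error 55P03 (lock_not_available)."""


def run_ddl_with_retries(run_ddl, attempts=5, base_delay=0.05):
    """Try a DDL step several times instead of queueing behind other locks.

    run_ddl is expected to set a short lock_timeout before the DDL, so a
    busy table (e.g. one an autovacuum worker is processing) fails fast
    with LockNotAvailable instead of blocking everyone queued behind us.
    """
    for attempt in range(1, attempts + 1):
        try:
            return run_ddl()
        except LockNotAvailable:
            if attempt == attempts:
                raise
            # Back off before retrying, so we don't hammer the lock queue.
            time.sleep(base_delay * 2 ** (attempt - 1))
```

The point of the pattern is exactly the 2 rules above: fail fast on acquisition (lock_timeout), then retry, rather than waiting in the lock queue and blocking everyone else.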

815
0:43:30,06 --> 0:43:31,12
There are exceptions, though.

816
0:43:31,12 --> 0:43:31,86
There are exceptions.

817
0:43:32,62 --> 0:43:36,18
For example, if you create a new table and you load data to it,

818
0:43:38,26 --> 0:43:41,2
well, technically nobody is working with this table yet.

819
0:43:41,2 --> 0:43:45,1
So if you created the table, you own the lock on it, exclusive

820
0:43:45,1 --> 0:43:45,6
lock.

821
0:43:45,62 --> 0:43:47,74
Inside the same transaction, you can load data.

822
0:43:48,16 --> 0:43:48,9
Why not?

823
0:43:49,4 --> 0:43:52,2
In this case you are breaking rule number 2.

824
0:43:52,2 --> 0:43:56,54
You hold the lock too long, but it's harmless.
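
The harmless exception described above, creating a table and loading it inside the same transaction, looks roughly like this. SQLite (Python stdlib) is used only so the snippet runs standalone; in Postgres the shape would be the same `BEGIN; CREATE TABLE ...; COPY/INSERT ...; COMMIT;`, and holding the exclusive lock is harmless because no other session can see the table until the transaction commits.

```python
import sqlite3

# One transaction that both creates the table and loads it.
# The Postgres equivalent would be:
#   BEGIN;
#   CREATE TABLE events (...);
#   COPY events FROM ...;   -- or plain INSERTs
#   COMMIT;
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual tx control
conn.execute("BEGIN")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [("a",), ("b",), ("c",)],
)
conn.execute("COMMIT")
row_count = conn.execute("SELECT count(*) FROM events").fetchone()[0]
```

Rule number 2 is technically broken for the whole load, but since the table is invisible to everyone else until commit, nobody is ever blocked.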

825
0:43:57,16 --> 0:44:0,42
So, but yeah, there are exceptions, but in general these 2 rules

826
0:44:0,92 --> 0:44:2,34
are serving us well.

827
0:44:2,64 --> 0:44:5,28
Like they are helpful if you keep them in mind.

828
0:44:5,28 --> 0:44:9,26
Graceful acquisition, and don't hold it too long if it's a very

829
0:44:10,02 --> 0:44:13,1
heavy lock, like the heaviest lock, exclusive lock.

830
0:44:15,06 --> 0:44:15,7
Michael: Sounds good.

831
0:44:15,76 --> 0:44:18,24
Anything else you want to touch on before we call it a day?

832
0:44:18,24 --> 0:44:20,22
Nikolay: I think that's it, I hope it was helpful.

833
0:44:21,46 --> 0:44:22,58
Michael: Yep, nice.

834
0:44:22,64 --> 0:44:25,24
And I'm interested to hear if anyone gets in touch about any

835
0:44:25,24 --> 0:44:26,1
of this as well.

836
0:44:26,1 --> 0:44:27,32
Nikolay: Oh yeah, I'm curious.

837
0:44:27,84 --> 0:44:31,16
Maybe someone from RDS will comment on this.

838
0:44:31,32 --> 0:44:34,62
Why is max_connections so high by default?

839
0:44:35,38 --> 0:44:36,18
Michael: Yeah, it could be.

840
0:44:36,18 --> 0:44:36,5
Good.

841
0:44:36,5 --> 0:44:37,12
All right.

842
0:44:37,12 --> 0:44:37,84
Thanks Nikolay.

843
0:44:37,84 --> 0:44:38,54
Take care.

844
0:44:38,56 --> 0:44:39,16
You too.