1
00:00:00,300 --> 00:00:02,960
Michael: Hello and welcome to Postgres.FM, a weekly show about

2
00:00:02,960 --> 00:00:03,960
all things PostgreSQL.

3
00:00:03,960 --> 00:00:06,580
I am Michael, founder of pgMustard, and as usual, I'm joined

4
00:00:06,580 --> 00:00:08,560
by my co-host, Nikolay, founder of Postgres.AI.

5
00:00:08,560 --> 00:00:09,300
Hey, Nikolay.

6
00:00:09,340 --> 00:00:10,020
Nikolay: Hi, Michael.

7
00:00:10,400 --> 00:00:10,900
Michael: Hello.

8
00:00:11,140 --> 00:00:15,040
Today, we are excited to have a special guest, Tomas Vondra,

9
00:00:15,040 --> 00:00:18,960
who is a Postgres major contributor and committer now working

10
00:00:18,960 --> 00:00:20,860
at Microsoft on their Postgres team.

11
00:00:20,860 --> 00:00:22,260
Welcome to the show, Tomas.

12
00:00:23,240 --> 00:00:24,100
Tomas: Hello, everyone.

13
00:00:24,800 --> 00:00:28,440
Michael: Right, so a few months ago, you gave an excellent talk

14
00:00:28,440 --> 00:00:32,260
about where performance cliffs come from, and we have been excited

15
00:00:32,260 --> 00:00:33,580
to talk to you about this.

16
00:00:33,780 --> 00:00:36,560
So perhaps as a starting point could you let us know a little

17
00:00:36,560 --> 00:00:39,280
bit like how you got interested in this as a topic or why you

18
00:00:39,280 --> 00:00:41,260
chose to talk about it there.

19
00:00:41,900 --> 00:00:45,400
Tomas: So I think I started thinking about performance cliffs

20
00:00:45,560 --> 00:00:48,700
when I actually started working on Postgres or started using

21
00:00:48,700 --> 00:00:49,200
Postgres.

22
00:00:49,540 --> 00:00:53,940
Because performance cliffs are something people often run into

23
00:00:54,280 --> 00:00:59,000
and have to investigate changes, sudden changes in behavior after

24
00:00:59,820 --> 00:01:01,960
small changes in the query.

25
00:01:02,800 --> 00:01:08,920
A single additional row in a query could kind of like flip the

26
00:01:08,920 --> 00:01:13,580
behavior in a way that is completely like unexpected and unpredictable

27
00:01:14,540 --> 00:01:17,780
and it's causing problems for like robustness and so on.

28
00:01:18,580 --> 00:01:23,040
And I actually started with Postgres as a user or a developer

29
00:01:23,100 --> 00:01:24,020
using Postgres.

30
00:01:24,720 --> 00:01:29,640
And one of the things I was responsible for was investigating performance

31
00:01:29,640 --> 00:01:30,140
problems.

32
00:01:30,560 --> 00:01:35,460
So that's how I actually came to work on performance cliffs.

33
00:01:36,000 --> 00:01:40,640
But after I started working on development of Postgres, you also

34
00:01:40,640 --> 00:01:45,140
see the other side of performance cliffs, which is like why it

35
00:01:45,140 --> 00:01:48,280
actually happens in practice, because the database needs to make

36
00:01:48,280 --> 00:01:48,980
some decisions.

37
00:01:49,860 --> 00:01:53,260
So it's also an engineering problem which is interesting for

38
00:01:53,260 --> 00:01:55,220
me as a developer and committer.

39
00:01:55,720 --> 00:02:00,460
And I've been working on the planner some of the time, and that's

40
00:02:00,460 --> 00:02:03,660
where some of the performance cliffs actually happen, right?

41
00:02:03,900 --> 00:02:08,440
Because of decisions made either during planning or then during

42
00:02:08,440 --> 00:02:09,180
the execution.

43
00:02:09,840 --> 00:02:14,440
So I've been reading some papers and that's actually where I

44
00:02:14,440 --> 00:02:17,040
learned that there even is a name for this, which is

45
00:02:17,040 --> 00:02:18,180
the performance cliff.

46
00:02:18,600 --> 00:02:27,540
And there is a whole area of research on robustness, like making

47
00:02:27,540 --> 00:02:32,020
robust decisions where the performance cliffs do not happen at

48
00:02:32,020 --> 00:02:34,840
all or not as often.

49
00:02:34,960 --> 00:02:37,460
So that's why I'm interested in this.

50
00:02:38,940 --> 00:02:41,920
Michael: Nice, and yeah, I really like the term performance cliff.

51
00:02:41,920 --> 00:02:45,480
It's so intuitive in that, from
the example you were talking

52
00:02:45,480 --> 00:02:47,200
about kind of adding one additional
row.

53
00:02:47,200 --> 00:02:51,140
So if we go back a few steps, if
we're adding one row incrementally

54
00:02:51,360 --> 00:02:54,720
up until the cliff, we might get
a certain gradient, might even

55
00:02:54,720 --> 00:02:59,700
be relatively horizontal and then
all of a sudden there's a massive

56
00:02:59,700 --> 00:03:01,080
spike and that's like the cliff.

57
00:03:01,080 --> 00:03:03,920
I'm probably thinking of it as
a spike but it maybe a cliff would

58
00:03:03,920 --> 00:03:04,780
be a drop.

59
00:03:05,460 --> 00:03:07,700
So yeah I really like that as a
phrase.

60
00:03:08,100 --> 00:03:13,180
You mentioned kind of planner versus
execution time cliffs and

61
00:03:13,180 --> 00:03:16,080
I guess we're talking about individual
execution more than kind

62
00:03:16,080 --> 00:03:19,140
of system level cliffs, mostly
here.

63
00:03:19,740 --> 00:03:23,080
From your experience as a user,
or maybe since you've, since

64
00:03:23,080 --> 00:03:25,580
you've been looking at like trying
to prioritize which things

65
00:03:25,580 --> 00:03:30,900
to work on, where do you see the
majority of cliffs coming from?

66
00:03:30,900 --> 00:03:34,120
Like plan time, execution time,
some, some mix.

67
00:03:35,660 --> 00:03:37,040
Tomas: So that's a good question.

68
00:03:37,040 --> 00:03:41,900
I don't know because I'm inherently
biased, right?

69
00:03:43,580 --> 00:03:46,740
The fact that a lot of performance
cliffs, for example, happen

70
00:03:46,740 --> 00:03:50,940
during planning means that you
can't actually see a lot of cliffs

71
00:03:50,940 --> 00:03:52,460
that happen later, right?

72
00:03:52,640 --> 00:04:00,980
So I don't have very good
data to explain that, or to say

73
00:04:01,120 --> 00:04:05,000
that most of the cliffs happen
at some point.

74
00:04:05,340 --> 00:04:11,140
But I would say that the sooner
the decision needs to be made

75
00:04:11,760 --> 00:04:16,120
in the query planning and query
execution, the sooner you need

76
00:04:16,120 --> 00:04:20,660
to do that, the less information
you actually have to make a good,

77
00:04:20,660 --> 00:04:21,640
robust decision.

78
00:04:22,280 --> 00:04:25,840
And the more likely it is that
it will be wrong, right?

79
00:04:25,840 --> 00:04:29,260
Because for example, we do planning
based on statistics, and

80
00:04:29,260 --> 00:04:32,640
those are inherently inaccurate,
right?

81
00:04:33,540 --> 00:04:37,940
There will be a lot of details
that we remove from the statistics

82
00:04:38,000 --> 00:04:43,320
to keep it small, and therefore,
those are the things that can

83
00:04:43,320 --> 00:04:44,440
actually cause problems.

84
00:04:44,640 --> 00:04:49,120
And then there is, of course, the
thing that even the cost model

85
00:04:49,120 --> 00:04:52,380
itself is like extremely simplified,
right?

86
00:04:52,700 --> 00:04:59,480
So I would say that the main problems
is that we are making a

87
00:04:59,480 --> 00:05:02,780
lot of decisions early in the query
execution.

88
00:05:03,620 --> 00:05:09,820
And because of that, we are kind
of like fixed in a certain execution

89
00:05:09,880 --> 00:05:13,440
path, and that may not be the optimal
one, right?

90
00:05:13,440 --> 00:05:17,720
So that's why I think the performance
cliffs come from, kind

91
00:05:17,720 --> 00:05:18,220
of.

92
00:05:19,220 --> 00:05:22,960
Nikolay: I wanted to ask, there
should be a challenge not only

93
00:05:22,960 --> 00:05:27,500
about like planner versus executor,
but also about their

94
00:05:27,500 --> 00:05:32,220
behavior, like in a single session
when there is no like a multi-user

95
00:05:33,160 --> 00:05:40,240
situation versus some various contention
situations when we have

96
00:05:40,240 --> 00:05:44,060
many, many sessions working in
parallel and all of them have

97
00:05:44,060 --> 00:05:45,140
some issue.

98
00:05:45,140 --> 00:05:50,820
For example, the infamous lightweight
lock (LWLock) lock manager contention happens when,

99
00:05:50,820 --> 00:05:55,800
during planning, the planner needs to
lock all the indexes for the

100
00:05:55,800 --> 00:05:57,380
table and we exceed 16.

101
00:05:58,780 --> 00:06:02,420
The fast path becomes unavailable
for some relations.

102
00:06:02,900 --> 00:06:07,060
And we cannot see it when we're just
checking a single query,

103
00:06:07,720 --> 00:06:11,720
but it becomes a performance
cliff immediately when

104
00:06:11,720 --> 00:06:15,200
we have thousands of transactions
per second doing the same thing

105
00:06:15,340 --> 00:06:17,060
and planning becomes a problem, right?

106
00:06:17,540 --> 00:06:20,860
I'm curious, how do you think this
should be understood, like

107
00:06:20,860 --> 00:06:26,380
with what tools, and earlier, not
when you've already fallen off that

108
00:06:26,380 --> 00:06:27,980
cliff, right?

109
00:06:27,980 --> 00:06:34,160
But before. This is the trickiest part:
how to diagnose cliffs

110
00:06:34,160 --> 00:06:38,040
which are going to happen soon
in our system.

111
00:06:40,240 --> 00:06:46,500
Tomas: So, I understand the example
that you've been describing,

112
00:06:46,920 --> 00:06:52,060
but I think that's a slightly different
performance cliff and

113
00:06:52,060 --> 00:06:56,000
not what I usually mean by that
term, because you are kind of

114
00:06:56,000 --> 00:06:59,480
like talking about like resource
exhaustion, right?

115
00:07:00,160 --> 00:07:04,960
You have a resource, which is like
a lock queue or like the buffer

116
00:07:04,960 --> 00:07:10,420
for the fast-path locks, which was
increased or like made larger

117
00:07:10,440 --> 00:07:12,040
in Postgres 18.

118
00:07:12,040 --> 00:07:12,920
Nikolay: So we should have

119
00:07:12,920 --> 00:07:13,080
Tomas: the problem.

120
00:07:13,080 --> 00:07:14,020
Nikolay: And it's configurable
now, right?

121
00:07:14,020 --> 00:07:15,140
I think it's maybe configurable.

122
00:07:15,140 --> 00:07:15,640
Yes.

123
00:07:17,380 --> 00:07:21,760
Tomas: It's tied to the expected number
of locks per transaction,

124
00:07:22,160 --> 00:07:26,680
so it is configurable in this way,
yes.

125
00:07:27,380 --> 00:07:29,940
So it's not fixed size like before.
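
The fast-path behavior Nikolay describes can be sketched as follows. This is a hypothetical model for illustration, not the actual Postgres implementation; the class and names are invented, though the 16-slot limit matches the historical per-backend fast-path array mentioned here:

```python
# Hypothetical sketch of fast-path lock fallback: each backend has a
# small array of per-backend fast-path slots; once those are full,
# further locks must go through the shared lock manager, which is
# where cross-backend contention shows up under high TPS.

FASTPATH_SLOTS = 16  # the historical per-backend limit discussed above

class Backend:
    def __init__(self, slots=FASTPATH_SLOTS):
        self.slots = slots
        self.fastpath = []        # cheap, uncontended per-backend slots
        self.shared_lockmgr = []  # stand-in for the contended shared table

    def lock_relation(self, rel):
        """Return 'fastpath' or 'shared' depending on where the lock went."""
        if len(self.fastpath) < self.slots:
            self.fastpath.append(rel)
            return "fastpath"
        self.shared_lockmgr.append(rel)
        return "shared"

b = Backend()
# Planning a query over a table with many indexes or partitions can
# lock more than 16 relations, silently crossing the cliff:
paths = [b.lock_relation(f"rel_{i}") for i in range(20)]
```

In this toy model the 17th relation is the cliff: nothing about the query changed, but every lock past the slot limit takes the contended path.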

126
00:07:30,940 --> 00:07:35,420
But I think what I mean by performance
cliffs and what I describe

127
00:07:35,440 --> 00:07:42,320
in my talk is mostly performance
cliffs that happen because

128
00:07:42,340 --> 00:07:46,940
of like a binary decision somewhere
in the processing, right?

129
00:07:47,160 --> 00:07:50,860
And the locking bottleneck is not
that, right?

130
00:07:50,860 --> 00:07:56,660
I mean, like, yes, you do have
the case where you can either

131
00:07:56,660 --> 00:07:58,940
fit the fast path lock or not,
right?

132
00:07:58,980 --> 00:08:01,940
And if not, you have to do something
much more expensive.

133
00:08:02,360 --> 00:08:09,220
That's a problem, but it's not
the thing that I had in mind thinking

134
00:08:09,220 --> 00:08:10,020
about performance cliffs.

135
00:08:10,640 --> 00:08:14,720
Nikolay: Yeah, now I remember when
I was checking your talk,

136
00:08:14,720 --> 00:08:18,160
this is exactly what I thought,
like we have different views

137
00:08:18,160 --> 00:08:19,020
on this term.

138
00:08:19,340 --> 00:08:23,120
And I grabbed this term from various
blog posts where people

139
00:08:23,120 --> 00:08:29,340
talked about multixact, SLRU
buffer problem, like it was also

140
00:08:29,340 --> 00:08:29,940
a cliff.

141
00:08:30,040 --> 00:08:35,080
Sub-transactions, people also call
this a performance cliff, saying

142
00:08:35,080 --> 00:08:38,360
sub-transactions are cursed, don't
use them at all, because there

143
00:08:38,360 --> 00:08:40,600
is a cliff that is hidden there
and so on.

144
00:08:40,600 --> 00:08:43,220
So this definition is very different
from yours.

145
00:08:43,260 --> 00:08:45,140
I understand, yeah, yeah, I agree
with you.

146
00:08:45,140 --> 00:08:49,740
Tomas: Yeah, I don't feel like
I'm the authority to define what

147
00:08:49,740 --> 00:08:50,900
exactly that means.

148
00:08:51,220 --> 00:08:55,240
I'm trying to more explain what
I've been describing, like the

149
00:08:55,240 --> 00:08:57,260
cases that I've been discussing
in my talk.

150
00:08:57,720 --> 00:09:03,080
Because also the reasons why that
happens are slightly different,

151
00:09:03,080 --> 00:09:03,580
right?

152
00:09:03,900 --> 00:09:07,580
For the performance cliffs, as I understand them, or as I discuss them,

153
00:09:07,840 --> 00:09:12,180
it's more like an engineering question because you have multiple

154
00:09:12,180 --> 00:09:14,400
algorithms that you can choose from.

155
00:09:14,640 --> 00:09:19,020
Each of those algorithms is kind of optimal, best suited for

156
00:09:19,020 --> 00:09:25,380
certain type of conditions or selectivity ranges or something

157
00:09:25,380 --> 00:09:26,100
like that.

158
00:09:26,720 --> 00:09:29,780
And therefore, if you make the decision based on the assumption

159
00:09:30,480 --> 00:09:32,960
that you know the correct answer, right?

160
00:09:34,080 --> 00:09:38,520
And I think the bottlenecks that you've been describing are more

161
00:09:38,520 --> 00:09:42,160
about like say changes in the hardware available.

162
00:09:43,360 --> 00:09:49,220
Because years ago when we initially sized the number of

163
00:09:49,220 --> 00:09:53,540
fast-path locks, I think 16 was probably fine.

164
00:09:54,800 --> 00:09:55,780
It was reasonable.

165
00:09:57,700 --> 00:10:03,240
You probably didn't have that many tables or connections, and

166
00:10:03,420 --> 00:10:04,840
it wasn't such a problem.

167
00:10:05,380 --> 00:10:10,700
But nowadays we have many more CPUs, many more cores available

168
00:10:11,120 --> 00:10:11,620
usually.

169
00:10:12,560 --> 00:10:16,260
And we also have things like partitioning, which can completely

170
00:10:16,260 --> 00:10:20,100
explode the number of things that you need to think about.

171
00:10:20,380 --> 00:10:24,140
So I think it's more like a consequence of changes in Postgres

172
00:10:24,620 --> 00:10:27,680
that leads to hitting this bottleneck.

173
00:10:28,440 --> 00:10:30,540
I'm not saying it's not a valid problem.

174
00:10:30,720 --> 00:10:31,680
It definitely is.

175
00:10:31,680 --> 00:10:33,140
We need to improve that.

176
00:10:33,940 --> 00:10:36,240
But it's a slightly different thing.

177
00:10:36,660 --> 00:10:38,860
Nikolay: Right, yeah, I agree, yeah.

178
00:10:39,920 --> 00:10:43,260
Michael: The topic's already huge if we only limit it to single

179
00:10:43,260 --> 00:10:46,640
session, single query performance cliffs.

180
00:10:46,640 --> 00:10:51,420
So if we think about that, just those, you talked about both

181
00:10:51,420 --> 00:10:55,520
reducing kind of the frequency of these, so making them less

182
00:10:55,520 --> 00:10:59,240
likely to happen, but also reducing the severity of them.

183
00:10:59,240 --> 00:11:03,340
So it kind of accepting they will happen, make them less of a

184
00:11:03,340 --> 00:11:05,860
cliff or less of a drop when they do.

185
00:11:06,340 --> 00:11:09,760
Can you talk us through some things from a user perspective that

186
00:11:09,760 --> 00:11:15,800
we can do to reduce the severity and or to reduce the likelihood

187
00:11:15,800 --> 00:11:17,120
of running into them?

188
00:11:17,860 --> 00:11:19,100
Tomas: That's a good question.

189
00:11:20,500 --> 00:11:26,680
I think it probably depends on each individual case of the performance

190
00:11:26,740 --> 00:11:27,240
cliff.

191
00:11:27,440 --> 00:11:30,040
And in some cases you probably can't do anything.

192
00:11:31,720 --> 00:11:35,940
But it's just good to know that this is what's happening.

193
00:11:36,280 --> 00:11:42,100
You can see the talk on YouTube, and I'm actually invited

194
00:11:42,100 --> 00:11:48,500
to give the talk at the San Francisco meetup in a week.

195
00:11:49,160 --> 00:11:55,740
But the first example that I gave in that talk is about the number

196
00:11:56,200 --> 00:11:59,880
of elements in the IN condition, right?

197
00:12:00,300 --> 00:12:03,140
Where it's column IN and then an array of elements.

198
00:12:03,820 --> 00:12:07,940
And there is a flip because at an exact point we switch from one algorithm

199
00:12:07,940 --> 00:12:10,820
to the other, like from linear search to hash.

200
00:12:12,180 --> 00:12:14,000
But that's hard-coded, right?

201
00:12:15,040 --> 00:12:17,220
There's nothing the User can do
about that.

202
00:12:17,880 --> 00:12:25,940
The one thing you could do is to
pick the right size

203
00:12:26,320 --> 00:12:27,540
of the list, right?

204
00:12:28,260 --> 00:12:31,300
But that's not a very practical
thing.
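
The IN-list flip Tomas describes can be sketched like this. It is an illustrative model only, not the Postgres source: the threshold value of 9 and the function names are assumptions for the example, but the shape of the decision, a hard-coded size cutoff between a linear scan and a hash lookup, is the cliff being discussed:

```python
# Illustrative sketch of a hard-coded algorithm flip for `x IN (...)`:
# below a size threshold, test membership with a linear scan; at or
# above it, build a hash set once and probe it. Crossing the
# threshold by a single element changes the algorithm entirely.

HASH_THRESHOLD = 9  # hypothetical cutoff, invented for this sketch

def make_membership_test(values):
    """Return a predicate for `x IN values`, choosing the algorithm
    by list size, the way a hard-coded binary decision would."""
    if len(values) >= HASH_THRESHOLD:
        lookup = set(values)                 # O(n) build, O(1) probes
        return lambda x: x in lookup
    vals = list(values)
    return lambda x: any(x == v for v in vals)  # O(n) linear scan

small = make_membership_test([1, 2, 3])          # linear scan path
large = make_membership_test(list(range(100)))   # hashed path
```

As in the talk, the user cannot influence which branch is taken except by choosing the size of the list itself, which is exactly the impractical workaround mentioned above.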

205
00:12:32,660 --> 00:12:36,480
But I think in many cases just
knowing what's causing the problem

206
00:12:37,660 --> 00:12:41,280
can be a very useful thing because
you can say like okay, this

207
00:12:41,280 --> 00:12:46,100
is what's happening, I can't do
anything about that, I have quantified

208
00:12:47,640 --> 00:12:53,180
the impact, and we just have to
live with that for now.

209
00:12:54,120 --> 00:12:59,700
I can imagine patches doing something,
like improving that by

210
00:13:00,660 --> 00:13:06,240
adjusting the point where exactly
that flips, by doing some better

211
00:13:06,400 --> 00:13:06,900
estimation.

212
00:13:07,640 --> 00:13:10,740
But at this point, people can't
do anything about that.

213
00:13:11,520 --> 00:13:16,660
For some of the cases in planning,
you can do the usual thing,

214
00:13:16,840 --> 00:13:21,020
like trying to convince the planner
to make the right decision,

215
00:13:21,020 --> 00:13:21,520
right?

216
00:13:22,040 --> 00:13:24,400
And there are different ways to
do that.

217
00:13:24,400 --> 00:13:27,460
I don't know what's the best solution
to that.

218
00:13:28,680 --> 00:13:32,720
Michael: Things I really liked
were, I think proactively people

219
00:13:32,720 --> 00:13:36,560
can adjust some of the cost parameters.
Like, we still have random

220
00:13:36,560 --> 00:13:40,400
page cost at 4 as the default
even on most cloud providers,

221
00:13:40,580 --> 00:13:45,060
so I feel like things like that
when we've mostly got SSDs these

222
00:13:45,060 --> 00:13:50,020
days just feel like they're going
to be more likely to push plan

223
00:13:50,020 --> 00:13:51,420
flips at the wrong times.

224
00:13:52,000 --> 00:13:54,480
Some of these, if there are other
cost signals that are like

225
00:13:54,480 --> 00:13:58,820
massively out as well, I'm not
aware of any, but that feels like

226
00:13:58,820 --> 00:14:02,420
a foot gun or something that if
people change proactively or

227
00:14:02,680 --> 00:14:05,720
reduce proactively, they'd be less
likely to encounter this kind

228
00:14:05,720 --> 00:14:06,220
of thing.

229
00:14:06,220 --> 00:14:09,780
I'm also aware you did a lot of
the work on extended statistics

230
00:14:09,920 --> 00:14:14,120
and that feels like another tool
that, used well, could start

231
00:14:14,120 --> 00:14:19,480
to teach the planner about certain
relationships and make cardinality

232
00:14:20,080 --> 00:14:22,320
errors far less likely.

233
00:14:22,660 --> 00:14:26,020
And again, maybe then less likely
to have plan flips at the worst,

234
00:14:26,040 --> 00:14:27,220
not the worst times.

235
00:14:27,340 --> 00:14:29,960
When we do have the plan flip,
it's more likely to be closer

236
00:14:29,960 --> 00:14:31,420
to the optimal point.

237
00:14:31,420 --> 00:14:34,060
So we're less likely to have a
severe cliff at least.

238
00:14:34,300 --> 00:14:35,140
What do you think?

239
00:14:35,820 --> 00:14:39,120
Tomas: I think you are absolutely
right that like tuning some

240
00:14:39,120 --> 00:14:42,840
of the parameters for like random
page costs, for example, and

241
00:14:42,840 --> 00:14:48,660
something like that is exactly
one of the things you can do to

242
00:14:49,400 --> 00:14:53,360
convince the planner to make the
correct decisions more often,

243
00:14:53,400 --> 00:14:53,900
right?

244
00:14:54,340 --> 00:14:58,980
Because you are essentially moving
the flip point between the

245
00:14:58,980 --> 00:15:05,340
costs closer to the actual flip
point, right, between the execution

246
00:15:05,640 --> 00:15:08,160
costs, like measured in time, for
example.

247
00:15:08,800 --> 00:15:10,380
So, that's definitely true.
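
The "moving the flip point" idea can be made concrete with a toy cost model. This is an assumption-laden sketch, not Postgres's actual costing code: the formulas and constants are simplified stand-ins (one random page fetch per matching row, default-like cost constants), chosen only to show how lowering random_page_cost shifts the selectivity at which the plan flips:

```python
# Toy cost model showing how random_page_cost moves the crossover
# ("flip point") between an index scan and a sequential scan.

def seqscan_cost(pages, rows, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    # Read every page once, process every row.
    return pages * seq_page_cost + rows * cpu_tuple_cost

def indexscan_cost(matching_rows, random_page_cost,
                   cpu_index_tuple_cost=0.005, cpu_tuple_cost=0.01):
    # Crude assumption: one random page fetch per matching row.
    return matching_rows * (random_page_cost
                            + cpu_index_tuple_cost + cpu_tuple_cost)

def flip_selectivity(pages, rows, random_page_cost):
    """Smallest selectivity at which the seq scan becomes cheaper."""
    for pct in range(1, 101):
        matching = rows * pct / 100
        if indexscan_cost(matching, random_page_cost) >= seqscan_cost(pages, rows):
            return pct / 100
    return 1.0

# Lowering random_page_cost (e.g. for SSDs) keeps the index scan
# attractive for a larger fraction of the table:
default_flip = flip_selectivity(pages=1000, rows=100_000, random_page_cost=4.0)
ssd_flip = flip_selectivity(pages=1000, rows=100_000, random_page_cost=1.1)
```

Tuning the parameter does not remove the cliff; it only moves the modeled flip point closer to where the real execution-time flip sits, which is the point Tomas makes here.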

248
00:15:11,120 --> 00:15:13,780
I think most of the time we actually
do the correct decision

249
00:15:13,900 --> 00:15:19,940
like for regular queries, because
the selectivity where we would

250
00:15:19,940 --> 00:15:22,860
have a problem is kind of like
in the middle, right, somewhere.

251
00:15:23,040 --> 00:15:26,340
And usually queries do have like
very low or very high selectivity,

252
00:15:26,580 --> 00:15:27,080
something like that.

253
00:15:27,440 --> 00:15:31,080
So in between is not very common,
it can happen.

254
00:15:31,920 --> 00:15:35,580
But I don't really have a perfect
solution to that.

255
00:15:36,280 --> 00:15:42,540
Extended statistics definitely
can help, but it's still the very

256
00:15:42,540 --> 00:15:43,980
early decision.

257
00:15:44,240 --> 00:15:46,980
It's going to help with improving
that in some cases.

258
00:15:48,240 --> 00:15:53,160
And I know you had a whole podcast
about that.

259
00:15:54,120 --> 00:15:58,940
The problem with the extended statistics
is that it still is

260
00:15:58,940 --> 00:16:03,940
like an extremely simplified representation,
approximation of

261
00:16:03,940 --> 00:16:04,580
the data.

262
00:16:05,740 --> 00:16:09,900
That's the whole point of having
statistics, to have a very compact

263
00:16:10,320 --> 00:16:16,000
thing, very compact data set, which
you can use to do the planning

264
00:16:16,000 --> 00:16:16,500
quickly.

265
00:16:18,040 --> 00:16:24,780
Which means that it definitely
is missing some of the details.

266
00:16:25,440 --> 00:16:28,000
And that can be fatal, right?
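
One concrete way the simplification becomes fatal is correlated columns, the case extended statistics target. The table, column names, and fractions below are invented purely for illustration of the independence assumption:

```python
# Toy numbers showing why correlated columns mislead a planner that
# multiplies per-column selectivities as if they were independent.

rows = 1_000_000
sel_city = 0.01   # fraction of rows with city = 'Prague' (invented)
sel_zip = 0.01    # fraction of rows with a Prague ZIP code (invented)

# Independence assumption: combined selectivity is the product.
est_rows = rows * sel_city * sel_zip       # planner expects 100 rows

# Reality: the ZIP code implies the city, so the true combined
# selectivity is just the ZIP's 1%.
actual_rows = rows * sel_zip               # 10,000 rows actually match

underestimate = actual_rows / est_rows     # off by a factor of 100
```

A 100x row-count underestimate is exactly the kind of error that pushes the planner over a cliff, for instance into a nested loop that then explodes, as discussed later in the episode.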

267
00:16:28,280 --> 00:16:32,520
So I think what we need to focus
on in the future is kind of

268
00:16:32,520 --> 00:16:37,760
like making the execution part
a bit more robust, right?

269
00:16:37,760 --> 00:16:42,100
Like moving some of the decisions
to that instead of expecting

270
00:16:42,340 --> 00:16:48,920
the decisions in planning to be
perfectly correct all the time,

271
00:16:48,940 --> 00:16:54,960
to kind of like allow the execution
to actually recover from

272
00:16:54,960 --> 00:16:55,460
mistakes.

273
00:16:56,660 --> 00:16:58,700
Nikolay: Change the mind on the
fly, right?

274
00:16:59,760 --> 00:17:01,760
Change the execution path.

275
00:17:01,820 --> 00:17:02,840
Tomas: Kind of, right?

276
00:17:02,840 --> 00:17:06,800
Or like, yes, there are different
approaches to that.

277
00:17:07,200 --> 00:17:13,380
Some are kind of like difficult
to do correctly, like replanning,

278
00:17:13,520 --> 00:17:14,740
for example, right?

279
00:17:16,240 --> 00:17:21,180
Or you could have like a typical
problem for a planner, it's

280
00:17:21,180 --> 00:17:23,540
like a sequence of nested loops,
right?

281
00:17:23,600 --> 00:17:28,000
When you have underestimated the
join size or like a condition

282
00:17:28,200 --> 00:17:33,420
selectivity, then you pick a nested
loop and then it explodes

283
00:17:33,480 --> 00:17:34,340
and takes forever.

284
00:17:34,900 --> 00:17:42,680
So there are approaches which kind
of blend two different paths

285
00:17:42,680 --> 00:17:47,120
through the join, and either do
like a hash join or like a nested

286
00:17:47,120 --> 00:17:47,860
loop, right?

287
00:17:49,220 --> 00:17:53,980
But there are practical issues
with what you can do, limiting

288
00:17:54,020 --> 00:17:56,260
the behavior of the optimizer,
right?

289
00:17:56,320 --> 00:18:00,260
Because if you want to do this,
then for example, you cannot

290
00:18:00,260 --> 00:18:06,940
emit any tuples from that join
until you know that that's the

291
00:18:06,940 --> 00:18:08,320
correct approach, right?

292
00:18:08,480 --> 00:18:11,420
Because then you wouldn't be actually
able to flip to the other,

293
00:18:12,380 --> 00:18:13,340
and so on.
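
The deferred-decision idea Tomas outlines can be sketched as follows. This is a hedged illustration, not a real Postgres mechanism: the threshold and names are invented, but it shows the key constraint he mentions, that no tuples can be emitted until the strategy is settled, otherwise you could not switch:

```python
# Sketch of an "adaptive" join: buffer the outer side up to a
# threshold before committing to a strategy, so a bad row-count
# estimate cannot lock execution into an exploding nested loop.

NESTED_LOOP_LIMIT = 100  # hypothetical cutoff for a "small" outer side

def adaptive_join(outer_rows, inner_rows, key):
    """Join two row sets, deciding nested loop vs hash at runtime.
    Returns (result_pairs, strategy_used)."""
    it = iter(outer_rows)
    buffered, small = [], True
    for row in it:
        buffered.append(row)
        if len(buffered) > NESTED_LOOP_LIMIT:
            small = False  # the estimate was wrong; stop buffering
            break
    if small:
        # Outer side really is small: a nested loop is fine.
        return ([(o, i) for o in buffered
                 for i in inner_rows if key(o) == key(i)], "nestloop")
    # Fall back to a hash join: build a table over the inner side.
    table = {}
    for i in inner_rows:
        table.setdefault(key(i), []).append(i)
    remaining = buffered + list(it)  # resume the outer scan
    return ([(o, i) for o in remaining
             for i in table.get(key(o), [])], "hash")

small_join, strat_small = adaptive_join([1, 2, 3], [2, 3, 4], key=lambda x: x)
big_join, strat_big = adaptive_join(list(range(1000)), [5], key=lambda x: x)
```

Note that the buffering is what makes the switch safe: results only flow once the strategy is fixed, which is precisely the optimizer-behavior limitation described above.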

294
00:18:13,640 --> 00:18:17,140
Nikolay: And I see another challenge,
it's like user experience.

295
00:18:17,220 --> 00:18:24,020
We all are very, we're scared about
the probability of plan flips

296
00:18:24,160 --> 00:18:25,140
sometimes, right?

297
00:18:25,140 --> 00:18:29,740
And it's dangerous, especially
when you do upgrade, major upgrade,

298
00:18:30,820 --> 00:18:32,340
Plan flips are not uncommon.

299
00:18:33,660 --> 00:18:39,560
And I know, for example, AWS Aurora,
RDS Aurora, has an extension,

300
00:18:39,920 --> 00:18:45,820
which is not open source, to freeze plans
so that no flips are possible.

301
00:18:46,160 --> 00:18:51,100
What you described, like if I see
some plan in EXPLAIN, but then

302
00:18:51,340 --> 00:18:57,220
during execution the plan can be changed,
it makes my head think,

303
00:18:57,980 --> 00:19:00,900
like, how can I control it?

304
00:19:00,900 --> 00:19:03,420
I'm not controlling it, I just want to understand
what's happening, right?

305
00:19:03,420 --> 00:19:09,120
Because EXPLAIN will lie in this
case, right?

306
00:19:09,120 --> 00:19:15,960
Or maybe you see something that
could help, like seeing two plans in

307
00:19:15,960 --> 00:19:19,340
EXPLAIN, or three plans, and we know
that.

308
00:19:19,340 --> 00:19:25,240
Tomas: But we already can show
this kind of stuff in the plan,

309
00:19:25,240 --> 00:19:25,740
right?

310
00:19:26,140 --> 00:19:32,060
Where when you have a sub-plan,
we can either have a regular

311
00:19:32,140 --> 00:19:34,340
plan or a hashed sub-plan, I think.

312
00:19:34,640 --> 00:19:38,320
So we can show alternative plans.

313
00:19:38,860 --> 00:19:40,580
I don't think that would be a problem.

314
00:19:40,960 --> 00:19:44,860
Obviously, that wouldn't tell you
what exactly happened during

315
00:19:44,860 --> 00:19:45,360
execution.

316
00:19:46,640 --> 00:19:49,020
It would need some additional improvements.

317
00:19:50,140 --> 00:19:54,640
But yes, it adds ambiguity to the
EXPLAIN plan, right?

318
00:19:54,740 --> 00:19:59,180
Because it would tell you, these
are the possibilities, right?

319
00:20:00,060 --> 00:20:03,300
You don't know what's going to
happen, and it could actually

320
00:20:04,160 --> 00:20:08,540
differ between executions of the
same query even, right?

321
00:20:09,020 --> 00:20:14,980
But that's the price for higher
robustness, right?

322
00:20:15,100 --> 00:20:15,600
Nikolay: Right.

323
00:20:16,120 --> 00:20:21,900
I'm just thinking observability
tools, many people want them

324
00:20:21,900 --> 00:20:23,540
to be improved, like to track...

325
00:20:23,620 --> 00:20:28,100
pg_stat_statements is not enough to
have plans, one normalized query

326
00:20:28,100 --> 00:20:29,360
can have multiple plans.

327
00:20:29,640 --> 00:20:33,180
In the past, there were attempts
to have pg_stat_plans or something.

328
00:20:33,400 --> 00:20:36,800
There are also extensions which
allow you to see the plan

329
00:20:37,280 --> 00:20:39,040
for an ongoing query, right?

330
00:20:39,240 --> 00:20:47,960
I think if this
direction is going to become the

331
00:20:48,240 --> 00:20:51,980
norm, altering the plan on the
fly, then observability tooling

332
00:20:51,980 --> 00:20:55,800
needs improvements
for sure, because we need to understand

333
00:20:55,800 --> 00:20:59,640
what's happening, and auto_explain
as well, right? Probably it should

334
00:20:59,760 --> 00:21:05,140
report the initial plan and then the additional
plan or something like

335
00:21:05,140 --> 00:21:06,420
this, the alterations.

336
00:21:07,540 --> 00:21:12,500
So I just feel the bigger need
to improve the observability part

337
00:21:12,500 --> 00:21:13,840
of Postgres as well.

338
00:21:16,100 --> 00:21:16,600
Tomas: Yes.

339
00:21:17,460 --> 00:21:19,620
I have nothing against observability.

340
00:21:20,140 --> 00:21:21,960
Nikolay: Not a question, just feeling.

341
00:21:21,980 --> 00:21:28,320
Tomas: And I think I haven't tried
like pg_stat_plans for a while,

342
00:21:28,320 --> 00:21:31,940
but I think it still exists, it's
still available, right?

343
00:21:32,920 --> 00:21:34,140
I think it's very useful.

344
00:21:34,300 --> 00:21:41,480
I think there's also a patch to
actually add the ability to watch

345
00:21:41,480 --> 00:21:43,760
a plan as it's being executed,
right?

346
00:21:43,980 --> 00:21:50,020
Which can actually help, that would
be a tremendous benefit when

347
00:21:50,020 --> 00:21:54,640
investigating queries that never
complete, where you can't actually

348
00:21:54,640 --> 00:22:00,360
get like explain analyze and so
on, or just watch the execution

349
00:22:00,360 --> 00:22:02,060
of a query in another backend.

350
00:22:02,220 --> 00:22:03,640
So there is a patch for that.

351
00:22:03,640 --> 00:22:08,260
It's definitely not going to be in Postgres 18, but I think Rafael

352
00:22:08,400 --> 00:22:09,940
actually is working on that.

353
00:22:10,520 --> 00:22:12,840
I hope I didn't get the name wrong.

354
00:22:13,180 --> 00:22:18,580
Yeah, but again, I don't think that's necessarily about the performance

355
00:22:19,280 --> 00:22:20,400
only, right?

356
00:22:20,460 --> 00:22:25,020
That's about monitoring in general.

357
00:22:26,140 --> 00:22:28,680
Michael: This concept of robustness is really interesting.

358
00:22:29,440 --> 00:22:33,780
I don't think I'd come across it as a defined term until reading

359
00:22:33,780 --> 00:22:38,400
some of the papers that you link to at the end of your talk and

360
00:22:38,400 --> 00:22:39,500
I found it fascinating.

361
00:22:39,520 --> 00:22:43,340
I think it aligns well with my experience of what people really

362
00:22:43,340 --> 00:22:44,440
care about in performance.

363
00:22:44,480 --> 00:22:47,900
Correct me if I'm wrong, but I got the impression

364
00:22:47,900 --> 00:22:54,280
it's really about trying to avoid really bad worst case situations

365
00:22:54,380 --> 00:22:59,480
at the cost of potentially slightly worse average execution time,

366
00:22:59,480 --> 00:23:01,820
for example, if we want to optimize based on time.

367
00:23:02,000 --> 00:23:06,340
So this idea that on average, we might perform slightly worse,

368
00:23:06,340 --> 00:23:10,120
but our worst case will be nowhere near as bad, so the

369
00:23:10,120 --> 00:23:14,760
P99 would be way better or way less severe, but the P50 might

370
00:23:14,760 --> 00:23:15,700
be a bit higher.

371
00:23:16,080 --> 00:23:17,260
Is that about right?

372
00:23:18,160 --> 00:23:21,000
Tomas: Yeah, I think that's exactly right.

373
00:23:21,460 --> 00:23:26,980
You're exchanging the, you know, worse average performance for

374
00:23:27,980 --> 00:23:31,940
having to do fewer investigations of performance issues.

375
00:23:31,960 --> 00:23:34,200
And I think that's a perfectly reasonable trade-off.
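As a back-of-the-envelope illustration of the trade-off discussed here (the latency numbers are invented, purely to make the P50/P99 effect concrete):

```python
# Hypothetical latencies (ms): a "fast" plan that is quicker on average
# but occasionally blows up, versus a "robust" plan that is a bit slower
# but has a bounded worst case.
fast_plan = [10.0] * 98 + [500.0, 900.0]   # cheap plan with rare disasters
robust_plan = [14.0] * 100                  # predictable, slightly slower plan

def percentile(data, p):
    """Nearest-rank percentile of a list of latencies."""
    s = sorted(data)
    idx = min(len(s) - 1, int(round(p / 100 * len(s))))
    return s[idx]

# Median (P50): the fast plan wins.
assert percentile(fast_plan, 50) < percentile(robust_plan, 50)
# Tail (P99): the robust plan wins by a wide margin.
assert percentile(robust_plan, 99) < percentile(fast_plan, 99)
```

This is exactly the exchange described: a slightly higher P50 bought in return for a far less severe P99.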

376
00:23:34,200 --> 00:23:38,740
I mean, I do like working on micro-optimizations and micro-benchmarks

377
00:23:39,060 --> 00:23:39,740
and everything.

378
00:23:40,080 --> 00:23:40,820
That's great.

379
00:23:41,680 --> 00:23:45,360
But in the end, someone needs to be using the database, right?

380
00:23:46,620 --> 00:23:47,580
That's the point.

381
00:23:48,280 --> 00:23:55,020
And, I don't know, when people had like 1 database

382
00:23:55,560 --> 00:23:58,940
and they cared about that only, that was fine.

383
00:23:59,720 --> 00:24:03,120
As soon as you have like a larger fleet of database servers that

384
00:24:03,120 --> 00:24:08,120
are actually used by people or by applications or something,

385
00:24:09,060 --> 00:24:15,080
you definitely will value robustness, like not having issues,

386
00:24:15,720 --> 00:24:21,300
even if the peak performance is maybe 25% lower or something

387
00:24:21,300 --> 00:24:22,000
like that.

388
00:24:22,420 --> 00:24:26,700
Because it's much easier to spend a bit more on the hardware

389
00:24:28,140 --> 00:24:35,680
than having to run a large team investigating and fixing issues

390
00:24:35,680 --> 00:24:38,260
and firefighting all the time.

391
00:24:38,300 --> 00:24:40,740
That just doesn't work at scale.

392
00:24:41,580 --> 00:24:46,620
I think it's also what some of the Cloud hosting providers realize.

393
00:24:50,580 --> 00:24:54,520
It's a bit against my nature because I definitely do want to

394
00:24:54,520 --> 00:24:58,000
achieve the best performance possible.

395
00:25:00,160 --> 00:25:04,840
I've been benchmarking a lot and doing performance-related patches

396
00:25:04,840 --> 00:25:05,940
for a long time.

397
00:25:06,600 --> 00:25:10,580
At some point, I think it's kind of pointless, right?

398
00:25:10,900 --> 00:25:13,820
Because people will not be actually able to use that in production

399
00:25:13,900 --> 00:25:14,400
anyway.

400
00:25:14,960 --> 00:25:15,460
Michael: Yeah.

401
00:25:15,900 --> 00:25:16,960
I mean, there are definitely...

402
00:25:16,960 --> 00:25:19,540
It feels like there are cases and
there are people trying to

403
00:25:19,540 --> 00:25:23,240
get the best of both worlds, right,
trying to get a solution faster than

404
00:25:23,240 --> 00:25:27,680
what we currently have that
is also more robust. Like,

405
00:25:28,140 --> 00:25:31,920
we had Peter Geoghegan on to talk about
the work he's been doing

406
00:25:31,920 --> 00:25:36,060
on in the direction
of skip scan, but that work

407
00:25:36,060 --> 00:25:41,320
seems to make some decisions post-planning,
during execution, as to whether

408
00:25:41,320 --> 00:25:46,580
it's going to skip or continue
trying to skip or just run through

409
00:25:46,580 --> 00:25:49,920
the index sequentially depending
on certain statistics it gathers

410
00:25:49,920 --> 00:25:50,740
along the way.

411
00:25:50,740 --> 00:25:54,780
That strikes me as work in this
direction that in the real world

412
00:25:54,780 --> 00:25:59,160
doesn't seem to give up too much
on the optimal path either.

413
00:26:00,660 --> 00:26:04,100
Tomas: So I think you are talking
about the multi-dimensional

414
00:26:04,500 --> 00:26:06,940
access method that Peter was talking
about.

415
00:26:08,420 --> 00:26:11,520
And I think it's actually a win-win
situation in this case, right,

416
00:26:11,520 --> 00:26:16,120
because he actually can make better
decisions based on the a priori

417
00:26:16,120 --> 00:26:19,200
information that he has at the
planning stage, right?

418
00:26:19,200 --> 00:26:22,360
So he actually is not sacrificing
anything.

419
00:26:22,360 --> 00:26:25,780
He's actually improving the
behavior in many cases.

420
00:26:25,840 --> 00:26:29,560
So I think it's amazing what he's
doing with indexes.

421
00:26:31,120 --> 00:26:34,540
And I've been grateful to actually
collaborate with him on the

422
00:26:34,540 --> 00:26:35,900
index prefetching patch.

423
00:26:36,040 --> 00:26:38,260
And his comments were very
useful.

424
00:26:39,280 --> 00:26:43,140
And I think it's actually Peter
Geoghegan who pointed out like a

425
00:26:43,140 --> 00:26:48,000
quote from Goetz Graefe, who is
one of the authors of the papers,

426
00:26:48,680 --> 00:26:51,540
who said that choice is confusion.

427
00:26:52,580 --> 00:26:58,540
By adding more different ways to
execute a query to the optimizer,

428
00:26:59,540 --> 00:27:04,740
you are just giving it more
chances to make a mistake, right?

429
00:27:04,960 --> 00:27:13,480
So I think the robustness, like
the main parts of the features

430
00:27:13,480 --> 00:27:17,460
that improve robustness need to
be in the execution path, right?

431
00:27:17,520 --> 00:27:24,500
Kind of like adjusting based on
the actual data, right?

432
00:27:25,760 --> 00:27:30,720
Michael: Yeah, so we've got probably
more than this, but 3 main

433
00:27:30,720 --> 00:27:35,020
times we're having to make, or
the planner is having to make, decisions

434
00:27:35,020 --> 00:27:36,960
that are really impactful for plan
flips.

435
00:27:36,960 --> 00:27:41,520
We've got scan type, so we've got
kind of sequential scan, index

436
00:27:41,520 --> 00:27:45,480
scan, bitmap scan, index only scan,
perhaps some others if you've

437
00:27:45,480 --> 00:27:46,820
got extensions installed.

438
00:27:48,060 --> 00:27:51,540
And then we've got join algorithm.

439
00:27:52,680 --> 00:27:54,740
And then we've also got join order.

440
00:27:56,000 --> 00:27:59,340
Based on your talk, I've read a bit

441
00:27:59,340 --> 00:28:04,300
about the first 2, but nothing
about the last 1 in terms of execution

442
00:28:04,540 --> 00:28:06,100
time optimizations.

443
00:28:06,680 --> 00:28:10,420
Is there anything around that that
you'd be looking into?

444
00:28:11,780 --> 00:28:15,220
Tomas: So the first 2 parts that
you mentioned, that's like,

445
00:28:16,020 --> 00:28:19,840
there are actually, you know, papers
proposing like more robust,

446
00:28:20,640 --> 00:28:24,440
on average slower, but more robust
execution paths.

447
00:28:25,580 --> 00:28:30,620
I think that would be interesting
to have some of those as extensions

448
00:28:30,720 --> 00:28:31,420
in Postgres.

449
00:28:32,220 --> 00:28:36,420
Because we do have the custom scan
interface and we can actually

450
00:28:37,360 --> 00:28:38,900
inject a different path.

451
00:28:38,960 --> 00:28:40,180
So that would be fine.

452
00:28:40,800 --> 00:28:44,980
For the join order search, I'm
not sure.

453
00:28:45,060 --> 00:28:49,700
I'm aware of some issues in that.

454
00:28:50,500 --> 00:28:55,840
But I think for the typical queries,
the algorithm actually works

455
00:28:55,840 --> 00:28:56,700
quite fine.

456
00:28:57,280 --> 00:29:03,040
But there are some examples of
queries where that actually blows

457
00:29:03,040 --> 00:29:03,540
up.

458
00:29:04,300 --> 00:29:08,560
And I don't know if that's what
you've been pointing at, but

459
00:29:08,560 --> 00:29:14,060
like a star join, for example, is
a great example, because we end

460
00:29:14,060 --> 00:29:18,160
up exploring n factorial different
plans.
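To make the blow-up concrete, here is a quick sketch of how the join-order search space grows under exhaustive enumeration (before any collapsing or heuristics the planner actually applies):

```python
# Join-order search space for n relations grows factorially; with a
# star join of ~20 tables, exhaustive enumeration is hopeless.
import math

for n in (5, 10, 20):
    print(f"{n} relations: {math.factorial(n)} join orders")

# 20! is about 2.4e18, which is why the planner must collapse the
# problem or fall back to heuristics instead of enumerating plans.
assert math.factorial(20) == 2432902008176640000
```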

461
00:29:19,140 --> 00:29:20,820
And we can actually improve that.

462
00:29:20,920 --> 00:29:24,400
I actually posted like a proof
of concept patch for improving

463
00:29:24,400 --> 00:29:25,100
this case.

464
00:29:25,580 --> 00:29:30,700
But there's a lot of work that
needs to be done on that.

465
00:29:31,080 --> 00:29:36,300
But that could actually help tremendously
because I think in

466
00:29:36,300 --> 00:29:41,280
another talk, I compared like performance
for different kinds

467
00:29:41,280 --> 00:29:44,440
of workloads since like Postgres
8, I think.

468
00:29:45,400 --> 00:29:49,600
And on like a star join with, I
mean, OLTP star join, which means

469
00:29:49,600 --> 00:29:54,020
like you select 1 row from, or
a small number of rows from the

470
00:29:54,020 --> 00:29:58,940
fact table, and then have like
20 dimensions, you know, joining

471
00:29:58,940 --> 00:29:59,440
to it.

472
00:30:00,480 --> 00:30:04,160
So for that, we didn't actually
improve throughput at all since

473
00:30:04,160 --> 00:30:05,040
Postgres 8.

474
00:30:05,280 --> 00:30:05,640
Right?

475
00:30:05,640 --> 00:30:06,140
Michael: Wow.

476
00:30:06,220 --> 00:30:12,500
Tomas: Which is like crazy, like
compared to the 1 or 2 orders

477
00:30:12,500 --> 00:30:16,200
of magnitude better throughput
for the other OLTP workloads.

478
00:30:17,080 --> 00:30:19,940
I think that's a clear bottleneck
there.

479
00:30:20,020 --> 00:30:25,580
And it's exactly because the join
order search is like not great.

480
00:30:26,040 --> 00:30:29,880
We do have something we call genetic
optimization, but I think

481
00:30:30,160 --> 00:30:35,580
almost no one actually uses that
because the default configuration

482
00:30:35,740 --> 00:30:40,380
parameters actually force us to
do a different thing.

483
00:30:40,760 --> 00:30:44,280
So we don't actually use the genetic
optimizer at all.

484
00:30:45,300 --> 00:30:49,120
Nikolay: Well, when we have 20
tables joined, it's used, right?

485
00:30:49,120 --> 00:30:50,140
Because there's threshold.

486
00:30:50,440 --> 00:30:51,600
Tomas: No, no, no, no, no, no,
no.

487
00:30:51,600 --> 00:30:55,940
Because first we split that at
8 into smaller join searches,

488
00:30:56,320 --> 00:30:56,820
right?

489
00:30:56,820 --> 00:30:58,700
And we do the regular algorithm
for that.

490
00:30:58,700 --> 00:30:59,180
Nikolay: I see.

491
00:30:59,180 --> 00:31:02,860
Tomas: So you would actually need
to change the defaults, and

492
00:31:02,860 --> 00:31:04,900
only then would the genetic algorithm
be used.

493
00:31:04,900 --> 00:31:05,980
Nikolay: I didn't know this.

494
00:31:05,980 --> 00:31:06,480
Okay.

495
00:31:07,120 --> 00:31:07,620
Tomas: Yeah.

496
00:31:07,780 --> 00:31:08,280
Good.

497
00:31:08,560 --> 00:31:12,880
I didn't know that either until
I started doing

498
00:31:12,900 --> 00:31:15,880
the benchmarks to compare the different
versions.

499
00:31:17,120 --> 00:31:23,460
I was wondering, why is the genetic
optimizer not actually visible

500
00:31:23,460 --> 00:31:27,880
anywhere, right? And it's because
of this, so people don't use that.
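For reference, the mechanism being described, in terms of the actual Postgres settings (the parameter names and defaults are real; the commentary is a sketch of the reasoning above):

```sql
-- Join problems are first collapsed into chunks of at most
-- join_collapse_limit / from_collapse_limit relations:
SHOW join_collapse_limit;   -- 8 by default
SHOW from_collapse_limit;   -- 8 by default
SHOW geqo_threshold;        -- 12 by default
-- So the planner rarely sees a single join problem with
-- geqo_threshold (12) or more relations, and the genetic optimizer
-- (GEQO) effectively never runs. Raising the collapse limits is what
-- would actually let it kick in:
SET join_collapse_limit = 20;
SET from_collapse_limit = 20;
```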

501
00:31:28,620 --> 00:31:32,420
Michael: When you enabled it, did it improve on later versions

502
00:31:32,840 --> 00:31:34,260
versus version 8?

503
00:31:35,080 --> 00:31:36,140
Tomas: I think it did.

504
00:31:36,380 --> 00:31:36,820
Okay.

505
00:31:36,820 --> 00:31:39,260
But I don't remember the details.

506
00:31:39,280 --> 00:31:42,780
I haven't like included that in the charts at all, for example.

507
00:31:42,940 --> 00:31:44,140
But that's a good question.

508
00:31:44,140 --> 00:31:45,040
I don't know.

509
00:31:45,800 --> 00:31:48,740
Michael: Thinking about like next steps for Postgres.

510
00:31:48,740 --> 00:31:51,740
What would you be most excited to see people working on or what

511
00:31:51,740 --> 00:31:54,780
are you looking to work on next that you're excited about?

512
00:31:55,760 --> 00:32:00,600
Tomas: So I definitely would like to see someone working on the

513
00:32:00,600 --> 00:32:07,200
robustness patches for either the SmoothScan, I do have a proof

514
00:32:07,200 --> 00:32:13,020
of concept patch for that, or the G-Join, I think they call it.

515
00:32:13,260 --> 00:32:16,960
That's a robust algorithm for joins.

516
00:32:17,860 --> 00:32:23,240
I think that would be extremely useful and I'm 99% sure it could

517
00:32:23,240 --> 00:32:24,560
be done as an extension.

518
00:32:25,440 --> 00:32:30,780
So that makes it much easier to hack on without breaking everything

519
00:32:30,820 --> 00:32:31,320
else.

520
00:32:31,400 --> 00:32:35,820
And it also makes it immediately available for users, which I

521
00:32:35,820 --> 00:32:38,100
think is very useful.

522
00:32:39,380 --> 00:32:44,440
And then I think there are actually some like smaller patches

523
00:32:44,440 --> 00:32:48,880
where people could improve like individual performance cliffs,

524
00:32:48,920 --> 00:32:49,420
right?

525
00:32:50,080 --> 00:32:54,940
If I go back to the example that I used in the talk, I can imagine

526
00:32:55,080 --> 00:33:02,420
people experimenting with choosing the flip point, the number at which

527
00:33:02,420 --> 00:33:06,000
we flip from a linear search to a hash search, right?

528
00:33:06,100 --> 00:33:11,140
So I can imagine people doing some runtime measurements and like

529
00:33:11,280 --> 00:33:12,100
doing that.
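The kind of flip being described can be sketched like this (the threshold value and data structures are illustrative assumptions, not the actual Postgres implementation):

```python
# Sketch of a strategy flip: below some size threshold, a plain
# linear search is cheapest; above it, paying the one-time build
# cost of a hash table wins. The cliff appears when the threshold
# is mis-chosen for the workload.
THRESHOLD = 8  # illustrative, not the real Postgres constant

def build_lookup(values):
    """Keep a plain list (linear search) for small inputs;
    switch to a hash set above the threshold."""
    values = list(values)
    if len(values) <= THRESHOLD:
        return ("linear", values)
    return ("hash", set(values))

def contains(lookup, item):
    kind, data = lookup
    return item in data   # list => O(n) scan, set => O(1) hash probe

small = build_lookup(range(5))
large = build_lookup(range(100))
assert small[0] == "linear" and contains(small, 3)
assert large[0] == "hash" and contains(large, 99)
```

Runtime measurements, as suggested, would replace the hard-coded threshold with one derived from observed costs.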

530
00:33:12,940 --> 00:33:16,720
I think another like problem for robustness is just-in-time

531
00:33:16,720 --> 00:33:18,480
compilation, right?

532
00:33:18,680 --> 00:33:23,740
Which adds like a completely different level of robustness issues,

533
00:33:23,960 --> 00:33:25,620
like unpredictable behavior.

534
00:33:26,400 --> 00:33:27,180
Nikolay: We just switched

535
00:33:27,180 --> 00:33:27,720
Tomas: it off.

536
00:33:27,720 --> 00:33:28,660
Nikolay: We just switched it off.

537
00:33:28,660 --> 00:33:29,340
Michael: Yeah, yeah.

538
00:33:29,340 --> 00:33:30,360
But that's great.

539
00:33:31,420 --> 00:33:36,500
Tomas: Yes, I think that's the default recommendation, but it's

540
00:33:36,500 --> 00:33:38,420
also quite sad, right?

541
00:33:38,420 --> 00:33:42,480
Because the just-in-time compilation can be really helpful for

542
00:33:42,480 --> 00:33:43,220
some queries.

543
00:33:44,380 --> 00:33:46,170
Michael: But it's a robustness decision, right?

544
00:33:46,170 --> 00:33:46,360
Yes.

545
00:33:46,360 --> 00:33:51,780
Like Nikolay is often, and I've seen people as well, choosing

546
00:33:52,080 --> 00:33:56,420
potentially worse average performance on certain queries, for

547
00:33:56,520 --> 00:33:58,060
not getting those outliers.

548
00:33:58,200 --> 00:34:00,040
So it's exactly about robustness.

549
00:34:00,780 --> 00:34:06,440
Tomas: Yes, I think that's a great example showcasing the decisions

550
00:34:06,440 --> 00:34:07,540
people make, right?

551
00:34:08,000 --> 00:34:12,980
And they almost always choose better
robustness if that's a production

552
00:34:12,980 --> 00:34:13,980
workload, right?

553
00:34:15,580 --> 00:34:17,240
I think that's fine, right?

554
00:34:17,940 --> 00:34:20,340
And I think we could have like
a...

555
00:34:20,340 --> 00:34:23,660
I don't know if you had a session
about just-in-time compilation,

556
00:34:24,020 --> 00:34:27,900
but I think there's a whole set
of things people could do to

557
00:34:27,900 --> 00:34:28,820
improve that.

558
00:34:29,160 --> 00:34:34,240
It could be better planning or
it could be using a different

559
00:34:34,440 --> 00:34:36,700
library for JIT.

560
00:34:37,080 --> 00:34:40,540
Because a lot of that, a lot of
the problems that we have are

561
00:34:40,840 --> 00:34:48,060
actually due to LLVM not being
a well-suited library for the

562
00:34:48,060 --> 00:34:51,000
kind of JIT that we do for our
use case, right?

563
00:34:51,940 --> 00:34:52,780
Michael: Yeah, interesting.

564
00:34:52,860 --> 00:34:54,240
We haven't done an episode on it.

565
00:34:54,240 --> 00:34:56,540
I think you're right, it would
be a good 1 to do.

566
00:34:57,180 --> 00:35:00,040
Who would you recommend we talk
to about that?

567
00:35:00,220 --> 00:35:02,300
Is it Andres or who is it?

568
00:35:02,480 --> 00:35:06,440
Tomas: I think Andres is like the
1 person who knows about that

569
00:35:06,440 --> 00:35:09,140
most because I think he's the 1
implementing that.

570
00:35:09,140 --> 00:35:12,440
So you can also complain to him
and he's the right person.

571
00:35:13,520 --> 00:35:17,860
But I think there are actually
other people who submitted some

572
00:35:17,860 --> 00:35:19,640
patches for like improvements for
like

573
00:35:19,640 --> 00:35:20,140
Nikolay: cool

574
00:35:20,160 --> 00:35:24,640
Tomas: I think that was like 2
years ago, there was a blog post

575
00:35:24,720 --> 00:35:30,080
about someone actually implementing
like custom JIT and that

576
00:35:30,080 --> 00:35:33,960
might be another good guest, I
think.

577
00:35:34,080 --> 00:35:37,580
Michael: I completely forgot about
that. That's a good idea. And

578
00:35:37,580 --> 00:35:41,260
the other thing is, I think,
it's based solely around like

579
00:35:41,480 --> 00:35:43,360
a cost limit.

580
00:35:43,820 --> 00:35:46,060
That feels very crude to me.

581
00:35:46,500 --> 00:35:47,000
Tomas: Yes.

582
00:35:47,420 --> 00:35:50,720
As I say, like the planning is
very, very simplistic.

583
00:35:51,220 --> 00:35:55,960
It makes like essentially the decision
for the whole query at

584
00:35:55,960 --> 00:35:57,340
once, right?

585
00:35:57,440 --> 00:35:57,940
Michael: Yeah.

586
00:35:57,980 --> 00:36:00,900
Tomas: I mean like the fact that
it's based on like a cost limit,

587
00:36:00,980 --> 00:36:08,500
that's roughly fine, but crossing

588
00:36:08,500 --> 00:36:13,580
the cost limit for the whole query,
can also enable just-in-time

589
00:36:13,780 --> 00:36:17,300
compilation in parts of the query
where that's not really useful,

590
00:36:17,360 --> 00:36:17,860
right?

591
00:36:18,080 --> 00:36:22,480
So that's definitely 1 of the things
that we could improve.

592
00:36:22,480 --> 00:36:26,480
I mean, like it's not a perfect
solution for all the cases, but

593
00:36:26,480 --> 00:36:29,000
it would be, I think, a better approach.

594
00:36:29,380 --> 00:36:32,300
But I'm sure like there are reasons
why it was done like this

595
00:36:32,300 --> 00:36:38,600
right? So I don't want to make
like definitive statements

596
00:36:38,720 --> 00:36:40,800
about like solutions.
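For reference, the cost-based knobs under discussion (the GUC names and defaults are from stock Postgres; the commentary is a sketch of the critique above):

```sql
SHOW jit;                      -- on by default since Postgres 12
SHOW jit_above_cost;           -- 100000: total plan cost that triggers JIT
SHOW jit_inline_above_cost;    -- 500000: additionally inline functions
SHOW jit_optimize_above_cost;  -- 500000: additionally run expensive optimization
-- The decision is made once, for the whole query: crossing
-- jit_above_cost enables compilation even in plan nodes where it
-- cannot pay off, which is the crudeness being discussed.
SET jit = off;  -- the common production workaround mentioned earlier
```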

597
00:36:42,180 --> 00:36:44,680
Michael: Yeah, it would be an interesting
discussion. And was there

598
00:36:44,680 --> 00:36:46,960
anything we should
have asked about that we

599
00:36:46,960 --> 00:36:47,460
didn't?

600
00:36:47,680 --> 00:36:51,240
Tomas: No, I think we discussed
like a lot of different like

601
00:36:51,340 --> 00:36:56,360
parts of what I mentioned in the talk, so I think that's fine.

602
00:36:56,680 --> 00:36:59,060
Michael: It's a really excellent talk, and we definitely

603
00:36:59,060 --> 00:37:01,960
didn't discuss it all and it comes across really well with examples.

604
00:37:01,960 --> 00:37:05,700
So I will include the link to the talk and the slides as well,

605
00:37:06,280 --> 00:37:07,240
for those links.

606
00:37:07,440 --> 00:37:10,160
And I'll include a link to the San Fran talk.

607
00:37:10,160 --> 00:37:14,540
So if anybody's got questions themselves, I guess you're attending

608
00:37:14,540 --> 00:37:17,540
live, so they can ask them at the meetup as well.

609
00:37:17,800 --> 00:37:21,540
Tomas: Yeah, or they can just like send me an email or something.

610
00:37:21,820 --> 00:37:27,040
I also do something I call office hours, where people can just

611
00:37:27,040 --> 00:37:29,560
let me know what they would like to talk about.

612
00:37:30,180 --> 00:37:33,720
And it could be like a patch they are considering or whatever.

613
00:37:34,280 --> 00:37:38,200
And we can definitely like schedule a call or something.

614
00:37:38,560 --> 00:37:41,640
That's also what I do for the community.

615
00:37:41,760 --> 00:37:42,260
Yes.

616
00:37:42,540 --> 00:37:43,260
Michael: Yeah, that's awesome.

617
00:37:43,260 --> 00:37:45,680
I did actually want to say thank you for everything you're doing,

618
00:37:45,680 --> 00:37:49,140
because you're also doing some mentoring and some patch ideas

619
00:37:49,280 --> 00:37:51,400
for new potential committers.

620
00:37:51,540 --> 00:37:56,200
And I think that that kind of work is very, very valued and very,

621
00:37:56,200 --> 00:37:56,880
very valuable.

622
00:37:56,880 --> 00:37:57,840
So thank you.

623
00:37:59,920 --> 00:38:00,420
Awesome.

624
00:38:00,420 --> 00:38:02,440
Well, thanks so much for joining us, Tomas.

625
00:38:02,440 --> 00:38:04,460
And yeah, I hope you have a great week.

626
00:38:04,760 --> 00:38:06,360
Tomas: Bye.