1
00:00:05,340 --> 00:00:10,160
[CLAIRE] Welcome to Talking Postgres, a monthly podcast for developers who love this database.

2
00:00:10,700 --> 00:00:15,540
I'm your host, Claire Giordano, and in this pod, we explore the human side of Postgres,

3
00:00:15,980 --> 00:00:21,240
databases, and open source, which means why do people who work with Postgres do what they do,

4
00:00:21,800 --> 00:00:26,340
and how did they get there? Thank you to the team at Microsoft for sponsoring this

5
00:00:26,360 --> 00:00:33,560
community conversation. Today's guest is Tomas Vondra. He is a Postgres major contributor,

6
00:00:34,020 --> 00:00:40,720
a committer. He works on the Postgres team at Microsoft. He's very involved with the Prague

7
00:00:41,360 --> 00:00:48,500
local user group and is one of the organizers of the, let's see if I get it right, the Prague

8
00:00:48,720 --> 00:00:56,040
Postgres Developer Day, which I just call P2D2. He is well known for his work on performance

9
00:00:56,840 --> 00:01:03,280
in the Postgres project, and that led to today's topic as well. Welcome, Tomas.

10
00:01:04,800 --> 00:01:06,800
[TOMAS] Hello. Happy to be here.

11
00:01:07,720 --> 00:01:13,160
[CLAIRE] I'm really glad you're here. So today's topic is why it's fun to hack on Postgres performance.

12
00:01:13,580 --> 00:01:20,240
And that is because, fairly or unfairly, you have a reputation for working on the performance of Postgres.

13
00:01:20,700 --> 00:01:34,300
So hopefully we can dive into why it is that you do that and hopefully get some, I don't know, tips, suggestions, ideas, insights that will benefit other people in their work with Postgres.

14
00:01:35,820 --> 00:01:39,860
But before we do any of that, I always like to start with an origin story.

15
00:01:40,780 --> 00:01:54,180
Particularly if people are listening who are still getting their schooling or training and are trying to figure out what they're going to do with their career, I think it's always interesting to hear from people about how they got their start.

16
00:01:57,180 --> 00:01:57,560
[TOMAS] Yeah, sure.

17
00:01:58,920 --> 00:02:02,280
I guess you want me to explain how I got to work on Postgres

18
00:02:02,320 --> 00:02:06,460
and how I got to work on performance stuff, right?

19
00:02:08,259 --> 00:02:13,280
So I think I already told this story multiple times,

20
00:02:13,340 --> 00:02:15,500
but maybe only to individual people.

21
00:02:17,240 --> 00:02:22,100
I think in 2004, 2003 or something like that,

22
00:02:22,660 --> 00:02:24,200
I've been still at the university.

23
00:02:24,820 --> 00:02:29,120
And I've been studying software engineering and that kind of stuff.

24
00:02:30,660 --> 00:02:33,960
And as a part-time job, I've been working at a company

25
00:02:35,380 --> 00:02:39,580
that was operating a couple e-commerce websites

26
00:02:40,100 --> 00:02:42,580
because that was after the dot-com boom.

27
00:02:43,240 --> 00:02:48,100
So a lot of the commerce was already done on the internet.

28
00:02:49,160 --> 00:02:52,380
And they've been running a couple pharmacies and that kind of stuff.

29
00:02:53,580 --> 00:03:01,640
And most of that was operated on actually MySQL and PHP and like pretty basic stuff.

30
00:03:02,320 --> 00:03:09,180
And I think we had at some point like serious performance issues.

31
00:03:09,500 --> 00:03:20,260
Essentially, the websites couldn't actually keep up with even just like a couple dozen customers visiting the website.

32
00:03:21,820 --> 00:03:27,800
And the management came with the request: we need to solve that.

33
00:03:28,820 --> 00:03:34,300
And one of the options was to just buy a commercial database like Oracle.

34
00:03:35,320 --> 00:03:37,400
But that was extremely expensive, right?

35
00:03:38,240 --> 00:03:42,160
Even then, it was a very small company.

36
00:03:42,560 --> 00:03:48,960
And we couldn't actually afford, I think, that amount of money.

37
00:03:49,020 --> 00:03:51,440
Or it seemed like unwise investment.

38
00:03:51,500 --> 00:03:58,840
So at some point we decided to investigate what are the other options?

39
00:04:00,620 --> 00:04:02,500
What other databases could we use?

40
00:04:04,540 --> 00:04:08,500
And we actually realized there is another database,

41
00:04:09,020 --> 00:04:15,700
which some people suggested was like a better for complex queries and so on.

42
00:04:16,560 --> 00:04:17,340
And that was Postgres.

43
00:04:17,660 --> 00:04:25,480
So I think at that point it was Postgres 7.4 or maybe 8.0 or something like that.

44
00:04:26,420 --> 00:04:28,320
So we gave that a try.

45
00:04:28,460 --> 00:04:34,260
We essentially migrated the website and everything to Postgres

46
00:04:34,360 --> 00:04:39,700
because we needed to see how that actually performed on that database.

47
00:04:41,300 --> 00:04:45,779
And the funny story was that at that point we found

48
00:04:45,800 --> 00:04:49,800
that actually it wasn't caused by the database at all.

49
00:04:50,520 --> 00:04:55,500
It was just like a really poorly written application in PHP

50
00:04:56,700 --> 00:05:02,240
because it was the worst pattern in the code

51
00:05:02,400 --> 00:05:06,320
was that it first selected all the products to list on a page,

52
00:05:06,760 --> 00:05:08,960
but it only got the IDs, right?

53
00:05:09,700 --> 00:05:10,900
And then for each product,

54
00:05:11,520 --> 00:05:15,380
it did like individual queries for each other field it needed to

55
00:05:16,000 --> 00:05:19,840
look up. So it was like thousands of queries per page

56
00:05:20,040 --> 00:05:23,860
instead of like a single query. And that would kill

57
00:05:24,220 --> 00:05:28,140
any database. So we changed how

58
00:05:28,580 --> 00:05:31,900
the code actually worked. But in the end

59
00:05:31,960 --> 00:05:35,840
we decided to stay with Postgres anyway. Because

60
00:05:35,900 --> 00:05:40,140
we just liked some of

61
00:05:40,140 --> 00:05:44,040
the features of Postgres which MySQL didn't have.

62
00:05:44,840 --> 00:05:50,400
And also it was like much more, I think, reliable than MySQL.

63
00:05:51,360 --> 00:05:53,960
Because at that point, MySQL only had MyISAM.

64
00:05:54,420 --> 00:05:58,160
It didn't actually have proper like transaction control and all of that.

65
00:06:01,699 --> 00:06:09,560
So we stuck with Postgres, even though it actually didn't solve the actual performance problem.
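
[NOTE] The pattern Tomas describes here (select only the product IDs, then run separate queries per product for each field) is often called the "N+1 query" problem. Below is a minimal sketch of it in Python, using an in-memory SQLite database so it runs anywhere. The table, column names, and helper classes are assumptions made up for illustration; the original application was PHP on MySQL and its code is not shown in this conversation.

```python
import sqlite3


def make_db(n_products=50):
    """Create a throwaway in-memory database with a hypothetical products table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL)"
    )
    conn.executemany(
        "INSERT INTO products (id, name, price) VALUES (?, ?, ?)",
        [(i, f"product-{i}", 1.5 * i) for i in range(1, n_products + 1)],
    )
    return conn


class CountingDB:
    """Thin wrapper that counts how many queries are sent to the database."""

    def __init__(self, conn):
        self.conn = conn
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.conn.execute(sql, params).fetchall()


def list_page_slow(db, limit):
    """The slow pattern: fetch IDs first, then one query per field per product."""
    ids = [row[0] for row in db.query(
        "SELECT id FROM products ORDER BY id LIMIT ?", (limit,))]
    page = []
    for pid in ids:
        name = db.query("SELECT name FROM products WHERE id = ?", (pid,))[0][0]
        price = db.query("SELECT price FROM products WHERE id = ?", (pid,))[0][0]
        page.append((pid, name, price))
    return page


def list_page_fast(db, limit):
    """The fix: ask for every needed column in a single query."""
    return db.query(
        "SELECT id, name, price FROM products ORDER BY id LIMIT ?", (limit,))


if __name__ == "__main__":
    slow_db, fast_db = CountingDB(make_db()), CountingDB(make_db())
    list_page_slow(slow_db, 20)
    list_page_fast(fast_db, 20)
    print("slow pattern queries:", slow_db.queries)
    print("single-query version:", fast_db.queries)
```

Both functions return the same rows. The only difference is the number of round trips to the database, which grows with the page size in the slow version and stays at one in the fixed version.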

66
00:06:10,759 --> 00:06:13,740
But that's how I got to work on Postgres.

67
00:06:14,720 --> 00:06:18,620
And it's also how I started working on performance stuff.

68
00:06:19,700 --> 00:06:23,880
Because first, we've been solving a performance issue.

69
00:06:25,220 --> 00:06:27,800
That's how we got to work with Postgres at all.

70
00:06:29,060 --> 00:06:34,600
But also, after we kept using Postgres,

71
00:06:36,060 --> 00:06:38,300
we kept running into complex queries

72
00:06:38,510 --> 00:06:41,980
and queries that didn't actually perform well

73
00:06:42,120 --> 00:06:42,960
and that kind of stuff.

74
00:06:43,980 --> 00:06:52,280
And in our team, we needed someone who would actually focus on these problems.

75
00:06:53,500 --> 00:06:55,100
And it happened to be me.

76
00:06:55,300 --> 00:06:58,520
So I started working.

77
00:06:56,480 --> 00:07:00,740
[CLAIRE] So to be clear, this is a part-time job and you are still in university. [Yes]

78
00:07:03,080 --> 00:07:09,920
And so kind of in the very beginning, performance and Postgres in response to the situation. [Exactly. Yes]

79
00:07:10,520 --> 00:07:15,400
Oh, so the rest of us are so lucky that happened.

80
00:07:15,410 --> 00:07:16,780
I mean, it's kind of happenstance.

81
00:07:19,500 --> 00:07:26,660
[TOMAS] That's a good question. I don't know if it was that I was forced to work on these things,

82
00:07:28,120 --> 00:07:35,180
but also it was like a personal choice, right? Like, in the team someone had to do that,

83
00:07:36,340 --> 00:07:40,520
but I also personally like gravitated to this kind of work.

84
00:07:40,920 --> 00:07:44,380
So it was also, if I didn't like working on these problems,

85
00:07:45,580 --> 00:07:47,080
I wouldn't be working on that at all.

86
00:07:47,580 --> 00:07:47,680
Right?

87
00:07:48,860 --> 00:07:51,580
So I wasn't forced, but I had the chance actually

88
00:07:51,700 --> 00:07:52,380
to work on that.

89
00:07:53,280 --> 00:07:58,400
[CLAIRE] Okay, speaking of gravitating to things, when we started this conversation a few minutes

90
00:07:58,680 --> 00:08:04,380
ago, I think I was pulling you away from something else that you were doing, which is watching

91
00:08:04,760 --> 00:08:06,200
the Olympic ice hockey game. [Yes]

92
00:08:07,340 --> 00:08:08,460
So what's the score?

93
00:08:09,280 --> 00:08:12,580
It's Czech Republic versus Canada. [Yes]

94
00:08:12,680 --> 00:08:13,220
Is that right?

95
00:08:13,700 --> 00:08:13,980
[TOMAS] Yes.

96
00:08:15,280 --> 00:08:16,280
I don't know.

97
00:08:17,720 --> 00:08:23,320
I think, I don't know what the score is right now. I don't know.

98
00:08:21,640 --> 00:08:23,040
[CLAIRE] Aaron is putting into the chat.

99
00:08:24,260 --> 00:08:26,060
Aaron's putting into the chat that it's three-three.

100
00:08:26,700 --> 00:08:32,000
And after a three minute intermission, there will be a 10 minute overtime period of three on three hockey.

101
00:08:32,120 --> 00:08:33,240
So that's kind of exciting. [Yeah]

102
00:08:35,919 --> 00:08:38,599
Do we need to pause so you can go watch this?

103
00:08:39,419 --> 00:08:49,220
[TOMAS] No, I'm not a huge hockey fan, but also playing Canada is a big deal, I think.

104
00:08:49,580 --> 00:08:51,160
So it's fun, right?

105
00:08:51,040 --> 00:08:58,640
[CLAIRE] Yeah, and our co-producer who's on the text chat right now, Aaron Wislang, he is from

106
00:08:58,960 --> 00:09:04,920
Canada, so I can imagine that he's rooting for the other team and not aligned with you in this

107
00:09:04,980 --> 00:09:09,560
instance. Well, we'll keep an eye on the text chat as to what's happening and we'll find out

108
00:09:10,040 --> 00:09:17,100
whether you should celebrate or not. Okay, so you just answered the question, not only how did you

109
00:09:17,100 --> 00:09:22,120
get started as a developer, but how you got started with Postgres and how you got started in performance,

110
00:09:22,720 --> 00:09:24,060
all in one answer.

111
00:09:27,620 --> 00:09:29,240
So let's see.

112
00:09:31,560 --> 00:09:34,820
When you think about working on Postgres performance,

113
00:09:35,180 --> 00:09:37,800
you just said you gravitated to that kind of work then,

114
00:09:38,180 --> 00:09:41,100
but you still gravitate to that kind of work now.

115
00:09:42,040 --> 00:09:45,380
So what is it about performance that you find so interesting?

116
00:09:48,360 --> 00:09:50,060
[TOMAS] That's a good question.

117
00:09:50,550 --> 00:09:57,820
But I think I'm focusing on query planning and query execution

118
00:09:58,480 --> 00:10:02,700
because there is actually a fair amount of mathematics

119
00:10:03,280 --> 00:10:08,400
and problem-solving thing.

120
00:10:09,820 --> 00:10:13,320
And I think there's also a lot of engineering

121
00:10:19,540 --> 00:10:22,080
in making good trade-offs, right?

122
00:10:22,200 --> 00:10:24,740
When deciding how to execute a query

123
00:10:24,760 --> 00:10:28,980
or like which path to take.

124
00:10:29,120 --> 00:10:33,600
I think you need to think about a lot of problems.

125
00:10:39,040 --> 00:10:46,040
I do have a personal theory of how the human mind works.

126
00:10:46,640 --> 00:10:48,460
And I think there are two extremes.

127
00:10:50,700 --> 00:10:51,720
That's how I mentioned that.

128
00:10:53,240 --> 00:10:55,560
One is analytical mind.

129
00:10:58,380 --> 00:11:03,440
And that's someone who is able to investigate individual problems

130
00:11:03,600 --> 00:11:05,700
and go deep into that.

131
00:11:06,640 --> 00:11:13,840
And then there is another extreme, which is like synthesis, right?

132
00:11:13,890 --> 00:11:22,440
Someone who is able to construct a complex problem and like describe, like construct an API and so on.

133
00:11:23,320 --> 00:11:28,140
And I'm very, very clearly in the first group, I think.

134
00:11:28,280 --> 00:11:31,240
I'm much better in like problem solving, investigating problems.

135
00:11:32,140 --> 00:11:38,120
And a lot of the performance problems that I've been working on are exactly that, right?

136
00:11:38,140 --> 00:11:46,240
You need to take a slow query or something that doesn't work well and investigate like why is it happening, right?

137
00:11:47,520 --> 00:11:54,680
So, and I think that's why I've been gravitating to this kind of work.

138
00:11:58,660 --> 00:12:02,920
[CLAIRE] What's it feel like when you solve one of these performance problems?

139
00:12:03,830 --> 00:12:08,240
And maybe help the listeners understand how big are these problems?

140
00:12:08,490 --> 00:12:14,260
When you talk about investigating a performance issue in Postgres and solving it, does it take

141
00:12:14,260 --> 00:12:14,780
you an hour?

142
00:12:15,440 --> 00:12:18,100
Or does it take you a month or more?

143
00:12:22,139 --> 00:12:24,079
[TOMAS] I think it varies a lot. [I thought you were going to say it depends.]

144
00:12:24,600 --> 00:12:37,060
I mean, I think some of the slow queries, right, you will immediately recognize why is it slow and how to fix it.

145
00:12:38,840 --> 00:12:42,640
Because a lot of the queries are slow for the same reason.

146
00:12:43,780 --> 00:12:50,940
But then there are also problems that are complicated and completely novel, right?

147
00:12:51,020 --> 00:12:57,420
It's a problem that no one actually investigated before or maybe not sufficiently.

148
00:12:58,500 --> 00:13:09,040
And that can take, I don't know, a week to even just build the understanding of like do enough experiments to actually understand what is happening.

149
00:13:12,240 --> 00:13:22,000
Then it can take a while to actually fix it or like figure out if there even is a solution because there may not be, right?

150
00:13:25,200 --> 00:13:30,560
So I think each of those problems is like a small puzzle.

151
00:13:31,500 --> 00:13:38,320
So if you like solving puzzles and, like, investigating why things work the way they do,

152
00:13:40,800 --> 00:13:44,880
I think performance issues might be a good topic.

153
00:13:47,820 --> 00:13:48,240
[CLAIRE] I'm curious.

154
00:13:48,620 --> 00:13:54,360
Do you do puzzles in your non-database part of your life?

155
00:13:56,060 --> 00:14:01,180
You know, like jigsaw puzzles or crosswords on your phone, or...

156
00:14:03,100 --> 00:14:20,460
[TOMAS] No, not really. Because it's a bit weird, but I always hated doing homework because it was a made-up problem. You are only doing the homework because someone wants you to do homework.

157
00:14:22,020 --> 00:14:42,860
While the performance issues are like a puzzle that actually is meaningful. Like, it's something that needs to be solved for, I don't know, a customer dashboard actually working fast, right? So that, I think, makes it different for me.

158
00:14:44,400 --> 00:14:44,520
[CLAIRE] Okay.

159
00:14:46,880 --> 00:14:52,020
So I guess that means you don't read a lot of detective novels

160
00:14:52,200 --> 00:14:56,940
or watch murder mysteries on Netflix or anything like that either.

161
00:14:57,080 --> 00:15:01,820
That love of puzzles doesn't come into that part of your life either.

162
00:15:04,060 --> 00:15:05,000
[TOMAS] I do, like,

163
00:15:07,280 --> 00:15:08,920
watch movies and so on, of course.

164
00:15:09,020 --> 00:15:10,960
Like, that's like a different thing.

165
00:15:11,160 --> 00:15:11,940
But like,

166
00:15:13,220 --> 00:15:14,940
if I have to make an effort

167
00:15:15,100 --> 00:15:16,840
for something... like, watching

168
00:15:17,160 --> 00:15:19,160
a movie is like a positive thing

169
00:15:19,300 --> 00:15:20,019
for me, right?

170
00:15:20,600 --> 00:15:28,820
But if I have to spend like a day working on something, I would rather that to be like meaningful thing.

171
00:15:29,020 --> 00:15:42,900
[CLAIRE] Okay. Got it. Can you, is there an example that comes to mind that you can walk us through to help

172
00:15:42,910 --> 00:15:48,899
us understand like a recent performance puzzle, if you will, that you've had to solve

173
00:15:49,580 --> 00:15:57,760
in Postgres. I don't know if there's one that is simple enough to make sense on a podcast.

174
00:15:59,280 --> 00:16:13,420
[TOMAS] So I don't know, but a very simple one would be a thing that, like an optimization that I think I did in like 2014 in hash joins, right?

175
00:16:15,900 --> 00:16:20,700
For a long time, Postgres had hash tables with chaining,

176
00:16:21,450 --> 00:16:25,700
which means that each bucket in the hash table

177
00:16:26,320 --> 00:16:30,780
had like 10 tuples in a linked list.

178
00:16:32,200 --> 00:16:37,300
And it turns out actually that maybe that was like okay

179
00:16:38,890 --> 00:16:40,980
in, I don't know, early 2000s.

180
00:16:41,920 --> 00:16:48,920
But at some point, it's better to not have any chaining in the hash table.

181
00:16:49,540 --> 00:16:55,040
So just like we had a problem with a slow hash join.

182
00:16:58,580 --> 00:17:02,620
After investigating that for a while, I don't know, half a day or a day.

183
00:17:03,300 --> 00:17:15,000
I realized that it might be maybe caused by, you know, having a linked list in each bucket of the hash table.

184
00:17:16,060 --> 00:17:26,020
So I rebuilt Postgres with a parameter changed in the hash table code.

185
00:17:27,319 --> 00:17:28,199
And it was fast,

186
00:17:28,820 --> 00:17:29,580
right? So,

187
00:17:30,900 --> 00:17:32,060
like, much faster, like

188
00:17:32,460 --> 00:17:33,440
twice as fast, maybe.

189
00:17:36,360 --> 00:17:38,020
After that came

190
00:17:38,280 --> 00:17:39,040
like a patch for

191
00:17:40,180 --> 00:17:40,440
Postgres

192
00:17:42,180 --> 00:17:43,380
changing how the

193
00:17:44,790 --> 00:17:46,260
hash table is built and so on.

194
00:17:48,560 --> 00:17:50,100
It was kind of like an experiment,

195
00:17:50,250 --> 00:17:52,080
but also like a guess,

196
00:17:54,840 --> 00:17:56,040
an informed guess, about what the problem

197
00:17:56,060 --> 00:18:03,160
could be. So it can be even a small change, but this is one of the things that I've been,

198
00:18:03,960 --> 00:18:11,540
and I'm still quite proud of, because even a small change in a code which is used a lot

199
00:18:12,700 --> 00:18:19,220
can be like a significant difference, like if you look at all the installations and so on.
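
[NOTE] To make the hash table discussion above a bit more concrete, here is a small sketch of a chained hash table where the target number of tuples per bucket is a tunable parameter. It is not the Postgres hash join code. The function names and the `tuples_per_bucket` parameter are hypothetical, Python lists stand in for the linked lists, and integer keys are used so the distribution across buckets is perfectly even. It only counts key comparisons during probing, which is one of the costs that grows with chain length.

```python
def build(keys, tuples_per_bucket):
    """Build a chained hash table sized for roughly tuples_per_bucket keys per bucket."""
    nbuckets = max(1, len(keys) // tuples_per_bucket)
    buckets = [[] for _ in range(nbuckets)]
    for k in keys:
        buckets[hash(k) % nbuckets].append(k)
    return buckets


def probe(buckets, key):
    """Look up key; return (found, number of key comparisons made)."""
    comparisons = 0
    for candidate in buckets[hash(key) % len(buckets)]:
        comparisons += 1
        if candidate == key:
            return True, comparisons
    return False, comparisons


def total_comparisons(buckets, probe_keys):
    """Sum the comparisons needed to probe every key in probe_keys."""
    return sum(probe(buckets, k)[1] for k in probe_keys)


if __name__ == "__main__":
    keys = list(range(10_000))
    for target in (10, 1):
        table = build(keys, target)
        print(f"target {target:>2} per bucket:",
              len(table), "buckets,",
              total_comparisons(table, keys), "comparisons")
```

With shorter chains, each probe walks fewer entries. In a real system, following pointers through a long chain can also mean extra cache misses, so the measured gain depends on the hardware and the data, and the trade-off is a larger bucket array.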

200
00:18:20,900 --> 00:18:25,780
[CLAIRE] You know, one of the interesting things, I've worked at a bunch of different companies in my

201
00:18:25,850 --> 00:18:31,600
career, and there's a word at Microsoft, we don't own the word, there's many, many people throughout

202
00:18:31,630 --> 00:18:37,720
the world that use it, but it's impact. And so when you're doing your performance evaluations at

203
00:18:37,800 --> 00:18:43,560
the end of the year or whatever, the thing that gets discussed is what was the impact of your work?

204
00:18:44,060 --> 00:18:50,960
Not what was your activity, not how much did you get done, but how much did your work change

205
00:18:51,120 --> 00:18:51,280
things?

206
00:18:51,630 --> 00:18:54,940
Whether it's in your case, like changing the performance of Postgres.

207
00:18:56,560 --> 00:19:02,040
And so I think that's one of the things that must be cool about working on performance is

208
00:19:02,180 --> 00:19:04,940
that you can measure the impact.

209
00:19:05,580 --> 00:19:07,200
Well, I hope you can.

210
00:19:07,920 --> 00:19:08,980
Maybe you can't in all cases.

211
00:19:11,420 --> 00:19:11,700
Can you?

212
00:19:13,640 --> 00:19:14,620
[TOMAS] I don't know,

213
00:19:15,150 --> 00:19:16,960
because how do you

214
00:19:17,220 --> 00:19:17,400
measure,

215
00:19:18,880 --> 00:19:20,180
like, the difference from,

216
00:19:21,800 --> 00:19:22,640
I don't know,

217
00:19:22,860 --> 00:19:23,900
millions of installations?

218
00:19:24,320 --> 00:19:25,660
We don't have any

219
00:19:26,980 --> 00:19:28,320
way to collect the metrics.

220
00:19:30,260 --> 00:19:30,840
And also,

221
00:19:32,720 --> 00:19:34,480
if you introduce a new feature

222
00:19:35,100 --> 00:19:35,860
for Postgres,

223
00:19:36,140 --> 00:19:38,320
I don't think that's any less meaningful

224
00:19:38,940 --> 00:19:40,380
for users that actually

225
00:19:40,400 --> 00:19:41,560
can benefit from that.

226
00:19:44,160 --> 00:19:45,600
I think at some point.

227
00:19:45,620 --> 00:19:51,240
I think at some point you just need to believe that what you did actually is useful.

228
00:19:53,240 --> 00:19:57,560
[CLAIRE] You're absolutely right. And I take back what I said. I was thinking about that ability to measure

229
00:19:57,720 --> 00:20:03,800
that something is 2x faster in a certain scenario. And that's nice, but you're right. How many people

230
00:20:04,540 --> 00:20:08,660
and how many applications are actually going to benefit from that? We don't know.

231
00:20:10,660 --> 00:20:14,400
You know, there's certainly no data science group out there that's measuring the number

232
00:20:14,600 --> 00:20:19,320
of Postgres installations and their scenarios and how they use things.

233
00:20:20,720 --> 00:20:20,900
Okay.

234
00:20:21,330 --> 00:20:25,780
So there's a bit of a leap of faith when you work on an open source project like this that

235
00:20:25,920 --> 00:20:32,580
doesn't have all that, all of the metrics built in.

236
00:20:34,940 --> 00:20:40,960
[TOMAS] I think there definitely is, but also a lot of the people actually working on Postgres

237
00:20:41,320 --> 00:20:47,040
are also using Postgres and they are developing stuff that actually matters for them.

238
00:20:48,440 --> 00:20:51,980
So for the first couple of years, I've been actually working at companies

239
00:20:53,110 --> 00:21:01,140
either using Postgres to, say, build business intelligence solutions

240
00:21:02,419 --> 00:21:06,800
or later in 2ndQuadrant, I've been working for a company

241
00:21:06,820 --> 00:21:09,960
in providing services and support, right?

242
00:21:10,300 --> 00:21:15,900
So we had a very direct information from customers.

243
00:21:16,480 --> 00:21:18,780
And we've been often like solving problems

244
00:21:18,920 --> 00:21:20,640
that actually mattered to the customers.

245
00:21:20,750 --> 00:21:23,080
That's what we've been paid for.

246
00:21:25,340 --> 00:21:28,880
So there is some level of like immediate response

247
00:21:29,140 --> 00:21:30,120
or like immediate feedback,

248
00:21:31,400 --> 00:21:34,540
people working on stuff that matters for them personally,

249
00:21:35,340 --> 00:21:37,060
or, yeah.

250
00:21:35,500 --> 00:21:35,780
[CLAIRE] Got it.

251
00:21:38,240 --> 00:21:40,520
Okay, which leads me to a question that I should have asked earlier.

252
00:21:40,790 --> 00:21:47,380
That patch example from 2014 relating to hash joins.

253
00:21:49,240 --> 00:21:50,240
Was that your first patch?

254
00:21:51,300 --> 00:21:51,560
No. Yes?

255
00:21:52,060 --> 00:21:54,620
[TOMAS] No, I think I wrote a couple of patches before.

256
00:21:59,260 --> 00:22:00,840
I don't remember like how many patches

257
00:22:01,140 --> 00:22:05,840
or smaller patches I wrote before that.

258
00:22:04,860 --> 00:22:07,720
[CLAIRE] So you don't remember your first patch that you submitted.

259
00:22:07,780 --> 00:22:10,220
[TOMAS] I think like, I do remember my first patch.

260
00:22:10,260 --> 00:22:12,700
I think, like first contribution,

261
00:22:14,000 --> 00:22:15,700
that was like 2010, I think.

262
00:22:17,300 --> 00:22:21,020
But that was like a very small change in statistics.

263
00:22:22,500 --> 00:22:23,040
I think.

264
00:22:22,520 --> 00:22:32,100
[CLAIRE] Okay. One of the talks that I've given at a few of the recent PGConf.EUs has to do with

265
00:22:33,440 --> 00:22:38,940
looking at as many contributions as we could look at in a particular release. So we did it,

266
00:22:39,360 --> 00:22:44,720
Daniel Gustafsson and I did it for Postgres 17, and then also for Postgres 18. And each of those

267
00:22:44,730 --> 00:22:51,800
has like a 15 month release cycle. But it's very interesting because when you look and compare between

268
00:22:51,820 --> 00:22:58,300
releases and you figure out like how many people made their first ever patch contribution to the

269
00:22:58,590 --> 00:23:05,340
code, code or docs, because it's the same source base, right? In that release, there was a sizable

270
00:23:05,520 --> 00:23:10,840
number of people, but some of those people are probably drive-by contributors. You know, they

271
00:23:11,020 --> 00:23:16,980
probably just needed something, had to scratch an itch, came, contributed, and they're probably never

272
00:23:17,000 --> 00:23:24,760
going to contribute again. But I guess in your case, you kept coming back as, as is the case with

273
00:23:25,210 --> 00:23:31,340
all sorts of other people in the community. Like, did you know when you did that first patch that it

274
00:23:31,340 --> 00:23:35,380
was going to be more than a drive-by contribution, that it was going to be the first of many, many, many?

275
00:23:37,540 --> 00:23:38,120
[TOMAS] No, not really.

276
00:23:38,240 --> 00:23:40,800
And I haven't been even thinking about that.

277
00:23:42,170 --> 00:23:46,720
I didn't have a plan to be a major contributor to Postgres

278
00:23:47,040 --> 00:23:48,180
or even a committer.

279
00:23:48,400 --> 00:23:50,220
I haven't thought about that.

280
00:23:50,960 --> 00:23:55,880
I've been simply hacking on a database, which was cool,

281
00:23:58,740 --> 00:24:04,900
but also I felt like a part of a group of people,

282
00:24:04,950 --> 00:24:05,680
of the community.

283
00:24:08,000 --> 00:24:10,280
That was probably the more important thing

284
00:24:10,620 --> 00:24:15,360
what actually kept me working on Postgres, I think.

285
00:24:17,060 --> 00:24:19,840
The opportunity to work on interesting stuff.

286
00:24:20,360 --> 00:24:21,180
So there was one.

287
00:24:21,840 --> 00:24:27,500
But also being able to talk to the other developers

288
00:24:27,760 --> 00:24:32,520
like very openly, like meet them at conferences.

289
00:24:33,440 --> 00:24:36,100
Because I think like 2009 or something

290
00:24:36,100 --> 00:24:41,540
was the first PGConf.EU conference that I attended.

291
00:24:44,060 --> 00:24:46,200
And I've been going there ever since.

292
00:24:48,560 --> 00:24:51,880
And you could just meet all the other developers

293
00:24:52,280 --> 00:24:55,720
who implemented or wrote some of the code

294
00:24:57,270 --> 00:24:59,760
and talk to them about the different ways

295
00:25:00,120 --> 00:25:03,420
to maybe improve that or problems in the code

296
00:25:03,540 --> 00:25:04,520
and that kind of stuff.

297
00:25:05,480 --> 00:25:25,020
[CLAIRE] So, so you're saying that the community, the vibe, the culture is part of what kind of kept you coming back. It's part of what hooked you on working on this open source project. Is that right? [Yes, absolutely]

298
00:25:25,700 --> 00:25:46,700
Okay. That'll probably make a lot of people feel good, right? Because there are so many people who invest their time, not just into the code, but also into the conferences and into the culture and the way people treat each other or the way they try to make sure that new people feel welcome.

299
00:25:47,010 --> 00:25:51,620
And so that's kind of a nice compliment, what you just said.

300
00:25:54,740 --> 00:25:56,820
[TOMAS] Yeah, and it's also true, right?

301
00:25:57,040 --> 00:26:05,280
So, and it's also one of the reasons why I'm still like helping to organize the Prague conference.

302
00:26:06,000 --> 00:26:06,120
Right.

303
00:26:06,740 --> 00:26:10,460
So I think this is like the 15th year that I've been helping to organize.

304
00:26:13,460 --> 00:26:17,220
[CLAIRE] Yeah, so I'll make sure to drop a link to P2D2, the

305
00:26:18,220 --> 00:26:19,900
Prague Postgres Developer Day,

306
00:26:21,570 --> 00:26:22,660
into the show notes

307
00:26:23,630 --> 00:26:25,480
for the episode so that people who are

308
00:26:25,640 --> 00:26:28,920
interested in going to Prague can check it out for next year.

309
00:26:29,100 --> 00:26:32,360
It happens, well, because of COVID, I think it moved

310
00:26:32,540 --> 00:26:35,620
around in those early 2020 years.

311
00:26:35,980 --> 00:26:38,700
But most recently, it's been in January,

312
00:26:39,180 --> 00:26:41,940
right before FOSDEM. And it just happened a couple weeks ago, right?

313
00:26:41,060 --> 00:26:41,140
[TOMAS] Right.

314
00:26:43,110 --> 00:26:43,340
Exactly.

315
00:26:44,700 --> 00:26:47,060
And it depends

316
00:26:47,180 --> 00:26:48,180
mostly on

317
00:26:49,899 --> 00:26:50,820
availability

318
00:26:50,840 --> 00:26:58,360
of the venue from the university, so it needs to be in roughly, like, this time of the year,

319
00:26:59,140 --> 00:27:03,520
because that's when the university has an exam period, right? So,

320
00:27:08,460 --> 00:27:12,900
yeah, exactly, without students being there for exams or whatever.

321
00:27:14,040 --> 00:27:25,000
[CLAIRE] Yeah, I'm looking at some of the pictures on the website right now, and it looks like the presentation rooms are those kind of classrooms where the chairs go up, right? [Yes]

322
00:27:25,360 --> 00:27:33,700
So people in the back row are much, much higher than the speaker is down below, and everybody gets a good view, and the room looks pretty full, too.

323
00:27:34,340 --> 00:27:35,740
[TOMAS] Yes

324
00:27:36,000 --> 00:27:39,680
[CLAIRE] You've got how many people come to this thing?

325
00:27:41,280 --> 00:27:42,760
[TOMAS] So I think this year we had like

326
00:27:43,600 --> 00:27:44,860
300, 320

327
00:27:45,240 --> 00:27:45,400
people,

328
00:27:46,900 --> 00:27:47,000
but

329
00:27:48,480 --> 00:27:49,820
we now have like three

330
00:27:50,700 --> 00:27:51,920
tracks, so

331
00:27:53,179 --> 00:27:54,620
the rooms are not like

332
00:27:54,980 --> 00:27:56,260
too full, I think.

333
00:27:57,239 --> 00:28:02,900
[CLAIRE] Okay, and then people who attend have to choose. Like, that's the hard thing about multi-track

334
00:28:03,200 --> 00:28:08,660
conferences, is that oftentimes it's like Murphy's Law: there are always two

335
00:28:08,700 --> 00:28:30,240
things at the same time that I want to go to. But it's just life. Nice.

336
00:28:13,060 --> 00:28:13,440
[TOMAS] Yeah,

337
00:28:15,430 --> 00:28:17,600
I don't think we will go for like a fourth track

338
00:28:18,190 --> 00:28:20,620
I mean like we could but that wouldn't actually work

339
00:28:21,460 --> 00:28:23,700
and the good thing is that of course like the

340
00:28:24,580 --> 00:28:27,740
talks are recorded because the university has like an

341
00:28:28,100 --> 00:28:31,160
AV system built in so we can actually

342
00:28:31,810 --> 00:28:32,420
publish that.

343
00:28:33,600 --> 00:28:38,640
[CLAIRE] That's nice because I know the recording of talks is one of the most expensive parts of putting

344
00:28:39,040 --> 00:28:44,000
on these Postgres conferences in some cities. I mean, obviously the costs vary city by city.

345
00:28:46,080 --> 00:28:49,680
One of the things I really liked, and I wasn't there. I came to FOSDEM this year,

346
00:28:49,680 --> 00:28:59,640
but I didn't make it to P2D2 first. But you and Nazir Bilal Yavuz, who's another Postgres contributor,

347
00:29:00,720 --> 00:29:06,460
you co-taught a full-day workshop called Introduction to Postgres Hacking. And just...

348
00:29:07,340 --> 00:29:15,940
I don't know. I'm not mentioning it as an advertisement, but as a thank you. Because I

349
00:29:16,120 --> 00:29:23,460
know that putting together a full day workshop is a lot of work. And I just love the fact that,

350
00:29:24,020 --> 00:29:28,480
and you're not the only one to do these intro to hacking workshops on the planet. There are other

351
00:29:28,680 --> 00:29:35,280
people who've done them as well. But I think it's so valuable to help tomorrow's future contributors

352
00:29:35,300 --> 00:29:38,660
and committers to, you know, get their feet wet.

353
00:29:41,660 --> 00:29:48,500
[TOMAS] Yeah, and it's definitely a lot of work to prepare that, and also a bit exhausting to

354
00:29:48,530 --> 00:29:51,340
actually do a whole day of like a workshop.

355
00:29:53,460 --> 00:29:57,740
But I think it's worth it because I think it ties together

356
00:29:59,220 --> 00:30:03,900
to the effort of actually building a community, right?

357
00:30:04,080 --> 00:30:07,800
Being actually able to help other developers

358
00:30:08,620 --> 00:30:12,080
and contributors to actually start hacking on Postgres.

359
00:30:14,419 --> 00:30:17,180
So in a way, I'm just trying to give back

360
00:30:17,920 --> 00:30:22,980
what I got from the community, I don't know, 20 years ago.

361
00:30:23,060 --> 00:30:23,500
All right.

362
00:30:25,820 --> 00:30:30,680
[CLAIRE] I'm looking at the abstract for the intro to Postgres hacking workshop right now, and

363
00:30:30,680 --> 00:30:33,140
I love the beginning of the last paragraph.

364
00:30:33,720 --> 00:30:38,960
We expect basic knowledge of C, and by C, the letter C, the programming language C.

365
00:30:39,690 --> 00:30:43,360
But we don't expect you to know the strange and unique C stuff in Postgres.

366
00:30:45,120 --> 00:30:45,420
I don't know.

367
00:30:45,550 --> 00:30:46,440
To me, that's a hook.

368
00:30:46,640 --> 00:30:49,640
Oh, what's the strange and unique C stuff in Postgres?

369
00:30:51,660 --> 00:30:54,100
And then you say, that's what the workshop is meant to address.

370
00:30:55,860 --> 00:30:56,180
Very cool. [Yeah]

371
00:30:58,960 --> 00:31:08,500
Okay, so we're supposed to be talking about performance and why it's so fun to hack on Postgres performance.

372
00:31:08,590 --> 00:31:10,180
So we're going to circle back to that in a minute.

373
00:31:10,530 --> 00:31:21,400
But before we do, speaking of you giving back and doing things in the community to help tomorrow's future contributors and committers,

374
00:31:21,840 --> 00:31:27,680
one of the other things that I think you started doing sometime in the last year that I think is pretty cool are office hours.

375
00:31:28,940 --> 00:31:34,000
Which, like, the concept has been around forever. Those of us who went to university probably went [Yes]

376
00:31:34,120 --> 00:31:40,200
to office hours held by our teaching assistants or professors or whatever. Like, okay, so it's not

377
00:31:40,200 --> 00:31:48,260
a new concept, but your office hours are like open to the public, right? Anybody who's trying to

378
00:31:48,550 --> 00:31:53,120
hack on Postgres? How does it work? Tell me.

379
00:31:53,480 --> 00:32:10,420
[TOMAS] Right. So why I started doing office hours is that sometimes, like for new contributors, especially, it can be intimidating to actually post the first patch to the mailing list.

380
00:32:10,500 --> 00:32:16,100
Right, and maybe they don't even understand how the mailing list works, or

381
00:32:16,420 --> 00:32:25,900
how to do that properly and that kind of stuff. Or sometimes people are not sure

382
00:32:26,280 --> 00:32:33,540
if the patch they are working on is even a good idea, or whether it's something that

383
00:32:33,760 --> 00:32:40,940
we would actually want in Postgres, or how to prepare the patch properly, and so on.

384
00:32:41,600 --> 00:32:48,620
And then they don't actually end up sending the patch at all, right? Because they

385
00:32:49,970 --> 00:32:58,040
just give up on that. And sometimes the new contributors do not actually

386
00:32:58,820 --> 00:33:05,480
realize that they can actually write an email to the individual contributors right you don't need to

387
00:33:06,360 --> 00:33:13,540
communicate with just the mailing list; you can just write to the actual contributors and like

388
00:33:13,980 --> 00:33:21,160
ask for help or like ask for opinions and so on like the worst thing that could happen is that

389
00:33:22,680 --> 00:33:28,320
they will just not respond right because they have like too much work or something but

390
00:33:31,100 --> 00:33:34,940
I wanted to make this explicit, right? To make it clear, to

391
00:33:36,620 --> 00:33:42,020
kind of advertise that you can actually reach out to me, and

392
00:33:43,260 --> 00:33:49,419
maybe I know nothing about the patch that you are working on I will not have an opinion but I could

393
00:33:49,440 --> 00:33:51,300
still give you advice on

394
00:33:53,970 --> 00:33:55,340
how to submit a patch to

395
00:33:55,340 --> 00:33:56,680
the mailing list or something

396
00:33:57,779 --> 00:33:59,560
or suggest who to talk

397
00:33:59,680 --> 00:34:01,040
to or something like that

398
00:34:01,600 --> 00:34:03,120
or give you maybe a little bit of

399
00:34:03,300 --> 00:34:04,380
encouragement to

400
00:34:06,020 --> 00:34:06,440
do stuff

401
00:34:10,100 --> 00:34:11,500
that's what my office hours

402
00:34:11,919 --> 00:34:12,120
are

403
00:34:14,320 --> 00:34:15,280
I do that

404
00:34:15,480 --> 00:34:17,679
I do have a slot like every week

405
00:34:17,980 --> 00:34:19,399
people can just

406
00:34:19,419 --> 00:34:29,940
send me an email and like ask for a bit of time I expect people to tell me like what they want

407
00:34:30,020 --> 00:34:35,440
to talk about so that I can prepare like to keep the you know half an hour or something

408
00:34:36,440 --> 00:34:46,279
to keep it productive, but beyond that it's up to you what you want to talk about. Like,

409
00:34:46,860 --> 00:34:53,940
it should be obviously like a thing about like Postgres coding or like Postgres stuff but it

410
00:34:54,020 --> 00:35:03,960
can be, like, you can suggest a completely new patch idea, or maybe you already

411
00:35:03,960 --> 00:35:11,180
have a patch and you're just wondering how to best submit that to the mailing

412
00:35:11,220 --> 00:35:19,500
list like what should you be careful about like if you are forgetting about some sort of test

413
00:35:19,720 --> 00:35:27,260
or I don't know or maybe you already submitted a patch and it's kind of like stuck right and

414
00:35:28,100 --> 00:35:37,020
that's quite a common issue for any developer, because there's just so much work being done

415
00:35:37,060 --> 00:35:44,880
on the mailing list. So maybe I can give you advice on how to try to unstick that. I don't have

416
00:35:45,300 --> 00:35:52,840
like perfect solutions, but I will at least give you advice on what I would do or what I would

417
00:35:53,600 --> 00:36:00,680
try doing so it's more just like an offer to have a chat right.

418
00:36:01,380 --> 00:36:03,560
[CLAIRE] I think it's great that you're doing it.

419
00:36:03,560 --> 00:36:07,460
And I know you're not the only Postgres committer who has office hours.

420
00:36:08,940 --> 00:36:13,440
I am hoping that somewhere, I don't know where the right place is,

421
00:36:14,210 --> 00:36:17,720
whether there should be a channel on the Postgres hacking Discord

422
00:36:18,310 --> 00:36:21,760
or a page on the Postgres wiki that's Google searchable or whatever,

423
00:36:22,460 --> 00:36:27,380
where those of you who do have office hours kind of have them all listed in one place.

424
00:36:28,100 --> 00:36:34,940
So somebody, a new contributor who's looking for it is more likely to know about it or discover it.

425
00:36:35,120 --> 00:36:39,680
But I will drop a link to your blog's About page,

426
00:36:40,260 --> 00:36:44,620
which has the information about like how to sign up for your office hours.

427
00:36:44,780 --> 00:36:47,180
So I'll include that in the show notes for this episode too.

428
00:36:48,320 --> 00:36:51,260
[TOMAS] Yeah thank you I think the office hours are

429
00:36:52,980 --> 00:37:00,680
kind of like an extension of the hallway track at conferences, right? And that is, I think,

430
00:37:00,740 --> 00:37:09,860
a very important part of a conference, which is not transferable to talk recordings, right? And also

431
00:37:10,920 --> 00:37:20,580
that's how and where people discuss patches face to face, right? So

432
00:37:22,460 --> 00:37:25,880
I'm trying to do something similar with like office hours, yes.

433
00:37:26,700 --> 00:37:33,320
[CLAIRE] Speaking about office hours at conferences, excuse me, speaking about hallway tracks at conferences,

434
00:37:33,960 --> 00:37:39,240
like, I never understood what that meant when I was earlier in my career. You know, I remember going

435
00:37:39,440 --> 00:37:43,540
to like USENIX conferences and people were talking about the hallway track. I'm like, what do they

436
00:37:43,560 --> 00:37:49,400
mean? What is that? Like, I just didn't get it. And I think it's because I just didn't know enough

437
00:37:49,600 --> 00:37:54,940
people yet. I didn't know how to comport myself, how to just walk up to strangers and introduce

438
00:37:55,180 --> 00:38:02,900
myself and ask questions. Like, I was too intimidated, I think. I'm not anymore. But I was

439
00:38:03,060 --> 00:38:08,900
then. So anyway, it is true that some of these Postgres conferences have the most amazing,

440
00:38:09,520 --> 00:38:16,000
welcoming, interesting hallway tracks. And you can learn as much in the hallway as you do by

441
00:38:16,220 --> 00:38:21,820
attending talks. And there's one in particular that's coming up in May, which if anyone is

442
00:38:21,980 --> 00:38:26,360
listening and they are thinking about becoming a contributor, or maybe they're in early days

443
00:38:26,920 --> 00:38:33,880
to Postgres, it's called PGConf.dev. It happens annually in Canada. This year, it's going to be

444
00:38:33,940 --> 00:38:41,740
in Vancouver, Canada in May, May 19 through 22. And I'll drop a link to it in the show notes as

445
00:38:41,920 --> 00:38:47,420
well, just in case people want to go check it out. And I know they have really good pricing for...

446
00:38:47,930 --> 00:38:52,260
I think particularly they probably have student pricing, but the pricing to attend the conference

447
00:38:52,540 --> 00:39:01,820
is deliberately low to make it easy for people to be part of it. And let's see, it's two days.

448
00:39:02,440 --> 00:39:08,700
Okay, there's a pre-conference day, which is actually chock full of amazing sessions this year on Tuesday.

449
00:39:09,840 --> 00:39:11,660
So I would say it's not optional.

450
00:39:11,740 --> 00:39:12,680
You should be there for Tuesday.

451
00:39:13,140 --> 00:39:16,700
Wednesday and Thursday are normal conference days, and Friday is an unconference.

452
00:39:17,620 --> 00:39:18,760
So basically four days.

453
00:39:19,900 --> 00:39:22,600
And you are going, right, Tomas?

454
00:39:22,860 --> 00:39:24,420
[TOMAS] Yeah, I definitely plan to go.

455
00:39:24,860 --> 00:39:25,020
Yes.

456
00:39:26,460 --> 00:39:28,880
And I kind of have to because I have a talk there.

457
00:39:31,580 --> 00:39:53,020
[CLAIRE] Yes, I plan to go as well. I have a panel and a talk. And it's just so much fun. Particularly if you are, if your idea of fun is hanging out with people who enjoy working on databases. It's just, it's, I don't know, it's one of my favorites.

458
00:39:53,600 --> 00:40:11,920
Okay. We talked about office hours and we talked about the intro to hacking workshop and organizing P2D2 and other ways that you're focused on giving back to the community and helping train like the next generation of contributors.

459
00:40:13,300 --> 00:40:20,860
Before we flip back to performance, is there anything else we should talk about in terms of your community work that's interesting?

460
00:40:23,180 --> 00:40:24,420
[TOMAS] I don't know.

461
00:40:24,820 --> 00:40:26,560
I think maybe we should just like mention

462
00:40:27,060 --> 00:40:28,940
what actually hallway track means.

463
00:40:29,560 --> 00:40:31,340
Because we've just been talking about it. [Okay, let's do it]

464
00:40:32,960 --> 00:40:33,620
I think

465
00:40:33,740 --> 00:40:35,300
hallway track is the time

466
00:40:36,400 --> 00:40:37,380
between talks

467
00:40:37,780 --> 00:40:39,540
and after the conference essentially

468
00:40:40,080 --> 00:40:41,380
like when you can talk

469
00:40:41,470 --> 00:40:43,180
to the other engineers and like

470
00:40:43,520 --> 00:40:45,020
maybe have a tea or

471
00:40:45,430 --> 00:40:47,140
coffee or beer or whatever

472
00:40:47,940 --> 00:40:49,480
and discuss the

473
00:40:49,640 --> 00:40:51,160
stuff that was maybe

474
00:40:52,680 --> 00:40:53,380
mentioned in the

475
00:40:53,560 --> 00:40:55,260
talks or like any other

476
00:40:55,560 --> 00:40:57,280
problem or any other

477
00:40:57,300 --> 00:41:06,420
patch or feature that you are working on. And the amazing part of the Postgres community is that

478
00:41:07,660 --> 00:41:13,420
it's not like a single company right like we are working for different companies for different

479
00:41:13,780 --> 00:41:20,340
customers, different fields, and we still collaborate a lot on all these things.

480
00:41:21,280 --> 00:41:28,520
So I think the hallway track is the most important part of the conferences, at least for me.

481
00:41:32,080 --> 00:41:39,640
And I think it's important to actually introduce the new people coming to the community to this concept, to how important that is.

482
00:41:40,920 --> 00:41:43,120
And also kind of like encourage that.

483
00:41:43,500 --> 00:41:53,480
And I think, for example, last year at PGConf.dev, I think it was a great idea that there was like an organized dinner, right?

484
00:41:54,140 --> 00:42:08,080
Where, yes, I mean, instead of, because what often happens at conferences like this is that you see the people that you know for, I don't know, 20 years, right?

485
00:41:54,520 --> 00:41:55,540
[CLAIRE] Oh, the meet and eats.

486
00:42:08,120 --> 00:42:19,180
[TOMAS] So you go to the dinner with them and the new attendees are kind of like stuck and like left to do their own thing.

487
00:42:19,740 --> 00:42:24,860
[CLAIRE] Yeah, they don't have an invitation to the dinners that exist because they don't know the people yet.

488
00:42:21,299 --> 00:42:26,480
[TOMAS] And yeah, right, exactly.

489
00:42:27,660 --> 00:42:37,240
And I think last year there was like a random selection or random grouping, right, of people for the dinner.

490
00:42:38,140 --> 00:42:41,620
So you actually got to meet anyone

491
00:42:44,380 --> 00:42:46,560
randomly and have a chat with them

492
00:42:46,930 --> 00:42:47,840
I think it was interesting

493
00:42:47,930 --> 00:42:50,220
I was a bit jet lagged

494
00:42:50,340 --> 00:42:52,480
but other than that I think it was wonderful.

495
00:42:54,160 --> 00:42:58,940
[CLAIRE] Yeah, and I know that at the meet and eat dinner I went to, people just went around the table

496
00:42:59,240 --> 00:43:00,060
and introduced themselves.

497
00:43:03,780 --> 00:43:09,660
There's also a fun run for people who are runners who want to go for a run one evening.

498
00:43:10,120 --> 00:43:13,420
This year, like I said, the Tuesday content is being completely revamped.

499
00:43:15,100 --> 00:43:26,140
And it hasn't been announced yet, but the intention is to make one of the tracks on Tuesday super interesting to newcomers, as well as people who have been involved forever.

500
00:43:26,920 --> 00:43:30,200
So, and I think the team is going to be successful with that.

501
00:43:32,380 --> 00:43:41,440
Okay, well, there is one other thing I want to also talk about in the community area, which is the Postgres Hacking Discord, which is somewhat new.

502
00:43:41,550 --> 00:43:43,520
We should include a link in the show notes as well.

503
00:43:44,320 --> 00:43:48,800
It's only been around for, oh my gosh, it's almost two years, but it's more than a year

504
00:43:48,860 --> 00:43:49,360
and a half now.

505
00:43:50,920 --> 00:43:56,320
Started in summer of 2024, right after PGConf.dev.

506
00:43:57,080 --> 00:44:02,020
And anyway, I think there is a monthly hacking workshop that happens there that you have been

507
00:44:02,160 --> 00:44:02,840
part of on occasion.

508
00:44:03,010 --> 00:44:03,500
Is that right?

509
00:44:05,140 --> 00:44:10,940
[TOMAS] I have been, and actually I will be on the workshop again next month, I think.

510
00:44:10,600 --> 00:44:10,880
[CLAIRE] Oh, cool.

511
00:44:12,040 --> 00:44:12,660
So what is this?

512
00:44:13,110 --> 00:44:14,000
Tell us, please.

513
00:44:15,940 --> 00:44:22,620
[TOMAS] So the hacking workshop is a monthly session organized by Robert Haas

514
00:44:26,360 --> 00:44:33,200
where the attendees pick a recorded talk from one of the past conferences

515
00:44:34,070 --> 00:44:39,940
then they watch the talk on their own and then there is a session

516
00:44:41,700 --> 00:44:50,540
like an hour where everyone joins a Zoom I think or like video conference and they can

517
00:44:50,660 --> 00:44:59,060
ask questions of the speaker from the talk recording, right? So they can

518
00:44:59,060 --> 00:45:05,360
discuss the thing. They are not watching the recording together, but they can actually

519
00:45:05,380 --> 00:45:12,540
discuss what was in the talk and ask questions and that kind of stuff so.

520
00:45:12,240 --> 00:45:13,600
[CLAIRE] So it's like peeling the onion.

521
00:45:14,660 --> 00:45:20,040
For a whole hour, which is a lot more than the normal like five, ten minutes of Q&A at the end of a talk.

522
00:45:16,400 --> 00:45:16,900
[TOMAS] Yeah

523
00:45:20,900 --> 00:45:25,300
yes exactly and like sometimes the discussion kind of like

524
00:45:25,680 --> 00:45:33,460
you know gets distracted and like starts talking about stuff that wasn't in the talk but

525
00:45:33,480 --> 00:45:39,760
is kind of like related to that, and that's fine. I think it's like an online hallway track

526
00:45:40,040 --> 00:45:48,060
in a way like being able to discuss technical stuff right kind of like a brainstorming or so on.

527
00:45:50,320 --> 00:45:53,560
[CLAIRE] I was up in Redmond last week.

528
00:45:53,650 --> 00:45:55,780
Even though I work for Microsoft,

529
00:45:55,910 --> 00:45:58,100
I don't actually go up to Seattle very often.

530
00:45:58,500 --> 00:45:59,960
Maybe like twice a year at the most.

531
00:46:01,880 --> 00:46:02,660
And while I was there,

532
00:46:02,660 --> 00:46:05,780
I gave a talk at the Seattle Postgres meetup.

533
00:46:06,920 --> 00:46:10,200
And I think it was the talk about contributions

534
00:46:10,680 --> 00:46:12,040
to Postgres during PG 18.

535
00:46:12,940 --> 00:46:17,240
And so of course I mentioned the Discord and the monthly hacking workshops

536
00:46:17,610 --> 00:46:22,640
and how that Discord initially started as like just the Discord for the mentoring program

537
00:46:23,090 --> 00:46:26,180
and then expanded its scope and its activity.

538
00:46:27,000 --> 00:46:31,900
And anyway, there was a developer who walked up to me at the end and said, oh my gosh, that

539
00:46:31,960 --> 00:46:33,040
was such a smart move.

540
00:46:33,580 --> 00:46:37,100
He said to create a Discord for the Postgres hackers.

541
00:46:37,420 --> 00:46:46,340
He said, if you want this generation, and we're all on Discord all the time, to kind of be more participative.

542
00:46:46,480 --> 00:46:47,000
Is that a word?

543
00:46:48,120 --> 00:46:52,440
He said that was a good call and just wanted to give it a plus one.

544
00:46:52,640 --> 00:46:55,780
So I've been meaning to tell Robert Haas that, and I haven't had a chance yet.

545
00:46:55,860 --> 00:46:57,380
So maybe he'll listen to this episode.

546
00:46:58,000 --> 00:46:59,540
He'll discover the compliment that way.

547
00:47:00,740 --> 00:47:07,080
[TOMAS] Yeah, I personally attended a number of those sessions, and it was always an

548
00:47:07,300 --> 00:47:14,060
interesting thing. It would be better to meet face to face and like have a chat

549
00:47:14,380 --> 00:47:18,540
about the problem, about that personally, right?

550
00:47:19,240 --> 00:47:21,680
But this is the next best thing.

551
00:47:21,780 --> 00:47:25,580
And for many people who do not go to Postgres conferences

552
00:47:25,860 --> 00:47:29,240
for whatever reason or can't go,

553
00:47:30,820 --> 00:47:32,920
it's the only option possible, right?

554
00:47:33,140 --> 00:47:39,120
So I think I agree that it was a good thing

555
00:47:39,740 --> 00:47:42,680
that Robert, you know, came up with this idea

556
00:47:42,700 --> 00:47:44,360
and started actually organizing that.

557
00:47:44,810 --> 00:47:47,800
And it definitely is a fair amount of work

558
00:47:47,800 --> 00:47:48,920
to actually keep it running.

559
00:47:50,340 --> 00:47:52,220
So, yeah.

560
00:47:53,960 --> 00:47:58,900
[CLAIRE] Okay, so you said that one of your talks is going to be the focus of an upcoming

561
00:48:00,760 --> 00:48:04,700
monthly hacking workshop. Is it one of your performance related talks or,

562
00:48:05,170 --> 00:48:07,660
huh, are all your talks performance related?

563
00:48:12,420 --> 00:48:18,300
[TOMAS] So the next month is going to be, I think, the talk from...

564
00:48:21,520 --> 00:48:22,600
What was that?

565
00:48:25,260 --> 00:48:36,800
PGConf.EU 2024, I think, which was about the performance archaeology or something, I think.

566
00:48:38,360 --> 00:48:42,040
About like how the performance evolved over the years.

567
00:48:42,750 --> 00:48:43,440
Let me check.

568
00:48:43,590 --> 00:48:44,800
I know that I'm...

569
00:48:45,660 --> 00:48:50,080
[CLAIRE] I did a little research right before we started today's podcast. [Okay]

570
00:48:50,690 --> 00:48:56,100
And just here are my proof points for why I think so many of your talks are about performance.

571
00:48:57,580 --> 00:49:01,940
Next month in March at Nordic PG Day, which is happening in Helsinki this time,

572
00:49:02,380 --> 00:49:08,120
you're giving a talk on efficiently approximating and estimating percentiles and histograms,

573
00:49:09,120 --> 00:49:15,520
and you also gave that talk at FOSDEM PGDay a couple of weeks ago, right? [Yes] Okay, and then for

574
00:49:15,660 --> 00:49:23,520
POSETTE 2026, and I have insider information here, you're going to be giving a talk about random page

575
00:49:23,700 --> 00:49:29,480
cost in Postgres, like why the default is 4.0 and whether you should lower it for SSDs and things

576
00:49:29,500 --> 00:49:34,320
like that. Last year at POSETTE, you gave a talk, the performance archaeology talk,

577
00:49:35,060 --> 00:49:41,140
20 years of improvements. And then at PGConf.EU last fall, you talked about fast-path locking. [Right]

578
00:49:42,839 --> 00:49:47,120
So I don't know. Do you ever give a talk about anything besides performance?

579
00:49:49,100 --> 00:49:58,520
[TOMAS] So I would say that actually the percentile talk, the percentile estimation, is not really performance-related,

580
00:49:59,100 --> 00:50:10,500
or at least not about Postgres improvements itself, because this is about extensions and about a concept of sketches

581
00:50:10,760 --> 00:50:15,980
that is from streaming databases and so on.

582
00:50:16,020 --> 00:50:23,460
So, yes.

583
00:50:16,580 --> 00:50:16,660
[CLAIRE] Okay.

584
00:50:16,700 --> 00:50:20,980
But I want to debate you on that because the whole reason people use approximation algorithms

585
00:50:21,540 --> 00:50:24,120
is because they want to get to an answer faster.

586
00:50:25,340 --> 00:50:25,420
[TOMAS] Right.

587
00:50:26,060 --> 00:50:33,920
So, yeah, I do work in the field of performance stuff.

588
00:50:34,800 --> 00:50:36,980
So, my talks are about performance.

589
00:50:37,460 --> 00:50:39,280
So, that's definitely true.

590
00:50:39,640 --> 00:50:39,840
Yes.

591
00:50:43,620 --> 00:50:49,840
The percentile talk is a bit different in that it's not really about a Postgres feature, right?

592
00:50:50,280 --> 00:50:51,740
[CLAIRE] It's not about the core, you mean?

593
00:50:50,440 --> 00:50:54,000
[TOMAS] It's about, yes, it's not about the Postgres core.

594
00:50:54,460 --> 00:50:55,500
[CLAIRE] Okay, I'll give you that.

595
00:50:54,540 --> 00:51:01,020
[TOMAS] It's about like research papers implemented using Postgres, but it's like a different thing.

596
00:51:00,140 --> 00:51:05,840
[CLAIRE] But also, you're the maintainer of the t-digest extension for Postgres.

597
00:51:05,940 --> 00:51:06,400
Am I right?

598
00:51:07,050 --> 00:51:07,220
[TOMAS] Yes.

599
00:51:09,460 --> 00:51:22,540
So, I mean, it's my hobby to read research papers and like, do stuff using Postgres, which is amazing that we can actually use the extensibility of Postgres to do this kind of stuff like very easily.

600
00:51:10,660 --> 00:51:11,620
[CLAIRE] So I...

601
00:51:24,440 --> 00:51:27,920
Okay, so I don't want to elevate you on a pedestal and put myself down.

602
00:51:28,160 --> 00:51:30,400
But it's my hobby to read detective novels.

603
00:51:30,960 --> 00:51:33,060
And it's your hobby to read research papers

604
00:51:34,100 --> 00:51:36,800
on mathematical concepts that speed things up.

605
00:51:37,040 --> 00:51:39,660
So that's one way in which you and I are different, I guess.

606
00:51:40,240 --> 00:51:43,560
Oh, speaking of you, really quick side note.

607
00:51:43,840 --> 00:51:46,480
Can someone in the text chat tell us who won that hockey game?

608
00:51:47,020 --> 00:51:48,200
It must be over by now.

609
00:51:48,740 --> 00:51:49,880
[TOMAS] I think it was Canada.

610
00:51:50,960 --> 00:51:51,480
[CLAIRE] Oh, really?

611
00:51:51,940 --> 00:51:52,400
I'm sorry.

612
00:51:54,640 --> 00:51:58,660
Okay. Well, I'm happy for Aaron and I'm sad for you, Tomas. [Happens]

613
00:52:03,020 --> 00:52:09,920
Yeah. Okay. All right. So let's flip back to performance for a second.

614
00:52:11,660 --> 00:52:19,420
I don't know if you can articulate this. I don't actually know if... Sometimes it's very hard for

615
00:52:20,020 --> 00:52:25,980
me to describe my emotions, but I'm trying to imagine like if there's a special feeling

616
00:52:26,650 --> 00:52:33,520
that you associate with solving a performance problem, like getting to the answer, realizing

617
00:52:33,720 --> 00:52:35,420
it's going to speed things up a lot.

618
00:52:36,480 --> 00:52:37,440
I don't know.

619
00:52:38,310 --> 00:52:38,980
How does that feel?

620
00:52:41,260 --> 00:52:48,020
[TOMAS] I don't remember like any explicit emotions but it's definitely like a good feeling that

621
00:52:48,080 --> 00:52:56,520
you solve the problem right so there is definitely some like amount of like satisfaction or like

622
00:52:56,690 --> 00:53:04,420
feeling of satisfaction and achievement and I think like everyone needs a little bit of that

623
00:53:04,940 --> 00:53:11,320
when working, that you actually achieve something, like a feeling of achievement.

624
00:53:14,340 --> 00:53:16,040
so I definitely do have that

625
00:53:16,910 --> 00:53:17,680
sometimes it's like

626
00:53:18,450 --> 00:53:20,100
it takes a long time actually to

627
00:53:20,940 --> 00:53:21,460
get it done

628
00:53:24,560 --> 00:53:25,860
right now I'm working on the

629
00:53:26,540 --> 00:53:27,320
index prefetching

630
00:53:28,160 --> 00:53:29,120
with Peter Geoghegan

631
00:53:30,340 --> 00:53:32,080
and that's a patch that I

632
00:53:32,320 --> 00:53:33,920
think I started working on

633
00:53:33,950 --> 00:53:34,820
like three years ago

634
00:53:36,200 --> 00:53:37,820
and it's still not committed

635
00:53:38,140 --> 00:53:39,759
right so hopefully

636
00:53:39,760 --> 00:53:45,560
for like Postgres 19 we will have something, at least part of that committed.

637
00:53:47,820 --> 00:53:53,280
[CLAIRE] When you say index prefetching, I know that means something to some of our listeners, but

638
00:53:53,560 --> 00:53:57,120
there's other listeners who will be like, so what?

639
00:53:58,160 --> 00:53:59,960
Who's that going to help and in what scenario?

640
00:54:01,620 --> 00:54:08,660
[TOMAS] So it will help anyone who is using index scans or like indexes in Postgres

641
00:54:09,390 --> 00:54:19,740
because index scans are the main source of random I/O, and especially on like

642
00:54:21,040 --> 00:54:31,580
regular storage, to get good performance you need to actually issue the I/O requests

643
00:54:31,860 --> 00:54:38,380
early enough so that you don't have to wait for the actual data once you actually

644
00:54:39,300 --> 00:54:39,720
need that.

645
00:54:40,620 --> 00:54:51,040
So it's part of the asynchronous I/O work which is done by like Andres and Thomas Munro and

646
00:54:51,180 --> 00:54:59,000
Melanie and various other people both in the Microsoft team and in the Postgres community

647
00:54:59,020 --> 00:54:59,400
in general.

648
00:55:02,360 --> 00:55:08,360
And it's a way to better utilize the storage, right?

649
00:55:09,120 --> 00:55:12,500
Which for database is an important thing.
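
[NOTE] The idea of issuing I/O requests early enough that the data is already there when the scan needs it can be sketched with a toy timing model. This is only an illustration under stated assumptions, not how Postgres implements prefetching: the function name, the fixed per-block latency, and the unlimited I/O queue are all hypothetical simplifications.

```python
def scan_time(n_blocks, io_latency, cpu_per_block, distance):
    """Simulated time to process n_blocks random reads with a prefetch distance.

    distance=0 models synchronous I/O: a block is requested only when the
    scan needs it. distance=d lets requests run up to d blocks ahead of the
    block currently being processed. All numbers are abstract time units.
    """
    done = []  # done[i] = time at which block i has been fully processed
    for i in range(n_blocks):
        prev_done = done[i - 1] if i > 0 else 0
        # Finishing block (i - distance - 1) is what triggers the request
        # for block i; the first few requests are issued at time 0.
        trigger = i - distance - 1
        issued = done[trigger] if trigger >= 0 else 0
        ready = issued + io_latency
        # The scan cannot start on block i before the previous block is
        # done and before the I/O for block i has completed.
        start = max(prev_done, ready)
        done.append(start + cpu_per_block)
    return done[-1] if done else 0
```

In this model, a scan that never looks ahead pays the full I/O latency for every block, while a large enough distance hides almost all of it behind the CPU work.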

650
00:55:15,440 --> 00:55:15,540
[CLAIRE] Okay.

651
00:55:17,019 --> 00:55:22,780
So maybe Postgres 19, which the last commit fest is next month, right?

652
00:55:23,600 --> 00:55:23,920
[TOMAS] Right.

653
00:55:24,210 --> 00:55:27,300
I mean, commit fests are, yes.

654
00:55:25,200 --> 00:55:25,900
[CLAIRE] For feature freeze.

655
00:55:27,560 --> 00:55:30,300
[TOMAS] I mean, the feature freeze is like early April.

656
00:55:31,080 --> 00:55:32,400
First week of April.

657
00:55:31,300 --> 00:55:31,520
[CLAIRE] Yeah.

658
00:55:31,720 --> 00:55:33,000
I don't mean the last one ever.

659
00:55:33,120 --> 00:55:38,700
I mean the final commit fest prior to feature freeze for Postgres 19, specifically.

660
00:55:38,760 --> 00:55:42,320
[TOMAS] Yeah, I mean, commit fests are a bit obsolete at this point,

661
00:55:42,780 --> 00:55:43,580
but we still use them.

662
00:55:43,940 --> 00:55:50,640
But yes, next month is like when we need to actually get it into Postgres,

663
00:55:51,120 --> 00:55:54,480
into the Git repository.

664
00:55:55,780 --> 00:56:01,840
[CLAIRE] So one of the things that I think might not be obvious to someone who's just starting out,

665
00:56:01,980 --> 00:56:06,300
like if someone isn't sure they would be good at performance, they might conclude that they're

666
00:56:06,380 --> 00:56:11,500
not going to be good because they see a performance problem and they have no idea how to solve it

667
00:56:11,860 --> 00:56:12,340
in the beginning.

668
00:56:13,300 --> 00:56:18,540
So when you start tackling a performance issue, do you know how to solve it in the beginning?

669
00:56:19,660 --> 00:56:36,540
[TOMAS] No. I mean, that's how I started like 25 years ago. You just have a slow query and you need to somehow figure out why is it slow or how to make it faster.

670
00:56:36,780 --> 00:57:00,540
Or like, is it missing an index? Maybe. Is the query written in an efficient way? Could be also the case. And I'm still in this situation like nowadays, right? Because there's so much code and so much functionality in Postgres that I just don't know all the pieces.

671
00:57:00,600 --> 00:57:07,960
So I think the approach is you need to be systematic.

672
00:57:08,340 --> 00:57:12,900
You need to look at the slow query and start by profiling.

673
00:57:13,440 --> 00:57:18,280
Maybe you can try a couple things, like rewrite the queries a couple times,

674
00:57:20,640 --> 00:57:27,360
or you can use EXPLAIN ANALYZE to investigate which part of the query is slow,

675
00:57:28,610 --> 00:57:29,400
and that kind of stuff. [And when you say profiling, can you tell me more about what you mean?]

676
00:57:29,480 --> 00:57:49,140
Right, like well, profiling means either EXPLAIN ANALYZE or using like one of the tools from the operating system to actually profile CPU time and that kind of stuff.

677
00:57:49,880 --> 00:57:54,280
So like perf, for example, in Linux, right?

678
00:57:53,780 --> 00:57:53,980
[CLAIRE] Okay.

679
00:57:55,110 --> 00:57:58,920
[TOMAS] I assume that like on Windows,

680
00:57:59,280 --> 00:58:02,300
there's like a different profiler and so on.

681
00:58:02,830 --> 00:58:05,300
But I definitely use like perf.

682
00:58:06,280 --> 00:58:08,700
You could probably also use things like DTrace

683
00:58:08,890 --> 00:58:11,740
and that kind of stuff.
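
[NOTE] As a rough illustration of using EXPLAIN ANALYZE to see which part of a query is slow, the sketch below wraps a query in EXPLAIN with JSON output and walks the resulting plan tree. The helper names are made up for this example. The self-time calculation is a common approximation (a node's time times its loops, minus its children's) and it ignores details such as parallel workers, so treat the result as a hint rather than a measurement.

```python
def explain_statement(query):
    """Wrap a query so Postgres runs it and reports per-node timings as JSON."""
    return f"EXPLAIN (ANALYZE, BUFFERS, FORMAT JSON) {query}"


def self_times(node, out=None):
    """Collect (node type, approximate exclusive milliseconds) for each plan node."""
    if out is None:
        out = []
    children = node.get("Plans", [])
    # "Actual Total Time" is reported per loop and includes child nodes.
    inclusive = node["Actual Total Time"] * node.get("Actual Loops", 1)
    child_total = sum(
        c["Actual Total Time"] * c.get("Actual Loops", 1) for c in children
    )
    out.append((node["Node Type"], inclusive - child_total))
    for child in children:
        self_times(child, out)
    return out


def slowest_node(explain_json):
    """Return the (node type, self time) pair with the largest self time.

    `explain_json` is the parsed JSON result: a list with one {"Plan": ...} entry.
    """
    return max(self_times(explain_json[0]["Plan"]), key=lambda item: item[1])
```

The statement would be sent through whatever client you already use, and the parsed JSON result passed to `slowest_node`. For CPU-level detail inside the server, an operating system profiler such as perf is still the tool to reach for.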

684
00:58:12,260 --> 00:58:13,980
[CLAIRE] Shout out to Bryan Cantrill there.

685
00:58:14,380 --> 00:58:15,300
And Solaris. [Yeah]

686
00:58:15,820 --> 00:58:16,840
And OpenSolaris.

687
00:58:18,120 --> 00:58:18,340
Sorry. Past life.

688
00:58:18,480 --> 00:58:32,780
[TOMAS] Yeah, and I think for profiling the website I use most of the time to actually, you know, find the proper incantation of like whatever tool, that's Brendan Gregg's website.

689
00:58:33,240 --> 00:58:38,720
So that's a good source, right?

690
00:58:39,420 --> 00:58:44,920
[CLAIRE] And Brendan is another person who once worked at Sun Microsystems. [Yes]

691
00:58:47,359 --> 00:58:53,000
So, sorry for anyone who's listening who doesn't know that I used to work in the kernel group at

692
00:58:53,160 --> 00:58:59,180
Sun. I owe a lot of my formative years to those people and that team.

693
00:59:01,840 --> 00:59:08,720
Okay. So one of the other things you talked about earlier is experimenting in trying to tackle

694
00:59:09,000 --> 00:59:13,620
performance problems. And I'm assuming you do a bunch of benchmarking too, as you progress further

695
00:59:13,640 --> 00:59:15,000
along, am I right?

696
00:59:16,500 --> 00:59:21,260
[TOMAS] Yes well I think that's natural right like if you are working on performance then

697
00:59:22,539 --> 00:59:30,140
if your goal is to improve performance and behavior of the system then like you need to

698
00:59:30,560 --> 00:59:38,879
prove that your patch actually does that right so you just need to do a lot of benchmarking

699
00:59:39,220 --> 00:59:39,980
for that reason.

700
00:59:40,660 --> 00:59:46,000
But I also learn by doing benchmarks.

701
00:59:46,920 --> 00:59:51,400
I learn about how the system works

702
00:59:51,620 --> 00:59:54,320
by actually trying different things,

703
00:59:54,440 --> 00:59:56,620
like exposing the system to different inputs

704
00:59:57,060 --> 01:00:04,699
and kind of like learning how the heuristics

705
01:00:04,720 --> 01:00:07,280
in the algorithm, for example,

706
01:00:08,640 --> 01:00:09,900
behave in practice.

707
01:00:12,820 --> 01:00:15,220
Because I definitely am not one of the people

708
01:00:15,280 --> 01:00:16,940
who can look at piece of code

709
01:00:17,020 --> 01:00:18,800
and kind of like immediately predict

710
01:00:19,740 --> 01:00:22,760
how it's going to behave in performance.

711
01:00:23,920 --> 01:00:25,660
Is it going to be fast?

712
01:00:25,840 --> 01:00:26,640
Is it going to be slow?

713
01:00:26,840 --> 01:00:28,980
Or like, I don't know.

714
01:00:30,180 --> 01:00:34,680
So I definitely need to do a bunch of testing

715
01:00:34,700 --> 01:00:43,460
and actually figure out like are there maybe some weird like cases where it doesn't behave correctly or something like that.

716
01:00:44,240 --> 01:01:01,920
[CLAIRE] Do you find that you also, as you progress along this, I'll call it a journey for each particular performance problem that you tackle, do you find that you also have to benchmark to make sure that you're not slowing down other parts of Postgres?

717
01:01:02,940 --> 01:01:05,200
You know, like speeding up and slowing something else?

718
01:01:03,020 --> 01:01:04,000
[TOMAS] Oh, yeah, absolutely.

719
01:01:06,400 --> 01:01:10,940
I think that's like, for example, I mentioned the index prefetching work.

720
01:01:12,220 --> 01:01:23,940
And making sure we are not causing regressions in cases, in queries that do not actually benefit from asynchronous I/O.

721
01:01:24,000 --> 01:01:30,340
Or like from prefetching is a huge part of the benchmarking.

722
01:01:30,720 --> 01:01:40,800
Making sure that we are not causing trouble to people who will not benefit from the patch

723
01:01:43,420 --> 01:01:53,020
is important I think and it's definitely important for other patches as well right.
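
[NOTE] Checking that a patch does not slow down queries that cannot benefit from it usually comes down to comparing repeated runs with and without the patch. A minimal sketch, assuming you already have latency samples per workload; the function name, the dictionary layout, and the 5% noise tolerance are assumptions made for this example.

```python
from statistics import median


def find_regressions(baseline, patched, tolerance=0.05):
    """Return workloads whose median latency got worse by more than `tolerance`.

    `baseline` and `patched` map a workload name to a list of measured
    latencies from repeated runs. The result maps each regressed workload
    to its relative slowdown (0.2 means 20% slower).
    """
    regressions = {}
    for name, base_runs in baseline.items():
        base = median(base_runs)
        new = median(patched[name])
        change = (new - base) / base
        if change > tolerance:
            regressions[name] = change
    return regressions
```

Using the median of several runs keeps a single noisy run from hiding or inventing a regression, and the tolerance should be picked from how much the same build varies between runs on your machine.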

724
01:01:53,460 --> 01:01:56,160
[CLAIRE] What are your go-to benchmarking tools?

725
01:01:56,510 --> 01:02:00,820
If somebody wants to delve into this more, where do they start?

726
01:02:00,930 --> 01:02:01,760
Where do they look at?

727
01:02:03,640 --> 01:02:03,860
[TOMAS] So I

728
01:02:07,020 --> 01:02:13,720
that's a really difficult question [I'm good at that] I mean because it depends on I mean.

729
01:02:15,300 --> 01:02:23,000
The tool that I use most of the time is probably like pgbench all right but

730
01:02:23,220 --> 01:02:30,880
I'm not really using that with the built-in workload.

731
01:02:33,240 --> 01:02:40,200
pgbench can also be used to orchestrate your custom benchmark,

732
01:02:40,900 --> 01:02:42,920
just run queries that you specify.

733
01:02:43,570 --> 01:02:47,940
And what exactly those queries will be heavily depends on

734
01:02:48,680 --> 01:02:51,780
what kind of patch are you working on.

735
01:02:52,900 --> 01:03:04,520
So, in a way, like constructing the benchmark is actually a way to learn about the patch, right?

736
01:03:05,400 --> 01:03:15,900
So, for example, if I'm going to work on a patch that supposedly, for example, optimizes aggregation, like GROUP BY queries,

737
01:03:16,680 --> 01:03:22,000
then the built-in workload in pgbench is completely useless.

738
01:03:22,680 --> 01:03:25,500
Because it doesn't have a single GROUP BY.

739
01:03:26,060 --> 01:03:30,580
So I will have to construct queries and data sets.

740
01:03:32,820 --> 01:03:36,140
And then I can actually use the pgbench to actually run that.

741
01:03:36,320 --> 01:03:40,920
And it will generate random queries and that kind of stuff.

742
01:03:41,880 --> 01:03:44,320
But I have to construct the workload to actually test.
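
[NOTE] Using pgbench to run your own queries instead of the built-in workload might look roughly like this. The sketch only builds the script text and the command line and does not run anything. The table `bench_events` and its columns are hypothetical; you would create and populate them yourself to match the patch you are testing.

```python
def groupby_script(max_group):
    """Text of a custom pgbench script that runs one randomized GROUP BY query."""
    return (
        # \set picks a new random value for :g on every transaction.
        f"\\set g random(1, {max_group})\n"
        "SELECT group_id, count(*), sum(value)\n"
        "FROM bench_events\n"  # hypothetical table, not part of pgbench
        "WHERE group_id <= :g\n"
        "GROUP BY group_id;\n"
    )


def pgbench_command(script_path, dbname, seconds=60, clients=8):
    """argv that runs the custom script rather than the built-in workload."""
    return [
        "pgbench",
        "-n",               # skip vacuuming the built-in pgbench tables
        "-f", script_path,  # use our script instead of the default one
        "-T", str(seconds),
        "-c", str(clients),
        "-j", str(clients),
        dbname,
    ]
```

Writing the script to a file and passing the argv list to `subprocess.run` would start the benchmark. Which queries and data sets go into the script depends entirely on what the patch is supposed to improve.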

743
01:03:45,140 --> 01:04:00,900
Similarly, there are patches that are optimizing, you know, a join planning, like a join search, like in which order to join tables and so on.

744
01:04:01,470 --> 01:04:08,800
And again, the built-in pgbench workload is completely useless for this because it doesn't have any joins.

745
01:04:10,120 --> 01:04:14,060
So I will have to benchmark that patch.

746
01:04:14,590 --> 01:04:29,860
I will have to think about, okay, so I need to test queries with different numbers of joins, maybe different types of, you know, data sets, different data distributions, and that kind of stuff.

747
01:04:30,320 --> 01:04:39,060
So even just constructing the benchmark is a way to actually learn about the patch.
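
[NOTE] For a join-planning patch, constructing the benchmark could start with generating queries that differ only in the number of joins. A small sketch; the tables `t0`, `t1`, and so on are hypothetical and would need to exist with whatever sizes and data distributions you want to test.

```python
def join_query(n_joins):
    """Build a query that joins n_joins + 1 tables in a chain: t0, t1, ..."""
    if n_joins < 0:
        raise ValueError("n_joins must not be negative")
    parts = ["SELECT count(*) FROM t0"]
    for i in range(1, n_joins + 1):
        # Each table joins to the previous one, so the planner has to
        # pick an order for a chain of n_joins joins.
        parts.append(f"JOIN t{i} ON t{i}.id = t{i - 1}.id")
    return " ".join(parts) + ";"


def join_workload(join_counts):
    """Map each requested join count to its generated query."""
    return {n: join_query(n) for n in join_counts}
```

Running the same set of queries against data sets with different distributions is then a way to look for the cases where the patch helps and the cases where it performs poorly.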

748
01:04:40,560 --> 01:04:46,120
So for example this is a great way to do a review of a patch that you know nothing about yet [Oh]

749
01:04:47,060 --> 01:04:54,780
so just going through the mental exercise of actually so how would I test this right like if I want to

750
01:04:54,940 --> 01:05:02,180
actually show that the patch helps I will have to do this right like the data set would look like this.

751
01:05:03,820 --> 01:05:10,260
But if I want to show that, what's the worst case, in which cases the patch

752
01:05:10,480 --> 01:05:20,720
will perform poorly, those are like mental exercises that you need to go through during

753
01:05:20,780 --> 01:05:21,360
the review.

754
01:05:21,820 --> 01:05:26,660
And maybe the patch misses one of those cases, right?

755
01:05:28,780 --> 01:05:30,600
And then you have a review.

756
01:05:31,660 --> 01:05:51,940
So, yes.

757
01:05:32,120 --> 01:05:37,380
[CLAIRE] So what you're saying is that you can build up your skills of assessing

758
01:05:37,400 --> 01:05:42,960
performance or assessing performance impact not just by working on these patches yourself but by

759
01:05:43,100 --> 01:05:49,860
doing patch review and coming up with creative ways to look at the ramifications of something

760
01:05:50,260 --> 01:05:53,400
that's being proposed. [Yes] cool I like it.

761
01:05:52,440 --> 01:05:55,880
[TOMAS] I think, and I think this is also

762
01:05:56,020 --> 01:05:58,220
like extremely important part of the review

763
01:05:58,400 --> 01:06:00,640
and extremely valuable part of the review.

764
01:06:01,320 --> 01:06:06,960
Because sometimes the review can be quite superficial.

765
01:06:07,420 --> 01:06:17,180
You will point out typos and spelling mistakes and naming issues in functions, and that's fine.

766
01:06:18,650 --> 01:06:28,240
But especially for more complex patches, it's important to actually talk about these trade-offs,

767
01:06:28,320 --> 01:06:36,120
actually like a design architecture and so on and also about these you know aspects of like is it

768
01:06:36,980 --> 01:06:41,080
actually properly implemented or like does it actually address all the,

769
01:06:43,420 --> 01:06:51,420
all the corner cases and so on so yeah and as I said I mean like when I have a patch that

770
01:06:53,820 --> 01:06:55,360
I haven't looked at before

771
01:06:56,390 --> 01:06:58,480
and I need to start doing like a review.

772
01:07:00,060 --> 01:07:03,380
One way is to look at the, you know, at the diff

773
01:07:04,120 --> 01:07:08,080
and like think about like what the patch should be doing

774
01:07:08,340 --> 01:07:08,860
and so on.

775
01:07:11,020 --> 01:07:12,940
What works for me better is actually

776
01:07:14,020 --> 01:07:15,440
doing some experiments with the patch

777
01:07:15,440 --> 01:07:17,920
and like trying to do, you know,

778
01:07:17,990 --> 01:07:21,480
trying to expose the patch to cases

779
01:07:21,920 --> 01:07:22,960
that are kind of like weird.

780
01:07:25,980 --> 01:07:26,580
[CLAIRE] Okay.

781
01:07:28,310 --> 01:07:34,539
So maybe I'm going to make an observation or two, actually, that may be obvious, but you

782
01:07:34,560 --> 01:07:41,240
tell me. What I'm hearing from all of this conversation is that the mentality that you

783
01:07:41,320 --> 01:07:45,580
bring to your work on performance and Postgres is that you start off by being very curious.

784
01:07:46,340 --> 01:07:52,660
When you've walked through your examples, you're asking lots of questions. And so I feel

785
01:07:52,860 --> 01:08:00,380
like if someone's wondering if they are a future performance hacker on a database, that curiosity

786
01:08:00,380 --> 01:08:03,340
might be a prerequisite that you kind of need.

787
01:08:04,160 --> 01:08:05,960
Am I... is that fair?

788
01:08:07,860 --> 01:08:13,960
[TOMAS] So it definitely is for me right like I need to be able to ask questions and

789
01:08:15,360 --> 01:08:16,140
I'm trying even

790
01:08:16,270 --> 01:08:18,380
even during like when reviewing a patch

791
01:08:18,980 --> 01:08:19,680
by other people

792
01:08:20,440 --> 01:08:22,220
I'm trying more to ask questions

793
01:08:22,440 --> 01:08:23,720
than like point out issues

794
01:08:24,060 --> 01:08:24,180
right

795
01:08:26,159 --> 01:08:26,759
but

796
01:08:28,580 --> 01:08:29,819
I don't know

797
01:08:30,200 --> 01:08:32,140
if like other people

798
01:08:32,859 --> 01:08:34,000
work the same way

799
01:08:34,560 --> 01:08:35,160
probably not.

800
01:08:36,620 --> 01:08:36,700
[CLAIRE] Okay,

801
01:08:37,890 --> 01:08:38,540
all right.

802
01:08:39,060 --> 01:08:47,160
I only have, I'm generalizing something from a sample set of one, which we all know is not a good thing to do.

803
01:08:47,700 --> 01:08:54,240
So, so that's fair. But I still, I don't know. I don't think I'm generalizing from a sample set of one.

804
01:08:54,400 --> 01:08:59,819
I've worked with a lot of performance engineers in my life. And I used to manage a performance engineering team.

805
01:09:00,400 --> 01:09:08,500
And I do feel like curiosity is one of the hallmark traits. Doesn't mean you can't be uncurious and succeed in this space.

806
01:09:08,580 --> 01:09:16,400
You probably can but I don't know it seems important or it seems commonly shared.

807
01:09:15,380 --> 01:09:22,500
[TOMAS] Yeah, I think curiosity in general is probably important and required.

808
01:09:23,440 --> 01:09:27,559
But I know that other hackers in the Postgres community

809
01:09:28,520 --> 01:09:31,580
definitely do work in different ways.

810
01:09:32,060 --> 01:09:35,600
So, for example, I'm now collaborating with Peter Geoghegan.

811
01:09:35,859 --> 01:09:38,900
And his approach to problems is very different.

812
01:09:40,539 --> 01:09:42,560
And I'm not saying it's wrong.

813
01:09:43,040 --> 01:09:47,500
I actually do enjoy the collaboration because of that, right?

814
01:09:47,670 --> 01:09:52,140
Because we kind of like both complement each other in some way.

815
01:09:48,240 --> 01:09:48,339
[CLAIRE] Okay.

816
01:09:54,060 --> 01:09:56,160
So you're like a super team.

817
01:09:58,960 --> 01:10:04,400
[TOMAS] Yeah, I think I may be a sidekick, but yes.

818
01:09:59,120 --> 01:09:59,620
[CLAIRE] Superhero?

819
01:10:06,200 --> 01:10:13,800
Okay so the second observation I have is that I saw a diagram once that had it was like a tree

820
01:10:14,100 --> 01:10:18,500
with branches or you could think of it as a road system with a lot of dead ends on it

821
01:10:19,180 --> 01:10:29,640
and it was about the process of iteration and in my in my career I have collaborated extremely

822
01:10:29,640 --> 01:10:36,560
well with people who are comfortable iterating to make something better. And I've collaborated

823
01:10:36,860 --> 01:10:42,040
poorly, like not well, with people who are like, just want to get to the answer, just want to get

824
01:10:42,120 --> 01:10:46,980
there fast, good enough for government work. No, no, no, we don't need to revise this again.

825
01:10:47,140 --> 01:10:53,620
Revising it again is a waste of my time. Don't waste my time. And anyway, I feel like iteration

826
01:10:53,850 --> 01:10:57,800
is important. And maybe that's because that's my process. And what you described.

827
01:10:58,260 --> 01:11:03,020
It sounded like you were willing to take multiple steps,

828
01:11:03,300 --> 01:11:05,080
come at a problem from multiple angles,

829
01:11:05,430 --> 01:11:07,160
like go down dead ends and come back.

830
01:11:07,450 --> 01:11:10,060
But you never gave up, like in solving these problems,

831
01:11:10,380 --> 01:11:12,760
even if you didn't know as you were going,

832
01:11:13,330 --> 01:11:14,460
if you're going down the right path.

833
01:11:15,620 --> 01:11:17,980
Am I off base?

834
01:11:17,820 --> 01:11:25,180
[TOMAS] Yeah, I would say that like dead ends are part of the game so you just need to once in a while

835
01:11:25,310 --> 01:11:32,120
like explore something that actually is not the proper solution like or like

836
01:11:33,660 --> 01:11:35,420
figuring out what is the right

837
01:11:35,600 --> 01:11:37,260
trade-off in some of the

838
01:11:37,480 --> 01:11:38,240
algorithms or

839
01:11:39,280 --> 01:11:39,960
solutions

840
01:11:42,119 --> 01:11:43,500
I think that's important

841
01:11:43,940 --> 01:11:44,080
yes.

842
01:11:48,520 --> 01:11:52,340
[CLAIRE] Very cool. I like hearing that because it validates my world philosophy.

843
01:11:54,000 --> 01:11:54,680
But it's okay.

844
01:11:54,900 --> 01:11:58,280
Not everybody has to enjoy going down dead ends and iterating.

845
01:11:58,500 --> 01:11:58,840
That's fine.

846
01:11:59,610 --> 01:12:01,319
[TOMAS] Yes I think it's like

847
01:12:01,340 --> 01:12:07,900
a brainstorming in a way like figuring out like what is the right solution.

848
01:12:12,760 --> 01:12:14,240
[CLAIRE] Okay, so before we wrap,

849
01:12:14,720 --> 01:12:28,880
I'm trying to figure out if I have covered all of the examples or things that we should be talking about with regards to why it's so fun to hack on Postgres performance.

850
01:12:28,930 --> 01:12:33,760
I'm trying to make sure that we have met the promise of the title of this episode.

851
01:12:34,930 --> 01:12:40,040
So is there anything else that's really fun about it that we haven't covered?

852
01:12:43,940 --> 01:12:45,000
[TOMAS] Not that I can think of.

853
01:12:45,100 --> 01:12:51,820
I'm sure there is a lot of interesting, funny stories about different patches,

854
01:12:51,980 --> 01:12:59,200
but I don't remember anything that I would mention right now I guess.

855
01:12:59,680 --> 01:12:59,760
[CLAIRE] Okay.

856
01:13:00,600 --> 01:13:03,160
If you think of anything right after we hang up today,

857
01:13:03,860 --> 01:13:12,240
if you have a link to a particular patch that is like the canonical reference or like,

858
01:13:12,880 --> 01:13:17,440
this was a good example of a good first performance patch or something like that.

859
01:13:17,550 --> 01:13:17,980
I don't know.

860
01:13:17,980 --> 01:13:22,660
If anything comes to you, I can include it in the show notes as long as I get it by tomorrow.

861
01:13:24,800 --> 01:13:30,460
So I don't want to overwhelm people with 35 links, but one or two of them might be interesting.

862
01:13:32,180 --> 01:13:44,080
Which actually leads me to mention, do you tag certain patch ideas as patches that need to be done, like problems that need to be tackled?

863
01:13:44,120 --> 01:13:47,600
Do I remember correctly that you have a list somewhere?

864
01:13:48,340 --> 01:13:48,720
Scrolled away?

865
01:13:49,980 --> 01:13:53,280
[TOMAS] In what sense? [Maybe it's a good first patch. Is it good first patch?]

866
01:13:53,760 --> 01:13:58,920
I mean, like I've been, so I've been posting on my blog.

867
01:13:59,120 --> 01:14:07,360
I've been like trying to propose a couple ideas for like that might be a good idea

868
01:14:07,600 --> 01:14:13,800
for a good topic for like first patch for new contributors.

869
01:14:14,860 --> 01:14:26,100
I think those are on my blog. I'm not sure if there are still some of those ideas

870
01:14:26,680 --> 01:14:33,160
you know because people are already working on a couple of those I think so

871
01:14:35,320 --> 01:14:42,639
I'm not sure if there is something still available I would need to check.

872
01:14:41,460 --> 01:14:41,560
[CLAIRE] Okay.

873
01:14:42,190 --> 01:14:45,480
What is the name of that tag?

874
01:14:46,080 --> 01:14:46,580
Do you remember?

875
01:14:47,110 --> 01:14:47,740
Off the top of your head?

876
01:14:47,780 --> 01:14:50,900
[TOMAS] I think it was a patch idea probably.

877
01:14:51,820 --> 01:14:52,020
[CLAIRE] Okay.

878
01:14:52,860 --> 01:14:53,180
All right.

879
01:14:53,440 --> 01:14:59,760
Well, if there are any interesting ones that are available that have that tag, we can include it in the show notes.

880
01:15:00,000 --> 01:15:02,420
We'll do a search after.

881
01:15:01,360 --> 01:15:08,060
[TOMAS] Yeah I think this is also something people can discuss with me during the office hours

882
01:15:08,140 --> 01:15:11,360
because one thing I found,

883
01:15:12,140 --> 01:15:15,120
and I'm judging by my personal experience,

884
01:15:15,260 --> 01:15:22,760
is that I'm much better when I hack on something

885
01:15:23,040 --> 01:15:25,500
that actually I personally am interested in.

886
01:15:26,880 --> 01:15:29,880
So when someone just gives you a topic for a patch and

887
01:15:30,080 --> 01:15:31,740
tells you: do this.

888
01:15:33,080 --> 01:15:38,120
I think people are, it's very easy to lose interest

889
01:15:38,120 --> 01:15:47,440
and give up after a while so I think it's better to actually look for a topic that actually is

890
01:15:47,700 --> 01:15:54,500
interesting for you personally or that is interesting because maybe it's from a field

891
01:15:54,500 --> 01:16:03,380
of mathematics that you've been like working on before or maybe it's interesting for your employer

892
01:16:03,400 --> 01:16:12,700
or something right like so there's some sort of like interest so what I do suggest people is

893
01:16:13,140 --> 01:16:20,320
to maybe go through the commit fest app and like look at the topics and like figure out like what

894
01:16:20,500 --> 01:16:25,860
patches could be like interesting but it's also something that I'm open to discuss

895
01:16:27,000 --> 01:16:28,800
during my office hours, right?

896
01:16:29,040 --> 01:16:30,940
So if you send me an email

897
01:16:33,200 --> 01:16:35,680
with a note that you don't know

898
01:16:36,190 --> 01:16:37,060
what to work on,

899
01:16:39,560 --> 01:16:41,100
I'm okay having like a chat.

900
01:16:42,040 --> 01:16:43,860
Then I can suggest you a couple of things.

901
01:16:45,720 --> 01:16:47,700
It's not going to be maybe a new patch.

902
01:16:48,020 --> 01:16:50,040
It's going to be like a patch

903
01:16:50,200 --> 01:16:51,920
someone else is already working on.

904
01:16:52,700 --> 01:16:56,880
But reviewing patches written by other people

905
01:16:56,900 --> 01:17:00,220
is an extremely important or extremely useful way

906
01:17:00,220 --> 01:17:01,700
to actually learn about code.

907
01:17:01,980 --> 01:17:05,300
So that's what I would recommend, actually.

908
01:17:05,400 --> 01:17:13,520
[CLAIRE] So, and that's, I think that's interesting because at least for me initially, and this was a very

909
01:17:13,880 --> 01:17:20,660
superficial conclusion, I was wrong. But I used to assume that patch review was all about like QA.

910
01:17:20,680 --> 01:17:26,660
It was all about ensuring the quality of the committed patch, right? Doing the review to make

911
01:17:26,780 --> 01:17:31,300
sure. But it turns out that's only half the motivation right there. And the other half the

912
01:17:31,320 --> 01:17:36,320
motivation is to skill up everybody who's involved on the Postgres project.

913
01:17:36,670 --> 01:17:42,780
It's a great way for new people to start to learn and understand the system and the code.

914
01:17:43,700 --> 01:17:44,200
Agree, disagree?

915
01:17:44,060 --> 01:17:44,120
[TOMAS] Yeah

916
01:17:45,700 --> 01:17:48,060
I think it definitely is like

917
01:17:48,860 --> 01:17:51,000
a very important way to

918
01:17:52,520 --> 01:17:53,920
learn about the code

919
01:17:54,300 --> 01:17:54,940
but also

920
01:17:55,960 --> 01:17:57,620
it is a QA

921
01:17:58,160 --> 01:17:58,640
but also

922
01:18:01,900 --> 01:18:04,160
I think like having a patch

923
01:18:04,400 --> 01:18:06,280
doesn't mean that this is the right solution

924
01:18:07,920 --> 01:18:08,040
right

925
01:18:11,980 --> 01:18:13,640
there is always maybe

926
01:18:14,520 --> 01:18:16,560
a better way to actually solve the problem

927
01:18:16,860 --> 01:18:17,580
or maybe

928
01:18:18,440 --> 01:18:20,280
sometimes it happens

929
01:18:20,520 --> 01:18:22,480
that the patch actually is not

930
01:18:22,620 --> 01:18:30,320
like a problem worth solving so that can also happen of course right.

931
01:18:33,139 --> 01:18:39,840
[CLAIRE] When I think about patch review I'm sometimes reminded of I think it was a year ago exactly

932
01:18:40,140 --> 01:18:45,600
Robert Haas was on this podcast and he was talking about the mentoring program for Postgres.

933
01:18:45,620 --> 01:18:51,660
Robert, of course, is a Postgres committer or a major contributor, works at EDB, has been

934
01:18:51,840 --> 01:18:53,220
involved in the project for a long time.

935
01:18:53,760 --> 01:18:58,280
I'm not sure whether it's longer or not as long as you, but you both have been involved

936
01:18:58,280 --> 01:18:59,240
for decades, it seems.

937
01:18:58,740 --> 01:19:01,900
[TOMAS] Oh, Robert is definitely longer.

938
01:19:02,840 --> 01:19:03,160
[CLAIRE] Than you? [Yeah]

939
01:19:04,220 --> 01:19:04,380
Okay.

940
01:19:05,960 --> 01:19:09,800
But one of the things he said is one of the first ways that he got involved, he, of course,

941
01:19:10,060 --> 01:19:14,920
went to the mailing list and was reading pgsql-hackers and trying to spin up that way.

942
01:19:15,500 --> 01:19:19,920
And then as he started to participate in patch review, because he figured out this is a great

943
01:19:19,920 --> 01:19:20,920
way to learn, right?

944
01:19:21,070 --> 01:19:22,260
And he wanted to learn.

945
01:19:24,180 --> 01:19:31,800
He asked himself as he approached a patch review, can I say what Tom Lane would say before Tom

946
01:19:32,020 --> 01:19:32,660
Lane says it?

947
01:19:33,350 --> 01:19:39,180
Because Tom Lane, of course, is brilliant and a very long-term Postgres contributor and

948
01:19:40,080 --> 01:19:43,340
clearly had a lot of smart things to say, still does to this day.

949
01:19:44,100 --> 01:19:45,620
And I just thought that was funny.

950
01:19:46,120 --> 01:19:50,300
Can I say what Tom Lane is going to say before Tom says it?

951
01:19:50,760 --> 01:19:52,060
And that's how he challenged himself.

952
01:19:53,320 --> 01:19:55,720
And I do think it's interesting.

953
01:19:55,820 --> 01:20:03,620
What that reminds me of is that when artists who are learning how to paint go to art school,

954
01:20:04,160 --> 01:20:14,020
one of the things that they are asked to do is to like paint a copy of very famous paintings from

955
01:20:14,250 --> 01:20:23,320
previous generations' masters and they're asked to paint it because replicating what has

956
01:20:23,320 --> 01:20:28,900
been done before and what is considered great can be a way to learn you then have to forge your own

957
01:20:28,940 --> 01:20:35,060
path as an artist, create your own unique designs and approaches and everything. But, and the same

958
01:20:35,060 --> 01:20:40,780
is true with writers. Sometimes like you can be given an assignment, which is like, okay, take the

959
01:20:40,800 --> 01:20:45,840
first paragraph from this very, very famous book and then evolve it in a different direction, but

960
01:20:46,080 --> 01:20:50,880
start by typing that in and starting there. And I don't know. I think it's interesting

961
01:20:52,640 --> 01:20:56,120
channeling people that have come before as a way of learning.

962
01:20:58,600 --> 01:21:08,200
All right. That's a little random rabbit hole from me. I will include a link. I found that the

963
01:21:08,620 --> 01:21:14,400
tag on your blog was patch idea. So I'll include a link in the show notes. There's only four of them

964
01:21:14,620 --> 01:21:20,200
on that list right now. Maybe if you find out they're all being done or have recently been done,

965
01:21:20,400 --> 01:21:25,260
then I can exclude it. Just let me know, cool?

966
01:21:25,699 --> 01:21:30,480
[TOMAS] Yeah, I definitely will post a couple more ideas in the future.

967
01:21:30,820 --> 01:21:35,200
So I think the tag is definitely valid.

968
01:21:32,380 --> 01:21:39,520
[CLAIRE] Okay. I'll include the link then. And the other thing I wanted to chime in with for anybody

969
01:21:39,820 --> 01:21:45,100
listening who is thinking about going to PGConf.dev in Vancouver in May, one of the organizers,

970
01:21:45,500 --> 01:21:49,420
Melanie Plageman dropped into the chat during today's recording.

971
01:21:50,010 --> 01:21:54,300
And she clarified that the meet and eat dinners, which are open to anybody who's attending,

972
01:21:54,920 --> 01:21:57,600
will be on both Tuesday and Thursday this year.

973
01:21:57,930 --> 01:22:01,520
So there'll be two nights where you have an opportunity to connect.

974
01:22:01,680 --> 01:22:07,320
I'm assuming Wednesday there's an evening reception that is part of the event as well,

975
01:22:08,300 --> 01:22:10,120
because there always is on that day.

976
01:22:10,720 --> 01:22:17,660
And the other thing I'll say is I don't have any knowledge of this, but my suspicion is that next

977
01:22:17,880 --> 01:22:23,560
year, PGConf.dev won't be in Vancouver. It might be back in Montreal, which is where it was last year.

978
01:22:24,220 --> 01:22:29,600
So if you do live nearby, maybe you're on the West Coast and you're thinking, oh, I won't go this year.

979
01:22:29,760 --> 01:22:34,160
I'll wait till next year. Don't do that because it may not be on the West Coast next year.

980
01:22:35,620 --> 01:22:37,040
So, all right.

981
01:22:38,080 --> 01:22:39,160
Thank you, Tomas.

982
01:22:39,320 --> 01:22:40,140
This has been fun.

983
01:22:40,500 --> 01:22:42,160
I've really enjoyed our conversation.

984
01:22:44,500 --> 01:22:47,500
And I'm not the only one who appreciates what you do,

985
01:22:48,080 --> 01:22:52,240
both for the Postgres code and performance,

986
01:22:52,660 --> 01:22:53,900
as well as for the community.

987
01:22:55,920 --> 01:22:57,780
So thank you for coming on the show.

988
01:22:56,820 --> 01:23:00,220
[TOMAS] Thank you for inviting me and thanks for the chat.

989
01:23:00,620 --> 01:23:02,240
It was quite pleasurable, I think.

990
01:23:03,400 --> 01:23:03,620
[CLAIRE] Awesome.

991
01:23:04,380 --> 01:23:08,880
And for those of you who are listening, if you liked today's episode and you want to hear more

992
01:23:08,900 --> 01:23:14,120
of these Talking Postgres episodes, you should subscribe on Apple and Spotify and YouTube or

993
01:23:14,240 --> 01:23:19,980
wherever you get your podcasts. And please tell your friends because word of mouth is one of the

994
01:23:20,320 --> 01:23:28,480
best ways for a podcast to grow a listenership, if that's a word. You can always get to past episodes

995
01:23:28,480 --> 01:23:33,400
and get links to subscribe on the different platforms at talkingpostgres.com.

996
01:23:34,160 --> 01:23:38,180
And we include transcripts on the episode pages on talkingpostgres.com too.

997
01:23:38,760 --> 01:23:41,600
We work very hard to make sure everything is correct,

998
01:23:42,340 --> 01:23:44,580
Postgres is spelled properly, etc., etc.

999
01:23:45,240 --> 01:23:48,320
And a big thank you to everybody who joined today's live recording

1000
01:23:48,660 --> 01:23:50,680
and participated in the text chat on Discord.