1
0:0:0,06 --> 0:0:2,6
Michael: Hello and welcome to Postgres.FM, a weekly show about

2
0:0:2,6 --> 0:0:3,58
all things PostgreSQL.

3
0:0:3,58 --> 0:0:6,22
I am Michael, founder of pgMustard and I'm joined as usual by

4
0:0:6,22 --> 0:0:7,7
Nik, founder of PostgresAI.

5
0:0:7,7 --> 0:0:8,42
Hey Nik.

6
0:0:9,0199995 --> 0:0:9,74
Nikolay: Hi Michael.

7
0:0:11,04 --> 0:0:15,54
Michael: And we have not 1, but 2 heavy hitters in from the PostGIS

8
0:0:15,98 --> 0:0:16,48
world.

9
0:0:16,84 --> 0:0:19,06
Both also recognize Postgres core contributors.

10
0:0:19,3 --> 0:0:23,1
First we have Regina Obe, President of Paragon Corporation,

11
0:0:23,24 --> 0:0:26,439999
a consulting firm, member of the PostGIS core development team

12
0:0:26,439999 --> 0:0:30,06
and steering committee, and co-author of a whole host of books

13
0:0:30,06 --> 0:0:34,34
on GIS and SQL, including the book PostGIS in Action.

14
0:0:34,7 --> 0:0:35,88
Welcome, Regina.

15
0:0:36,66 --> 0:0:38,34
Regina: Thanks, thanks for having me.

16
0:0:38,8 --> 0:0:39,8
Michael: It's our pleasure.

17
0:0:39,8 --> 0:0:43,88
And also we have Paul Ramsey, who is Staff Engineer at Snowflake

18
0:0:44,1 --> 0:0:48,7
via the Crunchy Data acquisition, the co-founder of PostGIS, member

19
0:0:48,7 --> 0:0:51,38
of the core development team and the chair of its steering committee.

20
0:0:51,5 --> 0:0:53,1
A pleasure to have you too, Paul.

21
0:0:53,4 --> 0:0:54,48
Paul: Hi, Michael and Nik.

22
0:0:54,48 --> 0:0:55,46
Nice to be here.

23
0:0:55,68 --> 0:0:56,38
Michael: It's wonderful.

24
0:0:56,38 --> 0:0:56,88
Right.

25
0:0:56,92 --> 0:1:0,22
So I was hoping actually that maybe one of you could give us a

26
0:1:0,22 --> 0:1:3,18
kind of brief history of the project and maybe the other one could

27
0:1:3,18 --> 0:1:6,58
give us an update on where it's at today and what it looks like.

28
0:1:6,58 --> 0:1:9,44
Paul: Well I'll take the history side of this because yeah, because

29
0:1:9,44 --> 0:1:13,94
PostGIS comes out of my history as a geospatial consultant and

30
0:1:13,94 --> 0:1:15,52
it's a long time ago now.

31
0:1:15,58 --> 0:1:19,14
PostGIS first released 0.1 was at the end of May in 2001.

32
0:1:19,82 --> 0:1:23,04
So we're creeping up on 25 years of PostGIS.

33
0:1:23,4 --> 0:1:24,26
Pretty crazy.

34
0:1:25,24 --> 0:1:30,06
PostGIS was born because, out of a consulting company here in

35
0:1:30,06 --> 0:1:33,06
Victoria, British Columbia, and it was born because we ran out

36
0:1:33,06 --> 0:1:33,66
of work.

37
0:1:34,02 --> 0:1:36,48
The fiscal year for the government, which is our main source

38
0:1:36,48 --> 0:1:38,979996
of money, ends on the March 31st.

39
0:1:39,8 --> 0:1:42,68
And the first release of PostGIS is 2 months after that.

40
0:1:42,9 --> 0:1:46,1
We had been doing an analytical work for the government using

41
0:1:46,12 --> 0:1:50,38
standard GIS tools and a database to store interim results in

42
0:1:50,38 --> 0:1:52,06
the database we're using was Postgres.

43
0:1:52,76 --> 0:1:56,04
And having had that experience of using a database and using

44
0:1:56,04 --> 0:1:59,56
SQL, it's becoming familiar with this sort of powerful tool to

45
0:1:59,6 --> 0:2:3,02
slice through big sets of chunks of data in different ways.

46
0:2:3,26 --> 0:2:6,82
And also being spatial people looked at this database tool and

47
0:2:6,82 --> 0:2:9,24
said, there's gotta be a way, like, there's gotta be a way to

48
0:2:9,24 --> 0:2:12,44
bring together the spatial questions we're asking and the non-spatial

49
0:2:12,5 --> 0:2:15,8
questions we're asking into one place so we can ask like more complicated

50
0:2:15,8 --> 0:2:18,42
questions and have it all fulfilled inside the database.

51
0:2:19,34 --> 0:2:23,6
And there was at that 0.1 relatively
fresh international standard,

52
0:2:23,86 --> 0:2:26,88
the simple features for SQL standard
from the Open Geospatial

53
0:2:26,88 --> 0:2:27,38
Consortium.

54
0:2:28,12 --> 0:2:34,12
There was one extant proprietary
product that did that.

55
0:2:34,6 --> 0:2:37,86
Oracle 8i at that point had been
released and it had a pretty,

56
0:2:38,56 --> 0:2:41,82
80, 20 complete implementation
of Spatial SQL.

57
0:2:42,5 --> 0:2:43,66
And, and we did both.

58
0:2:43,66 --> 0:2:46,16
We both looked at Oracle Spatial
and thought, can we use this?

59
0:2:46,16 --> 0:2:50,74
And at the time, if you wanted
to use it, you had to have Oracle

60
0:2:50,74 --> 0:2:51,24
Enterprise.

61
0:2:51,96 --> 0:2:54,56
It really, the implication was
about a quarter million dollar

62
0:2:54,72 --> 0:2:57,08
licensing bill to really use it
in anger.

63
0:2:57,18 --> 0:3:0,06
And that obviously it scared us
off as a small consulting company.

64
0:3:0,06 --> 0:3:3,08
It also successfully scared off
anyone in the government that

65
0:3:3,08 --> 0:3:5,5
we like, we may all get some of
the government who wants to do

66
0:3:5,5 --> 0:3:8,1
this stuff and we'll piggyback
on their licensing and learn it

67
0:3:8,1 --> 0:3:9,44
and do an implementation for them.

68
0:3:9,44 --> 0:3:11,26
And we'll be like spatial database
experts.

69
0:3:11,4 --> 0:3:13,48
Now it's too expensive for anyone
in the government.

70
0:3:13,5 --> 0:3:16,44
But we had this specification And
we had this experience with

71
0:3:16,44 --> 0:3:22,72
Postgres and Postgres itself was
built for type extension.

72
0:3:22,72 --> 0:3:27,74
So we also had the really like
clear on-ramp to add a new type,

73
0:3:27,88 --> 0:3:30,72
to bind it to access methods, to
add functions that worked on

74
0:3:30,72 --> 0:3:34,7
this type in a way that really
didn't exist in what was at the

75
0:3:34,7 --> 0:3:38,66
time the sort of premier open source
database MySQL.

76
0:3:39,52 --> 0:3:44,44
It took quite a few years beyond
2001 before MySQL added Spatial.

77
0:3:44,44 --> 0:3:46,52
And when they did, We went and
looked at their implementation,

78
0:3:46,56 --> 0:3:48,08
looked at the patch, and it was
big.

79
0:3:48,08 --> 0:3:50,9
It was a big patch that crossed
the whole code base.

80
0:3:50,9 --> 0:3:53,56
It really took someone who understood
MySQL, soup to nuts to

81
0:3:53,56 --> 0:3:54,06
do it.

82
0:3:54,06 --> 0:3:55,84
They had to do the index implementation,
like everything.

83
0:3:55,84 --> 0:3:57,88
They had to do a hell of a lot
of work to do it.

84
0:3:57,88 --> 0:4:2,54
Whereas we got this nice, tight
little code base sitting as an

85
0:4:2,54 --> 0:4:4,98
extension completely divorced from
the core code.

86
0:4:5,14 --> 0:4:7,7
The core code provided access method
prebuilt.

87
0:4:8,2 --> 0:4:9,4
We were good to go.

88
0:4:9,72 --> 0:4:14,74
So it was doable to have a working
extension that we could demonstrate

89
0:4:14,76 --> 0:4:17,7
to ourselves and to customers within
2 months and then release.

90
0:4:17,96 --> 0:4:20,86
So yeah, we were clever people,
but Postgres is a hell of a clever

91
0:4:20,86 --> 0:4:23,56
product and the PostGIS wouldn't
exist if Postgres hadn't been

92
0:4:23,56 --> 0:4:24,94
so clever to start with.

93
0:4:25,08 --> 0:4:27,18
Nikolay: And it was just right
at that time.

94
0:4:27,18 --> 0:4:29,24
Paul: At that point it was just,
it still is just actually the

95
0:4:29,24 --> 0:4:31,02
access method we use for spatial
indexing.

96
0:4:31,08 --> 0:4:31,28
Yeah.

97
0:4:31,28 --> 0:4:33,8
Nikolay: I'm smiling the whole
time you're talking because it

98
0:4:33,8 --> 0:4:38,28
connects 20 years ago, 2005, 2006,
2007.

99
0:4:38,5 --> 0:4:43,12
I remember first of all, this is
when I left Oracle 8i.

100
0:4:43,7 --> 0:4:44,2
Exactly.

101
0:4:44,34 --> 0:4:47,12
Like last time I touched Oracle
was exactly 8i.

102
0:4:47,46 --> 0:4:50,74
And then when I was building my
second startup social network,

103
0:4:50,74 --> 0:4:56,82
we put a lot of people and objects
like restaurants and so on,

104
0:4:56,82 --> 0:4:59,34
on map 2007, it was too early.

105
0:5:0,04 --> 0:5:5,1
And I remember I hired 2 guys,
Oleg Bartunov and Teodor Sigaev,

106
0:5:5,86 --> 0:5:6,88
to help us.

107
0:5:8,2 --> 0:5:14,94
It was so bad because it was amazing
to have GiST and reading

108
0:5:15,04 --> 0:5:19,12
all the articles and basics and
like 7 functions, abstraction,

109
0:5:19,2 --> 0:5:19,9
all great.

110
0:5:19,94 --> 0:5:24,24
But when we went to practical questions
and we had 10 million

111
0:5:24,24 --> 0:5:27,7
people to be placed on the map
and a lot of objects and then

112
0:5:27,7 --> 0:5:35,14
you try to solve simple problem,
show me nearest 10 objects on

113
0:5:35,14 --> 0:5:37,12
the map which have good reviews.

114
0:5:37,78 --> 0:5:42,26
And it's terrible because there
was no support for order by and

115
0:5:42,26 --> 0:5:42,76
KNN.

116
0:5:43,08 --> 0:5:44,18
I didn't know KNN.

117
0:5:45,06 --> 0:5:48,84
And since we moved so fast, I eventually
decided to pivot my

118
0:5:48,84 --> 0:5:51,46
startup away from maps and so on.

119
0:5:51,46 --> 0:5:56,72
But Oleg and Teodor, a couple
of years later, brought KNN to

120
0:5:56,72 --> 0:5:57,22
GiST.

121
0:5:57,72 --> 0:5:58,7
It was 9.1.

122
0:5:59,68 --> 0:6:3,34
And then Alexander Korotkov, also
Nikita Glukhov helped to brought

123
0:6:3,34 --> 0:6:4,9
the same like to SP-GiST.

124
0:6:5,28 --> 0:6:7,1
And this connects us a lot.

125
0:6:7,28 --> 0:6:11,02
But I just, I showed them this
problem and that's it for me.

126
0:6:11,28 --> 0:6:12,18
It was great.

127
0:6:12,36 --> 0:6:16,1
Paul: Yeah, back in the early days,
starting off incubating an

128
0:6:16,1 --> 0:6:18,64
open source project, within a couple
of years, we had quite a

129
0:6:18,64 --> 0:6:20,86
few users and some institutional
users.

130
0:6:21,6 --> 0:6:27,74
And at the time the GiST infrastructure
was, did not include

131
0:6:27,9 --> 0:6:28,4
recovery.

132
0:6:29,06 --> 0:6:30,98
It wasn't a completely ACID implementation.

133
0:6:31,46 --> 0:6:36,5
And, at that point I had like a
stable of people who had businesses

134
0:6:36,88 --> 0:6:41,68
depending on the PostGIS, and
I put out a call for development

135
0:6:41,68 --> 0:6:45,98
money and got scraped together
from all the different companies

136
0:6:46,0 --> 0:6:51,14
who were using or depended on Postgres
about 10 or $15,000 and

137
0:6:51,14 --> 0:6:56,14
then ship that money to Oleg and
Teodor who did the work necessary

138
0:6:56,14 --> 0:6:57,22
to make GiST recoverable.

139
0:6:57,84 --> 0:7:1,0
And it was a weird thing to sell
to the community.

140
0:7:1,0 --> 0:7:2,62
It's we're going to get this work
done.

141
0:7:2,88 --> 0:7:6,76
You're using it now as a result
of the release cycle of Postgres

142
0:7:6,76 --> 0:7:8,48
and time takes to do the work and
all that stuff.

143
0:7:8,48 --> 0:7:11,82
You'll be able to use it and show
it to your clients in 12 to

144
0:7:11,82 --> 0:7:12,68
18 months.

145
0:7:13,04 --> 0:7:15,82
But you're going to give me the
money right up front.

146
0:7:16,08 --> 0:7:18,78
So it was interesting introduction,
both to the, to the Postgres

147
0:7:19,02 --> 0:7:23,84
development cycle, and also to
like trying to drum up crowdsourced

148
0:7:24,18 --> 0:7:26,46
funds for open source development.

149
0:7:26,46 --> 0:7:29,64
And it really taught me that crowdsourcing
funds is really hard

150
0:7:30,04 --> 0:7:32,64
because it's people are willing
to pay to get the bug, which

151
0:7:32,64 --> 0:7:36,24
is bothering them right now, fixed
right away.

152
0:7:36,82 --> 0:7:39,06
Much more difficult to get them
to pay for something, which will

153
0:7:39,06 --> 0:7:42,0
maybe make them a little bit happier
in the far future.

154
0:7:42,28 --> 0:7:44,68
It's a harder, it's a harder sale.

157
0:7:44,68 --> 0:7:45,78
Nikolay: Especially for startups.

158
0:7:45,9 --> 0:7:47,86
So they move too fast to

159
0:7:47,86 --> 0:7:48,6
Paul: understand that

160
0:7:48,6 --> 0:7:51,04
Nikolay: the thing will come only
in a couple of years.

161
0:7:51,38 --> 0:7:53,38
Regina: Yeah, they might not be
around that long.

162
0:7:54,92 --> 0:7:59,74
Nikolay: But I'm glad K&N was brought
into Postgres and PostGIS

163
0:7:59,88 --> 0:8:0,56
as well.

164
0:8:0,72 --> 0:8:1,6
Yeah, great.

165
0:8:1,6 --> 0:8:1,84
Mason.

166
0:8:1,84 --> 0:8:4,64
Michael: Is that a good excuse
for us to get Regina to give us

167
0:8:4,64 --> 0:8:8,14
a little bit of like a catching
up from, it's been 25 years,

168
0:8:8,14 --> 0:8:8,56
right?

169
0:8:8,56 --> 0:8:11,26
You've done a lot of work since
those early days.

170
0:8:11,58 --> 0:8:12,34
What's changed?

171
0:8:12,34 --> 0:8:15,58
What does it look like now and
how is that different to the first

172
0:8:15,58 --> 0:8:16,08
versions?

173
0:8:18,64 --> 0:8:23,68
Regina: I mean, we definitely have
a bigger audience and we now

174
0:8:23,68 --> 0:8:27,44
have people seem to be running
things in the cloud more than

175
0:8:27,44 --> 0:8:30,64
they're running things on premises
or maybe not.

176
0:8:31,02 --> 0:8:36,1
I think it's yeah slowly getting
to the point where most people

177
0:8:36,1 --> 0:8:41,42
we see are running PostGIS in
Google Cloud or Amazon Cloud

178
0:8:41,92 --> 0:8:43,36
or Microsoft Azure.

179
0:8:45,06 --> 0:8:50,2
I think that's the biggest thing
that's changed since we started.

180
0:8:51,0 --> 0:8:54,24
Michael: I was thinking as well,
it looked like it started as

181
0:8:54,24 --> 0:8:57,62
a single extension and it's now
kind of a collection, it's like

182
0:8:57,62 --> 0:9:1,96
a family of extensions that serve
quite a few more use cases

183
0:9:2,02 --> 0:9:2,52
and lots

184
0:9:2,52 --> 0:9:3,2
Regina: of functions.

185
0:9:4,12 --> 0:9:7,44
So when we started off, we just
had the vector support and then

186
0:9:7,44 --> 0:9:9,06
we added raster support.

187
0:9:9,84 --> 0:9:12,98
And originally we thought, Oh,
we'll just throw raster in the

188
0:9:12,98 --> 0:9:13,98
PostGIS extension.

189
0:9:14,14 --> 0:9:18,58
And then it got very fat and people
started complaining.

190
0:9:18,58 --> 0:9:21,36
Why do I have to carry these thousands
of functions that I don't

191
0:9:21,36 --> 0:9:22,58
even use raster?

192
0:9:23,8 --> 0:9:27,88
So then we broke it out into 2
extensions, PostGIS and PostGIS

193
0:9:27,94 --> 0:9:28,44
raster.

194
0:9:29,16 --> 0:9:35,04
And we also have PostGIS topology
And then PostGIS spun off other

195
0:9:35,04 --> 0:9:35,54
extensions.

196
0:9:35,86 --> 0:9:42,24
So you have the H3 extension, which
is now it's, we brought it

197
0:9:42,4 --> 0:9:46,12
into our organization, but it's
still a separate extension.

198
0:9:46,12 --> 0:9:49,44
It's not even part of the PostGIS
extension because we forked

199
0:9:49,44 --> 0:9:51,86
it from the original owner into
our system.

200
0:9:51,86 --> 0:9:53,4
So that was very recent.

201
0:9:53,96 --> 0:9:57,54
And then there's another extension called MobilityDB, which is

202
0:9:57,54 --> 0:10:0,44
a separate extension, separate group of people, but they build

203
0:10:0,44 --> 0:10:1,54
on top of PostGIS.

204
0:10:2,26 --> 0:10:5,94
And they deal with things like, you know, vehicle movement, the

205
0:10:5,94 --> 0:10:10,22
speed at which things move and all that and flying birds, trains.

206
0:10:11,64 --> 0:10:13,14
I think those are the key ones.

207
0:10:13,14 --> 0:10:13,64
Yeah.

208
0:10:13,9 --> 0:10:16,56
And then there's also a pgRouting, which I'm a member of too,

209
0:10:16,56 --> 0:10:20,14
which deals with network routing, like driving, biking, walking,

210
0:10:21,04 --> 0:10:22,58
getting directions on that.

211
0:10:23,08 --> 0:10:25,82
So yes, it's spun up a lot of other extensions.

212
0:10:26,48 --> 0:10:26,98
Paul: Yeah.

213
0:10:27,26 --> 0:10:28,76
So that's the technical side.

214
0:10:28,86 --> 0:10:30,8
Deployment is a big deal for sure.

215
0:10:30,98 --> 0:10:33,78
When we started, it was, you want to use PostGIS?

216
0:10:33,78 --> 0:10:35,04
Here's the source code.

217
0:10:35,08 --> 0:10:39,1
I'm tired in the contrib directory of an existing Postgres installation.

218
0:10:39,16 --> 0:10:42,18
And then you build and then you install and now you have PostGIS.

219
0:10:43,26 --> 0:10:46,08
And around the time Regina started joining us, she helped with

220
0:10:46,08 --> 0:10:46,72
Windows builds.

221
0:10:46,72 --> 0:10:49,2
So Windows people at least could have a pre-build binary.

222
0:10:49,2 --> 0:10:50,88
Regina: And I still do Windows builds.

223
0:10:51,04 --> 0:10:53,94
Paul: At some point, PDDG included us in their builds, and that

224
0:10:53,94 --> 0:10:54,92
was a huge change.

225
0:10:54,92 --> 0:10:56,84
So now it meant that you didn't have to build it yourself.

226
0:10:56,84 --> 0:10:59,54
You could just find the right packages and install those.

227
0:11:0,04 --> 0:11:2,92
That opened up a whole pile of new users who previously wouldn't

228
0:11:2,92 --> 0:11:3,84
have tried it.

229
0:11:3,84 --> 0:11:6,9
But yeah, as Regina said, the real tipping point lately has been

230
0:11:7,28 --> 0:11:9,76
being part of the default installs on all the clouds.

231
0:11:9,76 --> 0:11:12,5
So there's no longer a software install step from the point of

232
0:11:12,5 --> 0:11:13,2
view of the end user.

233
0:11:13,2 --> 0:11:16,04
It's just type create extension PostGIS.

234
0:11:16,74 --> 0:11:22,58
And that ubiquity really goes beyond the ubiquity for users.

235
0:11:22,8 --> 0:11:25,28
I feel like we've achieved a certain level of ubiquity in the

236
0:11:25,28 --> 0:11:25,76
industry.

237
0:11:25,76 --> 0:11:29,76
Like, first of all, like just geospatial as database functionality

238
0:11:29,86 --> 0:11:30,94
is table stakes now.

239
0:11:30,94 --> 0:11:35,5
It's pretty rare to find a SQL implemented database that doesn't

240
0:11:35,5 --> 0:11:36,24
have Geospatial.

241
0:11:36,94 --> 0:11:39,82
Those who don't are all tacking it on at a great rate.

242
0:11:40,08 --> 0:11:42,18
Regina: Ours is still the most popular, I think.

243
0:11:42,18 --> 0:11:46,68
I think it's surprising that Postgres, even though it started

244
0:11:46,68 --> 0:11:52,1
out as a meager database, PostGIS itself is probably more popular

245
0:11:52,16 --> 0:11:56,18
than Oracle Spatial and SQL Server Spatial and MySQL.

246
0:11:56,18 --> 0:12:0,04
I don't even hear other people doing serious work on those.

247
0:12:0,04 --> 0:12:2,04
Paul: Yeah, And that's probably not fair because I bet you there's

248
0:12:2,04 --> 0:12:5,38
a whole bunch of institutional users of those databases who they

249
0:12:5,38 --> 0:12:6,4
never leave the ranch.

250
0:12:6,4 --> 0:12:10,2
So that's all I understand is Oracle spatial or SQL server spatial

251
0:12:10,2 --> 0:12:12,54
because that's just the way their
institutions are shaped.

252
0:12:12,72 --> 0:12:15,72
But there's no doubt that in the
wider world, PostGIS is the

253
0:12:15,72 --> 0:12:16,0
thing.

254
0:12:16,0 --> 0:12:19,54
The thing that made me really understand
that PostGIS was the

255
0:12:19,54 --> 0:12:24,16
lingua franca was seeing the
Google BigQuery announcement

256
0:12:24,32 --> 0:12:29,12
blog post where they said one of
the features of Google BigQuery

257
0:12:29,12 --> 0:12:30,64
was that it was PostGIS compatible.

258
0:12:31,74 --> 0:12:32,9
It was like, Oh, wow.

259
0:12:32,9 --> 0:12:33,4
Okay.

260
0:12:33,9 --> 0:12:35,7
So that's nice to see.

261
0:12:35,96 --> 0:12:37,96
Michael: That's them saying you're
the standard, right?

262
0:12:37,96 --> 0:12:39,18
That's pretty cool.

263
0:12:39,22 --> 0:12:39,48
Paul: Yeah.

264
0:12:39,48 --> 0:12:42,18
And it's a standard that I could
have said, we support the simple

265
0:12:42,18 --> 0:12:45,24
features for SQL standard, but
they decided that PostGIS compatible

266
0:12:45,24 --> 0:12:47,58
was more understandable to the
larger marketplace.

267
0:12:47,64 --> 0:12:49,34
It's like, okay, so we have the
mindshare.

268
0:12:49,86 --> 0:12:50,74
Nikolay: Yeah, that's great.

269
0:12:50,74 --> 0:12:54,36
Actually, my, my daughter, she
is a fourth year bachelor degree

270
0:12:54,36 --> 0:12:59,26
student at UCSD and they have projects
related to wildfires in

271
0:12:59,26 --> 0:13:1,1
California and GIS as well.

272
0:13:1,1 --> 0:13:5,74
And, but I was super annoyed to
hear it's not PostGIS, but it's

273
0:13:5,74 --> 0:13:8,3
actually Google BigQuery, so that's
good to hear.

274
0:13:8,32 --> 0:13:8,82
Yeah.

275
0:13:9,14 --> 0:13:10,22
It's at least compatible.

276
0:13:10,44 --> 0:13:10,94
Great.

277
0:13:11,14 --> 0:13:11,96
Paul: Oh, reusable.

278
0:13:12,16 --> 0:13:13,06
Reusable knowledge.

279
0:13:13,62 --> 0:13:13,98
Nikolay: Cool.

280
0:13:13,98 --> 0:13:16,74
What's the latest things you can
share?

281
0:13:16,74 --> 0:13:18,1
What's the latest development?

282
0:13:18,42 --> 0:13:20,82
This is one thing I'm also curious
about.

283
0:13:20,82 --> 0:13:25,22
I know, for example, PostGIS works
quite well with

284
0:13:25,24 --> 0:13:25,58
TimescaleDB.

285
0:13:25,58 --> 0:13:27,28
It should work with pgvector.

286
0:13:27,28 --> 0:13:27,88
I never tried.

287
0:13:27,88 --> 0:13:28,64
I'm curious.

288
0:13:28,66 --> 0:13:30,64
And with sharding systems as well.

289
0:13:31,12 --> 0:13:32,8
There are super extensions.

290
0:13:33,42 --> 0:13:36,6
In the case of Postgres, it's not
super extension, it's a super

291
0:13:36,78 --> 0:13:38,44
set of extensions.

292
0:13:38,5 --> 0:13:40,46
Not super set, a collection of
extensions.

293
0:13:40,76 --> 0:13:45,52
But sometimes it's hard to make
it work together, these big building

294
0:13:45,52 --> 0:13:48,64
blocks what's there especially
PostgreSQL vector because it's also

295
0:13:48,64 --> 0:13:51,04
important building block is there's
right.

296
0:13:51,76 --> 0:13:53,26
What's the recent development.

297
0:13:54,32 --> 0:13:56,84
Regina: No if there's any recent
development I think they just

298
0:13:56,84 --> 0:14:0,4
work together and it was because
of the way Postgres is set up

299
0:14:0,4 --> 0:14:0,9
right.

300
0:14:1,22 --> 0:14:5,08
You can all You can have 3 or 4
different extensions in the same

301
0:14:5,08 --> 0:14:10,2
query, and it's just because they
standardize on all data sets,

302
0:14:10,2 --> 0:14:13,44
get the same feature set, get the
same core feature set.

303
0:14:13,44 --> 0:14:18,06
And so you don't have that annoying
problem that MySQL has with

304
0:14:18,06 --> 0:14:21,24
their different storage engines
and their different query syntax

305
0:14:21,42 --> 0:14:22,42
for things.

306
0:14:22,42 --> 0:14:24,78
So things just work seamlessly
well.

307
0:14:25,24 --> 0:14:28,6
So I don't think we did anything
special to try to make ourselves

308
0:14:28,7 --> 0:14:29,78
work with pgvector.

309
0:14:31,56 --> 0:14:35,2
Paul: Yeah, pgvector is more similar
to us than, say, one of, like

310
0:14:35,2 --> 0:14:38,94
you said, the super extensions
like Timescale or Citus, like

311
0:14:38,94 --> 0:14:42,54
these sort of overarching big swings
of the architecture of the

312
0:14:42,54 --> 0:14:43,04
system.

313
0:14:43,44 --> 0:14:46,36
pgvector, like, they provide a
type, they provide access methods,

314
0:14:46,38 --> 0:14:48,34
they provide some functions that
make sense for that type.

315
0:14:48,34 --> 0:14:50,52
They're very PostGIS-esque in that
respect.

316
0:14:50,98 --> 0:14:53,3
But from the point of view of what
they're for, they're kind

317
0:14:53,3 --> 0:14:53,86
of orthogonal.

318
0:14:53,86 --> 0:14:58,7
I've only seen one example recently
of PostGIS and pgvector being

319
0:14:58,7 --> 0:14:59,34
used together.

320
0:14:59,34 --> 0:15:1,2
And they're not being used together
because they're getting any

321
0:15:1,2 --> 0:15:3,24
great leverage out of PostGIS.

322
0:15:3,24 --> 0:15:5,7
And they're getting leverage out
of the fact that PostGIS itself

323
0:15:5,74 --> 0:15:8,3
has a huge amount of third-party
interconnectivity.

324
0:15:9,4 --> 0:15:12,28
So you use something like really
high dimensional search and

325
0:15:12,28 --> 0:15:14,02
pgvector to get a bunch of results.

326
0:15:14,04 --> 0:15:16,92
And if those results happen to
include spatial information, then

327
0:15:16,92 --> 0:15:20,34
you throw that spatial information
on a map because everything

328
0:15:20,34 --> 0:15:23,1
connects to PostGIS, all the mapping
software connects to PostGIS,

329
0:15:23,32 --> 0:15:24,11
and that's how it works.

330
0:15:24,11 --> 0:15:27,26
We really saw a cool example of
this in our last PostGIS Day.

331
0:15:27,26 --> 0:15:31,16
We have a PostGIS Day mid-November
every year, an online conference,

332
0:15:31,16 --> 0:15:35,34
and Shawn Burke from GeoBase
showed us geospatial AI using

333
0:15:35,34 --> 0:15:40,12
pgvector as the, as a search engine,
but then getting the results

334
0:15:40,12 --> 0:15:41,94
back through PostGIS to his mapping
tools.

335
0:15:41,94 --> 0:15:43,3
So it could show us the results.

336
0:15:43,44 --> 0:15:46,86
Neat example of what the geospatial
are calling people are calling

337
0:15:46,86 --> 0:15:50,14
foundational models, not large
language models, but a completely

338
0:15:50,14 --> 0:15:55,08
different look at how to use the
AI kind of technology for doing

339
0:15:55,08 --> 0:15:56,26
geospatial work.

340
0:15:57,44 --> 0:16:0,7
Nikolay: And if we talk back to
TimescaleDB, I saw great examples

341
0:16:0,7 --> 0:16:4,04
of Postgres and TimescaleDB working
together, because if you

342
0:16:4,04 --> 0:16:8,56
collect a lot of geographic signals
from drivers or from anything

343
0:16:8,56 --> 0:16:11,28
like on the map, you want to partition
all this, right?

344
0:16:11,28 --> 0:16:13,06
Because it's a lot of data, right?

345
0:16:13,58 --> 0:16:16,64
One way or another partitioning should
work, because it's also

346
0:16:16,64 --> 0:16:18,74
time-series, not only geographical
data, right?

347
0:16:18,74 --> 0:16:23,3
So this is natural pair of extensions
working together.

348
0:16:23,3 --> 0:16:23,76
Okay.

349
0:16:23,76 --> 0:16:27,56
What about lakes and since like
your company got acquired last

350
0:16:27,56 --> 0:16:28,06
year?

351
0:16:28,14 --> 0:16:28,64
Yeah.

352
0:16:28,7 --> 0:16:29,18
Yeah.

353
0:16:29,18 --> 0:16:32,0
What, what, recent development
there?

354
0:16:32,46 --> 0:16:35,68
Paul: And this ties actually back
to your other question, which

355
0:16:35,68 --> 0:16:38,04
is at the core is an interoperability
question.

356
0:16:38,84 --> 0:16:43,66
And the nature of the type model
in Postgres is such that when

357
0:16:43,66 --> 0:16:46,56
you add a new type using the Postgres
type model, a lot of the

358
0:16:46,56 --> 0:16:52,08
time things just work because the
overarching machinery of a

359
0:16:52,08 --> 0:16:55,52
big, a big piece of machinery,
say Citus doesn't think necessarily

360
0:16:55,52 --> 0:16:56,5
about particular types.

361
0:16:56,5 --> 0:16:59,92
It thinks just about type mappings
and the type mappings are

362
0:16:59,92 --> 0:17:3,5
entered in when you add the, add
the new type to the, like the

363
0:17:3,5 --> 0:17:6,68
pg_type table and, and your
pg_attributes just refer back

364
0:17:6,68 --> 0:17:7,36
to the type table.

365
0:17:7,36 --> 0:17:10,12
So everything is runtime configured
and that's the underlying

366
0:17:10,32 --> 0:17:12,0
expectation of these extensions.

367
0:17:12,56 --> 0:17:13,82
So stuff just works.

368
0:17:14,06 --> 0:17:16,72
It's only when you have to push
out to some external format that

369
0:17:16,72 --> 0:17:18,12
has its own understanding of the
world.

370
0:17:18,12 --> 0:17:20,4
You have to start thinking about
these mappings again.

371
0:17:20,86 --> 0:17:25,56
pg_lake, which was open sourced
a few months ago, is a set of

372
0:17:25,56 --> 0:17:30,04
tooling to allow you to push data
from Postgres database out

373
0:17:30,04 --> 0:17:35,1
to parquet or iceberg data on object
stores.

374
0:17:35,5 --> 0:17:37,1
Michael: I had a question around
that.

375
0:17:37,2 --> 0:17:41,84
Only in researching for this, I
came across GeoParquet for the

376
0:17:41,84 --> 0:17:42,44
first time.

377
0:17:42,44 --> 0:17:44,02
I didn't know that was a thing.

378
0:17:44,1 --> 0:17:44,6
Paul: Yeah.

379
0:17:45,04 --> 0:17:48,94
There's a long history of the geospatial
community taking extant

380
0:17:48,94 --> 0:17:52,24
open source standards, as they
say, which are like good and clearly

381
0:17:52,24 --> 0:17:53,2
going somewhere insane.

382
0:17:53,2 --> 0:17:54,8
Regina: GeoJSON is another 1.

383
0:17:54,8 --> 0:17:55,3
Paul: Exactly.

384
0:17:55,52 --> 0:17:59,32
Back when JSON was turning into
the lingua franca wire transfer

385
0:17:59,44 --> 0:18:1,92
protocol, The geospatial community
said, but we need a way to

386
0:18:1,92 --> 0:18:3,08
encode geometry too.

387
0:18:3,08 --> 0:18:7,5
So they invented one and eventually
got it ISO, not ISO, IETF standardized.

388
0:18:8,16 --> 0:18:9,18
Same thing with Parquet.

389
0:18:9,18 --> 0:18:13,0
So you have Parquet, which in its
initial versionings understands

390
0:18:13,08 --> 0:18:17,52
like the basic types of relational
databases and doesn't understand

391
0:18:17,54 --> 0:18:18,48
anything about geo.

392
0:18:18,48 --> 0:18:22,26
So like people, geo people want
to push data through this, through

393
0:18:22,26 --> 0:18:25,8
this format out to their own clients
and say, we've got this

394
0:18:25,8 --> 0:18:28,48
byte type and we've got this metadata
slot.

395
0:18:28,86 --> 0:18:33,18
So we'll slot some information
about columns that are holding

396
0:18:33,18 --> 0:18:36,04
geometry and we'll stick the raw
geometry in there into the byte

397
0:18:36,04 --> 0:18:36,54
type.

398
0:18:36,54 --> 0:18:40,64
And that will be just enough affordance
for geo-aware tools just

399
0:18:40,64 --> 0:18:43,08
to look in the metadata and say,
Oh wait, there's geo here.

400
0:18:43,08 --> 0:18:44,28
And we can pull it out.

401
0:18:44,54 --> 0:18:48,12
Fortunately that for the Parquet
and Iceberg formats has been

402
0:18:48,12 --> 0:18:49,66
a very short interregnum.

403
0:18:50,24 --> 0:18:52,44
Sometimes these things last for
like years, 5 years.

404
0:18:52,44 --> 0:18:53,74
And we're still playing with these
hacks.

405
0:18:53,74 --> 0:18:56,38
I think GeoParquet and GeoIceberg
lasted for about a year and

406
0:18:56,38 --> 0:18:59,1
a half before the mainline projects
came along and said, you

407
0:18:59,1 --> 0:19:1,76
know what we need, we need a geometry
type and a geography type.

408
0:19:1,76 --> 0:19:3,42
We should just formalize that.

409
0:19:3,44 --> 0:19:6,4
So yeah, so that was formalized,
I think 9 to 12 months ago,

410
0:19:6,4 --> 0:19:9,62
both Parquet and Iceberg got around
to saying, yeah, we have

411
0:19:9,62 --> 0:19:10,32
an actual type.

412
0:19:10,32 --> 0:19:13,52
And I meant that the metadata could
like live in proper slots.

413
0:19:13,52 --> 0:19:17,22
So where you'd expect it to be,
which in turn made engine developers

414
0:19:17,28 --> 0:19:20,6
be able to do things like much
faster scans and pruning of these

415
0:19:20,6 --> 0:19:20,98
files.

416
0:19:20,98 --> 0:19:24,28
So it's been good all around and
you can forget about GeoParquet

417
0:19:24,28 --> 0:19:27,08
and forget about GeoIceberg because
those there's a pieces of

418
0:19:27,08 --> 0:19:30,26
history now there, the geo stuff
is now actually embedded in

419
0:19:30,26 --> 0:19:31,3
the standard proper.

420
0:19:32,62 --> 0:19:32,9
That's

421
0:19:32,9 --> 0:19:33,66
Michael: great news.

422
0:19:33,9 --> 0:19:34,4
Paul: Yeah.

423
0:19:36,66 --> 0:19:37,16
Yes.

424
0:19:37,54 --> 0:19:38,0
So, yeah.

425
0:19:38,0 --> 0:19:42,52
So pg_lake is a way of pushing
data out of Postgres into the

426
0:19:42,52 --> 0:19:46,52
public internet in places where
these Parquet and Iceberg engines

427
0:19:46,52 --> 0:19:47,54
can scan them.

428
0:19:47,86 --> 0:19:53,1
And the promise of that is that
if you pump your data out in

429
0:19:53,1 --> 0:19:57,18
hopefully an ordering, which is
good for the bulk queries you

430
0:19:57,18 --> 0:20:2,3
expect to apply to the data, you
can then hit those object store

431
0:20:2,86 --> 0:20:6,42
data lakes, if you will, God help
us, with an engine which understands

432
0:20:6,42 --> 0:20:9,98
those formats and can slam through
them in massive parallel.

433
0:20:10,52 --> 0:20:14,18
And, you know, one of those engines
is DuckDB, a single node engine,

434
0:20:14,18 --> 0:20:15,4
which is very good at that.

435
0:20:15,4 --> 0:20:18,84
1 of those engines is Snowflake,
a multi-node, highly parallel

436
0:20:18,84 --> 0:20:20,58
engine, which is also really good
at that.

437
0:20:20,58 --> 0:20:21,88
Another one is Databricks.

438
0:20:22,54 --> 0:20:23,48
All the same gig.

439
0:20:23,48 --> 0:20:26,38
It's like this separation of compute
and storage.

440
0:20:26,94 --> 0:20:30,3
The compute comes to the object
store, rips this file off.

441
0:20:30,3 --> 0:20:31,86
It is ridiculously inefficient.

442
0:20:32,26 --> 0:20:34,76
If you're a Postgres developer,
you think, this is really inefficient,

443
0:20:34,76 --> 0:20:36,76
you should have an index, so you
can find exactly the thing you're

444
0:20:36,76 --> 0:20:37,44
looking for.

445
0:20:37,44 --> 0:20:39,72
And these compute engines are,
no, I will look at every

446
0:20:39,72 --> 0:20:40,22
record.

447
0:20:40,24 --> 0:20:42,98
I will lightly prune the file as
I zip through it.

448
0:20:42,98 --> 0:20:46,08
But they're willing to scan a terabyte
of data to get a few hundred

449
0:20:46,08 --> 0:20:49,24
thousand rows of results, which
is, which is crazy, unless you

450
0:20:49,24 --> 0:20:52,44
happen to have the compute and
storage and cloud infrastructure

451
0:20:52,5 --> 0:20:54,98
to afford to do that, which these
days we do.

452
0:20:55,12 --> 0:20:56,46
It's one of the problems of being
old.

453
0:20:56,46 --> 0:21:0,6
You, your understanding of what
is efficient and correct is receding

454
0:21:0,6 --> 0:21:1,36
in the distance.

455
0:21:1,56 --> 0:21:4,84
Nikolay: But I still have some
doubts and questions how stable

456
0:21:5,02 --> 0:21:7,62
performance for such queries is.

457
0:21:7,64 --> 0:21:10,68
Like it's maybe not fully resolved
problem.

458
0:21:10,68 --> 0:21:11,26
Paul: I agree.

459
0:21:11,26 --> 0:21:14,76
I find it really weird that you
have to reform or resort your

460
0:21:14,76 --> 0:21:16,8
data to match your use case.

461
0:21:17,02 --> 0:21:17,78
And it's nice.

462
0:21:17,78 --> 0:21:19,44
It's great when things work out
great.

463
0:21:19,44 --> 0:21:23,34
So like you're sloughing your data
off the, off the transactional

464
0:21:23,54 --> 0:21:27,34
store, off to the object store
in, usually it's like time series

465
0:21:27,34 --> 0:21:28,38
order, right?

466
0:21:28,58 --> 0:21:30,04
Time of most recent change.

467
0:21:30,3 --> 0:21:33,98
And it so happens that your queries
are either grouping on time

468
0:21:34,0 --> 0:21:37,74
or filtering on time, and then
the underlying format metadata

469
0:21:37,74 --> 0:21:40,76
allows the engines to really throw
away 90% of the data before

470
0:21:40,76 --> 0:21:41,36
it looks at it.

471
0:21:41,36 --> 0:21:42,1
That's great.

472
0:21:42,4 --> 0:21:47,12
I'm used to the power of SQL inside
a real OLTP engine where you

473
0:21:47,12 --> 0:21:49,82
can address any combination of
tables and joins, and you're not

474
0:21:49,82 --> 0:21:53,0
going to be hit too hard with efficiency
as long as your indexes

475
0:21:53,0 --> 0:21:53,8
are in place.

476
0:21:53,92 --> 0:21:57,32
Nikolay: And it sounds reasonable
for columnstore and for purely

477
0:21:57,32 --> 0:22:0,4
analytical queries, but when you,
for example, go back to this

478
0:22:0,4 --> 0:22:4,6
KNN, K-Nearest Neighbors problem,
we just need like 10

479
0:22:4,6 --> 0:22:6,56
Paul: rows, but we

480
0:22:6,56 --> 0:22:8,86
Nikolay: have millions, right,
or billions.

481
0:22:9,32 --> 0:22:9,56
Regina: Yeah.

482
0:22:9,56 --> 0:22:10,74
How does that work?

483
0:22:11,06 --> 0:22:11,88
Does it even work?

484
0:22:11,88 --> 0:22:14,36
Can you even do a KNN in those
systems?

485
0:22:14,96 --> 0:22:21,26
Paul: Ah, you end up making some
kind of compromise with reality.

486
0:22:21,9 --> 0:22:26,68
If you sort your data, like on
a key, like a Hilbert code or

487
0:22:26,68 --> 0:22:31,26
something that has like spatial
co-location to it, then you can

488
0:22:31,26 --> 0:22:34,74
do that kind of scan a little better
because you know where to

489
0:22:34,74 --> 0:22:36,36
hop in the Hilbert code.

490
0:22:36,76 --> 0:22:39,44
Duck for its internal implementation,
I think it's just sucked

491
0:22:39,44 --> 0:22:42,48
it up and put a lightweight index
next to the table.

492
0:22:42,52 --> 0:22:45,8
So they're not actually doing the
full column store scan for

493
0:22:45,8 --> 0:22:47,66
those particular kinds of queries.

494
0:22:48,08 --> 0:22:51,42
Spatial is as database people who
have put a geospatial type

495
0:22:51,42 --> 0:22:54,22
into a standard relational database,
we like to say spatial isn't

496
0:22:54,22 --> 0:22:57,74
special because you just type SQL
and you get your answer back,

497
0:22:57,9 --> 0:23:0,52
but at an implementation level,
there's no getting around it.

498
0:23:0,52 --> 0:23:3,48
Spatial is special in a whole bunch
of ways, because just being

499
0:23:3,48 --> 0:23:5,38
higher dimensional, it's the same
thing people run into with

500
0:23:5,38 --> 0:23:5,86
pgvector.

501
0:23:5,86 --> 0:23:9,44
Once you get beyond that one dimensional,
1 dimension of sort order,

502
0:23:9,64 --> 0:23:12,82
a lot of your preconceptions about
the most efficient way to

503
0:23:12,82 --> 0:23:15,64
deal with the data fall away and
you have to rethink problems.

504
0:23:17,5 --> 0:23:20,5
Michael: one difference seems to
be people aren't as forgiving

505
0:23:20,68 --> 0:23:24,14
of accuracy issues when it comes
to spatial data.

506
0:23:24,14 --> 0:23:27,28
You know, you don't have approximate
nearest neighbor.

507
0:23:27,34 --> 0:23:28,92
Paul: Neil That is 100% true.

508
0:23:28,94 --> 0:23:29,18
Yeah.

509
0:23:29,18 --> 0:23:32,24
Things on maps, People are very
unforgiving with maps.

510
0:23:32,24 --> 0:23:34,92
It's like one mistake, you're
wrong.

511
0:23:34,92 --> 0:23:38,22
Whereas you roll up a BI dashboard
and give some people a pie

512
0:23:38,22 --> 0:23:39,1
chart, whatever.

513
0:23:39,28 --> 0:23:42,92
It's actually one of my great shaking
fist at cloud things about

514
0:23:42,92 --> 0:23:46,3
the big column stores is that the
people insist on summing up

515
0:23:46,3 --> 0:23:49,6
every record of 5 billion records
and make a pie chart.

516
0:23:49,6 --> 0:23:53,18
You could have sampled and got
the exact same pie chart.

517
0:23:53,52 --> 0:23:56,2
Michael: I feel like we've dived
into the most complex type of

518
0:23:56,2 --> 0:23:59,54
performance issues, but I think
a lot of people, like I've come

519
0:23:59,54 --> 0:24:2,12
across some performance issues
when it comes to PostGIS work,

520
0:24:2,12 --> 0:24:4,8
a few customers that have the odd
specific case.

521
0:24:4,9 --> 0:24:8,04
And I think even it's like the
basics that they have questions

522
0:24:8,04 --> 0:24:12,32
around like geometry versus geography
or GiST versus SP-GiST.

523
0:24:12,36 --> 0:24:16,12
And any advice either of you have
for kind of people that are

524
0:24:16,12 --> 0:24:20,14
new to this stuff on the basics
of on the simpler queries, how

525
0:24:20,14 --> 0:24:21,76
should they be thinking about these
things?

526
0:24:21,76 --> 0:24:25,28
Nikolay: IMB The absolute basic
is how R-tree is implemented and

527
0:24:25,28 --> 0:24:27,6
what R-tree is compared to B-tree.

528
0:24:27,8 --> 0:24:30,7
This is like the gem of Postgres
actually, right?

529
0:24:32,26 --> 0:24:35,82
Maybe let's talk a little bit about
this, like some basics in

530
0:24:35,82 --> 0:24:36,96
the middle of it.

531
0:24:37,42 --> 0:24:41,04
Regina: I think that bores most
users, the discussion.

532
0:24:41,04 --> 0:24:44,18
They just want to say that I can
write this and I get this.

533
0:24:45,06 --> 0:24:47,42
And they don't care about the underlying
implementation.

534
0:24:47,56 --> 0:24:48,0
Paul: You think

535
0:24:48,0 --> 0:24:48,5
Nikolay: so?

536
0:24:48,94 --> 0:24:50,96
There are different users, I don't
know.

537
0:24:51,82 --> 0:24:52,06
Maybe.

538
0:24:52,06 --> 0:24:56,9
But okay, let's mention that Postgres
has this GiST, a generalized

539
0:24:57,32 --> 0:25:2,64
search tree, and R-tree is implemented
using it, not separately.

540
0:25:2,8 --> 0:25:3,3
Correct.

541
0:25:3,8 --> 0:25:7,78
And yeah, there is a great work
by Hellerstein from Berkeley

542
0:25:7,8 --> 0:25:9,4
and from the late 90s.

543
0:25:10,08 --> 0:25:14,68
And it's like basically generalization
of B-tree idea to multidimensional

544
0:25:15,02 --> 0:25:17,52
space, two-dimensional, three-dimensional,
n-dimensional.

545
0:25:17,7 --> 0:25:20,62
Paul: Well, n-dimensional is defined
more or less by the API,

546
0:25:20,66 --> 0:25:20,92
right?

547
0:25:20,92 --> 0:25:23,86
Not so much by the space and whatever
space you're in, as long

548
0:25:23,86 --> 0:25:27,02
as you can meet the conditions
of the API.

549
0:25:27,66 --> 0:25:28,78
Nikolay: 7 functions, right?

550
0:25:28,78 --> 0:25:29,06
Paul: Yeah.

551
0:25:29,06 --> 0:25:33,04
Given a key, can you say whether
things are consistent or not

552
0:25:33,04 --> 0:25:35,54
with that key, that's one piece of
the API.

553
0:25:35,54 --> 0:25:39,64
Another is given a collection of
keys, can you split that collection?

554
0:25:40,84 --> 0:25:43,54
And that depending on your dimensional
space, those splits will

555
0:25:43,54 --> 0:25:46,12
be calculated differently, but
it's just like, given a set, can

556
0:25:46,12 --> 0:25:47,36
you do a split for me?

557
0:25:47,36 --> 0:25:47,72
Yes.

558
0:25:47,72 --> 0:25:48,04
Okay.

559
0:25:48,04 --> 0:25:49,26
Can you do key consistency?

560
0:25:49,28 --> 0:25:51,72
I'm trying to think what the other,
what the other API pieces,

561
0:25:51,72 --> 0:25:55,76
I feel like there's a third, but
regardless you can, with the

562
0:25:55,76 --> 0:26:0,26
rules of the GiST API, you can
implement an R-tree, like The

563
0:26:0,26 --> 0:26:4,4
kinds of questions that an R-tree
answers, the kinds of guarantees

564
0:26:4,4 --> 0:26:7,8
an R-tree gives are ones that fit
inside the generalized API

565
0:26:7,8 --> 0:26:8,4
of GiST.

566
0:26:8,4 --> 0:26:11,26
So it means that you don't need
a separate access method for

567
0:26:11,26 --> 0:26:11,76
R-tree.

568
0:26:12,04 --> 0:26:14,76
You don't actually need a separate
access method for B-tree either.

569
0:26:14,76 --> 0:26:17,98
In fact, there's a B-tree GiST
implementations in contrib that

570
0:26:17,98 --> 0:26:19,82
you can look at.

571
0:26:19,98 --> 0:26:24,3
Because it's more generic, There's
less opportunities for performance

572
0:26:24,4 --> 0:26:27,88
optimizations, but the beauty of
being able to, for almost any

573
0:26:27,88 --> 0:26:30,32
type, cause the just conditions
are really, really loose and

574
0:26:30,32 --> 0:26:33,18
generic, almost any type you can
get an access method without

575
0:26:33,18 --> 0:26:37,24
having to think about all the problems
of pages and consistency

576
0:26:37,24 --> 0:26:37,72
and so on.

577
0:26:37,72 --> 0:26:40,82
That's one of the things that has
made, say, a tool like pgvector

578
0:26:41,12 --> 0:26:44,44
a complex thing to approach and
that some of the only someone

579
0:26:44,44 --> 0:26:47,62
with like serious computer science
chops would want to attack

580
0:26:47,8 --> 0:26:50,86
is that they had to write their
own access method right down

581
0:26:50,86 --> 0:26:51,38
to disk.

582
0:26:51,38 --> 0:26:54,38
They did not get to just layer
their access method on top of

583
0:26:54,38 --> 0:26:56,76
an existing implementation like
GiST.

584
0:26:56,76 --> 0:26:59,06
They had to write it all the way
down to all the way down to

585
0:26:59,06 --> 0:27:1,92
the disk and page level, which
is a lot harder to do if you're

586
0:27:1,92 --> 0:27:3,84
going to retain guarantees of consistency.

587
0:27:4,82 --> 0:27:5,14
Nikolay: Right.

588
0:27:5,14 --> 0:27:7,94
So you mentioned, I guess you mentioned
B-tree GiST, it's called,

589
0:27:7,94 --> 0:27:9,02
or GiST B-tree, I always

590
0:27:9,02 --> 0:27:9,64
Paul: confuse it.

591
0:27:9,64 --> 0:27:10,6
Yeah, B2 GiST.

592
0:27:10,6 --> 0:27:13,74
Nikolay: Yeah, and it lost comparison
to native implementation

593
0:27:13,82 --> 0:27:18,02
of B-tree, while R-tree didn't lose, like
it won.

594
0:27:18,04 --> 0:27:19,02
And just...

595
0:27:19,2 --> 0:27:19,54
Paul: Yeah.

596
0:27:19,54 --> 0:27:22,5
And I'm not sure if that's just,
probably that's more a matter

597
0:27:22,5 --> 0:27:25,82
of people not wanting to spend
the time to work on the R-tree implementation.

598
0:27:25,92 --> 0:27:29,72
Like it was considered, it was
considered not a good implementation

599
0:27:30,14 --> 0:27:31,4
at the time that we started.

600
0:27:31,4 --> 0:27:34,96
It still existed when we started
with PostGIS, but it was more

601
0:27:34,96 --> 0:27:38,72
tightly bound to the spatial types,
the native spatial types

602
0:27:38,72 --> 0:27:39,44
in Postgres.

603
0:27:40,92 --> 0:27:42,54
And so we didn't use it.

604
0:27:42,54 --> 0:27:43,48
No one else used it.

605
0:27:43,48 --> 0:27:45,04
Eventually it was ripped out.

606
0:27:45,16 --> 0:27:49,9
It had, I think all the way up
to the time it got taken out,

607
0:27:50,82 --> 0:27:52,16
an object size limitation.

608
0:27:53,48 --> 0:27:56,54
So it couldn't handle objects larger
than a page size.

609
0:27:57,9 --> 0:28:0,78
That was, I think, the death knell,
really.

610
0:28:0,92 --> 0:28:4,74
Because practical work with spatial
data inevitably involves

611
0:28:4,74 --> 0:28:7,28
dealing with objects that exceed
the page size.

612
0:28:7,72 --> 0:28:11,0
Nikolay: So I want, what I wanted
to just like some counter argument

613
0:28:11,0 --> 0:28:14,12
why it matters to understand things
like slightly more than just

614
0:28:14,12 --> 0:28:14,7
let's use it.

615
0:28:14,7 --> 0:28:15,24
And that's it.

616
0:28:15,24 --> 0:28:17,94
Because for example, in this particular
case, B-tree is like

617
0:28:17,94 --> 0:28:21,04
less, less than greater than equals.

618
0:28:21,54 --> 0:28:24,84
R-tree is like is contained, contains,
right?

619
0:28:24,84 --> 0:28:25,34
Overlaps.

620
0:28:26,28 --> 0:28:30,76
You can, if you use this Btree
GiST extension, you can combine

621
0:28:30,8 --> 0:28:34,78
and have multi-column index of
very different data types, right?

622
0:28:34,96 --> 0:28:35,46
And

623
0:28:36,04 --> 0:28:37,58
Regina: I use that a lot too.

624
0:28:37,64 --> 0:28:37,84
Nikolay: Yeah.

625
0:28:37,84 --> 0:28:40,22
And this is great performance trick.

626
0:28:40,24 --> 0:28:44,76
And if you know like why it's useful,
it can save you because

627
0:28:44,76 --> 0:28:46,12
it's just a single index scan.

628
0:28:46,12 --> 0:28:46,56
That's it.

629
0:28:46,56 --> 0:28:47,98
So you use it a lot.

630
0:28:48,08 --> 0:28:51,6
Regina: Well, if you have things
where there's a lot of Attribute

631
0:28:51,82 --> 0:28:54,34
things that you have to filter
against, but you also need to

632
0:28:54,34 --> 0:28:59,04
do a spatial at the same time,
especially with the way the sensitivity

633
0:29:0,24 --> 0:29:3,9
that Postgres has, it sometimes
picks the wrong plan, whereas

634
0:29:4,3 --> 0:29:7,54
when you have it together, it can't
screw up.

635
0:29:8,1 --> 0:29:12,28
So there's some cases where I've
had where if I had just a GiST

636
0:29:12,28 --> 0:29:16,84
index and a B-tree separate, the
planner would do something really

637
0:29:16,84 --> 0:29:22,7
stupid and try to use the spatial
and not use it the B-tree

638
0:29:22,72 --> 0:29:27,54
at all and so the performance you'd
go from minutes to milliseconds

639
0:29:28,02 --> 0:29:29,04
in those cases.

640
0:29:30,16 --> 0:29:32,88
Nikolay: Yeah yeah so I had some
experience not once yeah.

641
0:29:33,92 --> 0:29:37,68
Michael: Same And I also thought
like Postgres would be able

642
0:29:37,68 --> 0:29:39,16
to do a bitmap scan.

643
0:29:39,16 --> 0:29:42,9
I thought it would be able to use
the GiST index and the B-tree

644
0:29:42,9 --> 0:29:43,4
separately.

645
0:29:43,44 --> 0:29:46,46
I know it wouldn't be as efficient
still, but it seemed to be

646
0:29:46,46 --> 0:29:47,72
refusing to do so for me.

647
0:29:47,72 --> 0:29:50,1
So I also wonder if there's something
missing on the Postgres

648
0:29:50,22 --> 0:29:54,22
side to consider them for bitmap
scans as well, multiple indexes

649
0:29:54,28 --> 0:29:55,38
with an and condition.

650
0:29:56,04 --> 0:29:58,44
I thought it would consider it
but I really struggled to get

651
0:29:58,44 --> 0:29:59,12
it to.

652
0:29:59,18 --> 0:30:2,72
Nikolay: I also can complain You take pgvector and you cannot

653
0:30:2,72 --> 0:30:6,42
combine it with GiST basically.

654
0:30:6,82 --> 0:30:8,28
Paul: Yeah, that's super hard.

655
0:30:8,52 --> 0:30:11,64
Given just the use case for pgvector, like the find me the nearest

656
0:30:11,64 --> 0:30:15,06
thing is the core use case, and that is like really does not

657
0:30:15,06 --> 0:30:16,86
fit nicely into the executor.

658
0:30:17,78 --> 0:30:20,04
Nikolay: one case works really fast and the other doesn't.

659
0:30:21,22 --> 0:30:22,4
Michael: What about for beginners?

660
0:30:22,48 --> 0:30:24,48
Regina, you've written some books on this.

661
0:30:24,48 --> 0:30:26,84
How do you describe performance stuff for beginners, like in

662
0:30:26,84 --> 0:30:30,04
terms of practical tips, or what do you tend to see mistakes

663
0:30:30,08 --> 0:30:30,88
people make?

664
0:30:31,68 --> 0:30:35,74
Regina: I think the classic one is people try to do distance by

665
0:30:35,74 --> 0:30:38,36
doing using ST_Distance.

666
0:30:38,62 --> 0:30:45,04
So they say ST_Distance this less than 50 instead of using ST_DWithin.

667
0:30:45,64 --> 0:30:48,06
So like ST_Distance can't use an index.

668
0:30:48,84 --> 0:30:53,3
So you know their queries might take minutes and then when they

669
0:30:53,3 --> 0:30:58,04
switch it to ST_DWithin and finishes in milliseconds, that kind

670
0:30:58,04 --> 0:31:0,7
of thing because it would have to scan the whole all the records

671
0:31:0,7 --> 0:31:2,2
if you do ST_Distance.

672
0:31:2,54 --> 0:31:4,98
That's a classical mistake that people make.

673
0:31:5,16 --> 0:31:8,1
Nikolay: I'm curious, does AI make this mistake or not?

674
0:31:9,4 --> 0:31:12,44
Regina: And what's annoying is like in SQL Server, it's different.

675
0:31:12,44 --> 0:31:16,9
You can actually use ST_Distance and it uses an index.

676
0:31:17,18 --> 0:31:18,76
Paul: So what are they doing in SQL Server?

677
0:31:18,76 --> 0:31:20,34
They must be rewriting the query.

678
0:31:21,14 --> 0:31:22,9
Regina: Yeah, they must be rewriting the query.

679
0:31:22,9 --> 0:31:27,72
So I was surprised that, oh yeah, you can do distance less than

680
0:31:27,72 --> 0:31:29,34
50 and it will do the right thing.

681
0:31:29,34 --> 0:31:32,46
It will do it pretty much what RS_ST_DWithin does.

682
0:31:32,7 --> 0:31:33,54
Paul: That's interesting.

683
0:31:34,02 --> 0:31:36,5
That, yeah, well query rewriting is kind of deep.

684
0:31:36,5 --> 0:31:40,6
But the fact that they do that is interesting just in and of

685
0:31:40,6 --> 0:31:41,1
itself.

686
0:31:41,32 --> 0:31:46,78
It speaks to them, well either like presupposing a problem their

687
0:31:46,78 --> 0:31:50,08
users are going to have or something their users really did have

688
0:31:50,08 --> 0:31:52,84
constantly so it's worth it for them to go to the extra work

689
0:31:52,84 --> 0:31:54,98
of doing a query rewrite for that case.

690
0:31:55,76 --> 0:31:59,48
Regina: I noticed that Postgres does do query rewrites too, right?

691
0:31:59,48 --> 0:32:0,04
Paul: It does.

692
0:32:0,04 --> 0:32:1,16
There's an affordance there.

693
0:32:1,16 --> 0:32:3,04
We could muck with the plan if we wanted to.

694
0:32:3,04 --> 0:32:6,18
Regina: Yeah, if you do in, they change it to any, right?

695
0:32:6,18 --> 0:32:6,6
Paul: Yeah.

696
0:32:6,6 --> 0:32:7,1
Yeah.

697
0:32:7,2 --> 0:32:10,02
Regina: So in theory, we could do a rewrite somewhere.

698
0:32:10,02 --> 0:32:11,66
It's just, where would that go?

699
0:32:11,68 --> 0:32:11,98
Paul: Yeah.

700
0:32:11,98 --> 0:32:14,8
I mean, that's like adding a certain amount of brittleness to

701
0:32:14,8 --> 0:32:16,8
the system where you don't know
that the users actually need

702
0:32:16,8 --> 0:32:16,96
it.

703
0:32:16,96 --> 0:32:19,54
Anyways, your example presupposes
that they have a spatial index

704
0:32:19,54 --> 0:32:22,26
in the first place, which is what
I would have raised as like

705
0:32:22,26 --> 0:32:25,74
the number 1, the number one new
user mistake.

706
0:32:25,76 --> 0:32:26,46
Regina: Oh, that's true.

707
0:32:26,46 --> 0:32:26,66
Yeah.

708
0:32:26,66 --> 0:32:27,72
That's the other thing.

709
0:32:27,72 --> 0:32:31,72
Paul: It's quite an easy mistake
to make because, by and large,

710
0:32:32,02 --> 0:32:35,92
yeah, by and large, people don't
think about index as indexes

711
0:32:35,92 --> 0:32:39,72
as having a type, like if they
come from a SQL server world or

712
0:32:39,72 --> 0:32:42,6
whatever, like the idea that this,
they need this extra keyword,

713
0:32:42,6 --> 0:32:46,64
the using just keyword to get a
spatial index, not going to be

714
0:32:46,64 --> 0:32:48,98
super obvious as they're just coming
from some other database.

715
0:32:49,06 --> 0:32:54,14
And if you type create index blah
on geometry column, it'll do

716
0:32:54,14 --> 0:32:57,1
something like it won't error out
because we've had to do lightweight

717
0:32:57,1 --> 0:33:1,18
B-tree bindings in order to get
aspects of sortability from our

718
0:33:1,18 --> 0:33:1,68
type.

719
0:33:1,78 --> 0:33:4,34
So it's not like it'll just say,
oh, I don't have a B-2B binding.

720
0:33:4,34 --> 0:33:7,72
It'll go and build a really terrible
index that's no good for

721
0:33:7,72 --> 0:33:9,26
you and quietly return.

722
0:33:9,84 --> 0:33:12,1
So I think that's one of the biggest
gotchas.

723
0:33:12,1 --> 0:33:14,36
It's just like having a proper
spatial index.

724
0:33:15,18 --> 0:33:18,12
Regina: Yes, knowing that you need
an index and knowing that

725
0:33:18,12 --> 0:33:19,98
you need it to be a GiST index.

726
0:33:20,32 --> 0:33:20,82
Paul: Yeah.

727
0:33:21,6 --> 0:33:25,12
Regina: As far as between GiST
and SP-GiST, I'd still tend to

728
0:33:25,12 --> 0:33:28,1
go toward GiST because I know we
put in more effort in GiST than

729
0:33:28,1 --> 0:33:29,18
we did in SP-GiST.

730
0:33:29,54 --> 0:33:30,04
Same.

731
0:33:30,04 --> 0:33:33,02
I think our GiST support is more
robust than SP-GiST.

732
0:33:33,36 --> 0:33:33,68
Paul: Yeah.

733
0:33:33,68 --> 0:33:37,12
I have tried different benchmarks
and come up with different

734
0:33:37,12 --> 0:33:39,08
answers from different data, different
queries.

735
0:33:39,08 --> 0:33:42,16
There's not, I've not been able
to like fully characterize the

736
0:33:42,16 --> 0:33:44,44
kinds of places where SP-GiST outperforms.

737
0:33:44,49 --> 0:33:47,12
I don't know, Nik, you, you seemed
like really excited about

738
0:33:47,12 --> 0:33:50,34
SP-GiST or like there are places
you've found where like SP-GiST

739
0:33:50,34 --> 0:33:51,36
is the clear winner.

740
0:33:52,54 --> 0:33:56,74
Nikolay: No, somehow I just recently
don't see this kind of workloads

741
0:33:56,76 --> 0:33:59,7
at all, like maybe a couple of
years, so I don't have fresh data

742
0:33:59,7 --> 0:34:0,32
at all.

743
0:34:0,32 --> 0:34:3,34
I don't know like why, obviously,
like it's just a matter of

744
0:34:3,34 --> 0:34:3,84
luck.

745
0:34:5,02 --> 0:34:8,14
Michael: I came across one recently
where they were using a GiST

746
0:34:8,14 --> 0:34:11,38
index and still struggling with
performance and I suggested trying

747
0:34:11,38 --> 0:34:15,24
SP-GiST because I thought it might
make sense and it didn't help.

748
0:34:15,24 --> 0:34:17,88
So it wasn't much worse, it was
very similar.

749
0:34:18,06 --> 0:34:21,06
But the one thing that they changed
that did help a lot was they

750
0:34:21,06 --> 0:34:25,04
were using geography before, and
it was only to calculate quite

751
0:34:25,04 --> 0:34:29,18
small distances, like within 25 meters type thing and switching

752
0:34:29,18 --> 0:34:31,16
to geometry made a big difference.

753
0:34:32,68 --> 0:34:32,86
Yeah.

754
0:34:32,86 --> 0:34:35,74
That was surprised by, Yeah, you 2 are not surprised.

755
0:34:35,74 --> 0:34:36,82
You 2 are nodding along.

756
0:34:36,82 --> 0:34:37,6602
Paul: I'm not surprised at all.

757
0:34:37,6602 --> 0:34:40,14
Regina: Because geography is a more complex, it's more, it's

758
0:34:40,14 --> 0:34:44,48
three-dimensional, whereas the geometry is two-dimensional, really.

759
0:34:45,24 --> 0:34:48,16
Paul: I have a slide in my sort of standard Postgres talk where

760
0:34:48,16 --> 0:34:50,46
I say, you know, geometry or geography, what should I do?

761
0:34:50,46 --> 0:34:53,8
And the thing that I use to demonstrate why geometry is going

762
0:34:53,8 --> 0:34:57,02
to be better for performance purposes is just the distance calculation.

763
0:34:57,88 --> 0:34:59,36
Everyone has learned Pythagoras.

764
0:34:59,38 --> 0:35:1,56
So I know what the distance calculations looks like there, right?

765
0:35:1,56 --> 0:35:3,06
2 squares and a square root.

766
0:35:3,44 --> 0:35:6,36
Any distance calculation in geography space is going to have

767
0:35:6,36 --> 0:35:9,4
to do a bunch of Haversine calculations, which is the distance

768
0:35:9,4 --> 0:35:10,18
on a sphere.

769
0:35:10,32 --> 0:35:14,28
And Haversine has 5 transcendentals in it, plus the square root.

770
0:35:14,28 --> 0:35:17,7
So it's just way more, like orders of magnitude more computationally

771
0:35:17,8 --> 0:35:18,3
intensive.

772
0:35:18,5 --> 0:35:21,26
So if you're doing a bunch of those calculations, you're just

773
0:35:21,26 --> 0:35:23,2
going to pay a big price for sure.

774
0:35:23,48 --> 0:35:24,9
Nikolay: The earth is flat.

775
0:35:25,26 --> 0:35:26,2
The earth is flat.

776
0:35:26,2 --> 0:35:27,08
I knew it.

777
0:35:27,7 --> 0:35:30,06
Paul: Peter Ideal, in an ideal world, the office is flat.

778
0:35:30,06 --> 0:35:30,56
Yes.

779
0:35:30,92 --> 0:35:35,0
So that means you need to commit to a planar understanding of

780
0:35:35,0 --> 0:35:36,58
your work area if you can.

781
0:35:36,74 --> 0:35:39,84
So if you are doing an app which works in London and say, you

782
0:35:39,84 --> 0:35:42,52
know, there's a planar projection which is good for London.

783
0:35:42,52 --> 0:35:44,64
If you're doing something in Idaho, there's a planar projection

784
0:35:44,64 --> 0:35:45,3599
which is good for Idaho.

785
0:35:45,3599 --> 0:35:47,52
If You're doing something which is like the continent of the

786
0:35:47,52 --> 0:35:47,72
U.S.

787
0:35:47,72 --> 0:35:50,22
There is a planar projection, which is good for most people's

788
0:35:50,22 --> 0:35:51,6
purposes in the continent of the U.S.

789
0:35:51,6 --> 0:35:54,76
It's very rare that they have something which requires latitude

790
0:35:54,76 --> 0:35:55,26
and longitude.

791
0:35:55,26 --> 0:35:58,34
You only get the breakdowns when you're starting to work with

792
0:35:58,34 --> 0:35:59,74
truly global data.

793
0:36:0,06 --> 0:36:3,22
Places where you go over the poles, places where you have data

794
0:36:3,22 --> 0:36:6,76
in the Northwest territories and also in South America, where

795
0:36:7,08 --> 0:36:10,42
the scale distortions of something like Mercator cannot be ignored,

796
0:36:10,8 --> 0:36:13,64
that's when you say, okay, fine, I'll use geography.

797
0:36:13,66 --> 0:36:16,0
And then you'll get some nice advantages out of geography.

798
0:36:16,0 --> 0:36:18,82
If you have truly global data, the fact that the shortest path

799
0:36:18,82 --> 0:36:21,66
might go over the North Pole, that's a useful thing to have.

800
0:36:21,66 --> 0:36:22,96
Or it would go across the dateline.

801
0:36:22,96 --> 0:36:26,26
That's a useful thing for the system to just transparently handle,

802
0:36:26,26 --> 0:36:29,06
which it does for both the calculations of things like distance

803
0:36:29,06 --> 0:36:32,06
and area, And for the performance-based stuff like indexing,

804
0:36:32,08 --> 0:36:33,0
like it does not care.

805
0:36:33,0 --> 0:36:33,84
It goes over the poles.

806
0:36:33,84 --> 0:36:34,92
It goes around the dateline.

807
0:36:34,92 --> 0:36:37,8
It doesn't notice because the model just, it takes away those

808
0:36:37,8 --> 0:36:39,9
singularity points when you're working on a sphere.

809
0:36:39,96 --> 0:36:40,32
Michael: Nice.

810
0:36:40,32 --> 0:36:43,26
The, I think the reason I was surprised was more because I'm

811
0:36:43,26 --> 0:36:48,68
so used to queries being IO bound and it's, it's quite fast to

812
0:36:48,68 --> 0:36:50,28
get one that's actually CPU.

813
0:36:50,28 --> 0:36:53,42
You know, actually the CPU do matter here.

814
0:36:53,42 --> 0:36:53,64
Yeah.

815
0:36:53,64 --> 0:36:57,14
Paul: That is one of the places where spatial is special and people

816
0:36:57,16 --> 0:36:57,98
don't notice it.

817
0:36:57,98 --> 0:36:58,98
And it's a weird 1.

818
0:36:58,98 --> 0:37:2,86
Like we for, trying to integrate with a Postgres planner, we've

819
0:37:2,86 --> 0:37:8,08
been trying to help the planner get smarter by costing our functions

820
0:37:8,08 --> 0:37:8,58
appropriately.

821
0:37:9,24 --> 0:37:13,54
But the planner only looks at function costs in filters and joins.

822
0:37:14,34 --> 0:37:16,72
A lot of the times People are doing the spatial calculations

823
0:37:17,62 --> 0:37:20,18
in the SELECT line, right?

824
0:37:20,58 --> 0:37:21,64
The return values.

825
0:37:21,66 --> 0:37:25,08
So if you have a complex function in your result set, Postgres

826
0:37:25,08 --> 0:37:28,26
is not going to add that in and say, oh, I should do a different

827
0:37:28,26 --> 0:37:30,36
thing, or that there's a lot of cost here.

828
0:37:30,48 --> 0:37:31,56
So that's an issue.

829
0:37:31,56 --> 0:37:34,28
And then actually it's a real problem on IO as well.

830
0:37:34,28 --> 0:37:38,1
I actually did a blog post for Snowflake about 2 weeks ago that

831
0:37:38,1 --> 0:37:41,74
went public, talking about that with respect to JSONB, but I

832
0:37:41,74 --> 0:37:44,68
learned about it in the geography, geometry world, which is,

833
0:37:44,68 --> 0:37:49,65
you know, you get a big object and Canada, Canada has like

834
0:37:49,65 --> 0:37:53,68
10,000 vertices, like as well, as well above the, the maximum object

835
0:37:53,68 --> 0:37:56,1
size that you, that you'll find in a, in a page.

836
0:37:56,12 --> 0:37:57,38
So it's going to be TOASTed.

837
0:37:57,44 --> 0:38:0,1
And that means any retrieval of that object is going to be a

838
0:38:0,1 --> 0:38:3,52
2 stepper where it has to go and find the TOAST pointer and then

839
0:38:3,52 --> 0:38:5,92
go into the TOAST tables and gather all the parts and put them

840
0:38:5,92 --> 0:38:8,86
all together in order and then decompress all that, like just

841
0:38:8,86 --> 0:38:10,9
to get it out, just in the IO.

842
0:38:11,0 --> 0:38:14,06
And that's like a combination of the IO, pull that across, but

843
0:38:14,06 --> 0:38:17,48
also the computational overhead of just reassembling the thing,

844
0:38:17,52 --> 0:38:19,12
it's about 10 times, 10X.

845
0:38:19,62 --> 0:38:22,48
I found in joins, You can get things to go 10X faster when I

846
0:38:22,48 --> 0:38:24,26
don't have that TOAST overhead.

847
0:38:24,64 --> 0:38:27,52
If I pre-cut everything into shapes which are smaller than the

848
0:38:27,52 --> 0:38:28,68
object to the max.

849
0:38:29,76 --> 0:38:34,7
Regina: Yeah, Speaking of TOAST, there's also the one gigabyte.

850
0:38:34,76 --> 0:38:36,3
Is that the limit on TOAST?

851
0:38:36,4 --> 0:38:40,08
If people have already started
complaining about that, about

852
0:38:40,08 --> 0:38:40,58
that.

853
0:38:41,72 --> 0:38:42,34
Paul: It's or

854
0:38:42,34 --> 0:38:44,24
Nikolay: what are the types of
Arduino types?

855
0:38:44,24 --> 0:38:45,78
So it's one gigabyte, I think.

856
0:38:45,78 --> 0:38:46,28
Yeah.

857
0:38:46,36 --> 0:38:46,78
Yeah.

858
0:38:46,78 --> 0:38:50,42
But it's better not to go beyond
300 megabytes.

859
0:38:50,54 --> 0:38:53,94
I think there are benchmarks showing
that for JSON is terrible.

860
0:38:54,52 --> 0:38:55,7
Paul: Oh, I can't imagine.

861
0:38:56,06 --> 0:38:56,56
Nikolay: Yeah.

862
0:38:57,44 --> 0:39:1,32
There is a drop in performance
after 300 megabytes or so.

863
0:39:2,42 --> 0:39:4,34
Michael: That's a lot of JSON though.

864
0:39:5,14 --> 0:39:7,3
Regina: Yeah, I think I forget
what the reason was.

865
0:39:7,3 --> 0:39:9,66
It had more to do with outputting
stuff.

866
0:39:9,84 --> 0:39:10,58
Nikolay: I also forgot.

867
0:39:10,58 --> 0:39:14,94
Regina: Like if you're trying to
output a big NVT file.

868
0:39:15,74 --> 0:39:17,06
Paul: The whole world in one file?

869
0:39:17,06 --> 0:39:18,3
The level 0 file?

870
0:39:18,38 --> 0:39:19,74
Michael: Does it compress well?

871
0:39:19,74 --> 0:39:23,3
I was just thinking, as JSON generally,
there's a lot of repetition,

872
0:39:23,42 --> 0:39:27,4
a lot of like 300 megabytes post-compression
JSON file would

873
0:39:27,4 --> 0:39:28,12
be like

874
0:39:28,78 --> 0:39:29,56
Paul: pretty massive.

875
0:39:29,59 --> 0:39:34,1
This is one of the places where there
are potential implementation

876
0:39:34,24 --> 0:39:37,12
paths which would have given us
a much better I.O.

877
0:39:37,12 --> 0:39:41,82
Profile, but at the cost of something
which the users considered

878
0:39:41,82 --> 0:39:46,18
pretty inviolate, and that's perfect
fidelity of input and output.

879
0:39:47,04 --> 0:39:52,42
So users are feeding the system
with geometries which are described

880
0:39:52,42 --> 0:39:55,3
in terms of coordinates which are
built on double precision.

881
0:39:56,18 --> 0:39:58,94
And if you want to return what
they got, you have to store the

882
0:39:58,94 --> 0:39:59,68
double precision.

883
0:40:0,18 --> 0:40:2,92
And double precision does not compress
so great.

884
0:40:3,3 --> 0:40:6,18
So in answer to your compression
question, yeah, not so great.

885
0:40:6,68 --> 0:40:11,68
If you just take the smallest swing
at saying, this stuff is

886
0:40:11,68 --> 0:40:17,22
quantized or like, this has an
effective minimum precision, like

887
0:40:17,22 --> 0:40:20,18
your road segment is not more precise
than a centimeter.

888
0:40:20,38 --> 0:40:23,36
And if I take advantage of that
fact, if I know that there is

889
0:40:23,36 --> 0:40:27,18
a quantization available, I can
squash stuff down into a delta

890
0:40:27,18 --> 0:40:30,2
encoding, which is incredibly cheap
for spatial data, because

891
0:40:30,2 --> 0:40:33,46
spatial data tends to be highly
auto-correlated in space.

892
0:40:34,0 --> 0:40:37,0
So you can store the differences
between each coordinate with

893
0:40:37,0 --> 0:40:39,34
a very small number, which is great.

894
0:40:39,72 --> 0:40:42,42
It's a really good compression
and it's a very fast decompression.

895
0:40:43,02 --> 0:40:47,44
But you have to give up precision
to take advantage of it.

896
0:40:47,62 --> 0:40:51,58
Nikolay: Or if it's a lot of moving
objects on the map, but this

897
0:40:51,58 --> 0:40:54,9
is should be column compression,
like TimescaleDB does.

898
0:40:54,9 --> 0:40:57,74
So we just change a little bit
position.

899
0:40:57,74 --> 0:40:59,92
We cannot jump to different continents
immediately.

900
0:40:59,92 --> 0:41:0,78
Paul: Yeah, but

901
0:41:0,78 --> 0:41:2,68
there will have to be precision
associated with that.

902
0:41:2,68 --> 0:41:6,86
Like they will be losing a little
bit of precision when they

903
0:41:7,3 --> 0:41:7,8
compress.

904
0:41:7,8 --> 0:41:12,04
Like you can't just slam floats
or slam doubles together cleanly

905
0:41:12,04 --> 0:41:15,42
all the time and get a nice clean
delta encoding.

906
0:41:15,48 --> 0:41:17,22
Sometimes you can, but not always.

907
0:41:17,86 --> 0:41:19,74
Nikolay: Similar with timestamps,
actually.

908
0:41:19,74 --> 0:41:25,28
If you want to give up milliseconds,
you can have fewer bytes,

909
0:41:25,28 --> 0:41:25,78
definitely.

910
0:41:26,14 --> 0:41:30,74
Speaking of numbers, I'm curious
what are the biggest PostGIS

911
0:41:31,64 --> 0:41:33,04
setups you saw?

912
0:41:34,36 --> 0:41:39,02
Both single node or maybe sharded
systems like with Citus or

913
0:41:39,02 --> 0:41:39,52
something.

914
0:41:39,72 --> 0:41:41,34
Paul: Regina, you're
the implementer.

915
0:41:41,76 --> 0:41:42,68
What have you seen?

916
0:41:43,26 --> 0:41:44,4
Regina: I have no idea.

917
0:41:45,44 --> 0:41:47,98
I don't work with that big of data
sets.

918
0:41:48,58 --> 0:41:49,08
Paul: Yeah.

919
0:41:49,44 --> 0:41:54,02
So Postgres is, or PostGIS is weird,
a weird number of the Postgres

920
0:41:54,02 --> 0:41:58,82
community because we have, I think
almost certainly the largest

921
0:41:58,82 --> 0:42:0,52
number of deployed clusters.

922
0:42:1,56 --> 0:42:3,56
Like the number of PostGIS users
is huge.

923
0:42:3,56 --> 0:42:7,54
I remember Stephen Frost came to
his first Phosphor G and his

924
0:42:7,54 --> 0:42:11,32
jaw sort of hit the floor because
he was used to going to a Postgres

925
0:42:11,32 --> 0:42:14,38
conference where there'd be like
200 attendees and he came to

926
0:42:14,38 --> 0:42:16,86
1 of the free and open source for
geospatial conferences.

927
0:42:16,92 --> 0:42:18,26
And there are 800 attendees.

928
0:42:19,3 --> 0:42:22,26
And every one of them he talked to
said, yeah, yeah, we use PostGIS.

929
0:42:22,44 --> 0:42:26,2
It was like he'd never seen so
many distinct Postgres users in

930
0:42:26,2 --> 0:42:26,88
1 place.

931
0:42:27,26 --> 0:42:30,24
That's one difference, the population
of PostGIS users is huge

932
0:42:30,24 --> 0:42:33,76
relative to the population of Postgres
users, but they all have

933
0:42:33,76 --> 0:42:35,42
very small deployments.

934
0:42:36,22 --> 0:42:39,16
So yeah, if you're, if you're a
county, you've got several dozen

935
0:42:39,16 --> 0:42:42,1
tables, none of which has more
than 100,000 records, like from

936
0:42:42,1 --> 0:42:45,66
the point of view of scale, it
barely registers, But there are

937
0:42:45,66 --> 0:42:46,88
a few big ones.

938
0:42:46,88 --> 0:42:50,08
It's just that you don't necessarily
know they exist because

939
0:42:50,08 --> 0:42:51,7
no one talks about them.

940
0:42:52,4 --> 0:42:57,26
Like pretty sure Apple Maps production
system is all on Postgres.

941
0:42:57,32 --> 0:42:59,8
That's only because someone whispered
to me in a dark alley,

942
0:42:59,8 --> 0:43:1,5
like no one's ever made it public.

943
0:43:2,08 --> 0:43:6,56
I know that here, which is a really
large geospatial data aggregator.

944
0:43:6,72 --> 0:43:9,72
If you're going to get raw geospatial
data and want to have a

945
0:43:9,72 --> 0:43:14,3
map, which like competes with what
Apple or Google provides,

946
0:43:14,3 --> 0:43:16,44
you probably go to here and license
their data.

947
0:43:16,44 --> 0:43:19,66
I know that their production pipeline
and a lot of their APIs

948
0:43:19,76 --> 0:43:21,3
are built on PostGIS.

949
0:43:21,96 --> 0:43:25,46
So it's a really big user, global
data, high uptime requirements.

950
0:43:26,0 --> 0:43:28,08
And then one of repeats over and
over again.

951
0:43:28,08 --> 0:43:30,6
So there's a lot of these installations and they're big and they're

952
0:43:30,6 --> 0:43:34,32
high performance is people who are doing OSM map builds.

953
0:43:35,46 --> 0:43:38,16
Because the open means OpenStreetMap.

954
0:43:38,16 --> 0:43:39,52
OpenStreetMap is a map of the world.

955
0:43:39,52 --> 0:43:43,36
It's a map of the world based on a very strict topological model,

956
0:43:43,66 --> 0:43:46,84
edge node collections, which means that anything which covers

957
0:43:46,84 --> 0:43:50,74
an area has to be built from those edge node relationships, which

958
0:43:50,74 --> 0:43:53,26
implies loading the whole edge node system and then building

959
0:43:53,26 --> 0:43:57,44
up the polygons from the parts and then testing the polygons

960
0:43:57,44 --> 0:43:58,4
against each other.

961
0:43:58,66 --> 0:44:2,0
And so people who end up using PostGIS as the intermediate

962
0:44:2,02 --> 0:44:4,84
stage between I've got a lot of raw open stream app data and

963
0:44:4,84 --> 0:44:8,68
I have a rendered tile or a vector tile, there's a big PostGIS

964
0:44:8,68 --> 0:44:11,88
Postgres process in the middle of that where it's shuffling through

965
0:44:11,88 --> 0:44:14,24
all the data and building up the higher level structures just

966
0:44:14,24 --> 0:44:16,5
don't exist in the raw OSM dump.

967
0:44:16,98 --> 0:44:18,84
And there's, there's gotta be hundreds and hundreds of those,

968
0:44:18,84 --> 0:44:20,28
but they all look about the same size.

969
0:44:20,28 --> 0:44:21,42
There's a size of OSM.

970
0:44:21,82 --> 0:44:23,0
Regina: There's more than hundreds.

971
0:44:23,36 --> 0:44:23,8
Paul: Okay.

972
0:44:23,8 --> 0:44:24,84
Thousands of them.

973
0:44:26,06 --> 0:44:26,32
Nikolay: Yeah.

974
0:44:26,32 --> 0:44:30,06
Also, I also saw in your blog posts, Redfin, State Farm, like

975
0:44:30,06 --> 0:44:34,0
we should have a lot, this setup should have a lot of data, right?

976
0:44:34,0 --> 0:44:36,82
Paul: They do, and real production loads.

977
0:44:36,82 --> 0:44:39,14
They're actually, they're asking real questions to the spatial

978
0:44:39,14 --> 0:44:41,42
database and getting spatial answers back.

979
0:44:42,5 --> 0:44:45,72
Nikolay: And also state level, government level, like in France,

980
0:44:45,92 --> 0:44:46,42
Norway?

981
0:44:47,22 --> 0:44:49,28
Paul: Yeah, that was our biggest installation for a long time.

982
0:44:49,28 --> 0:44:51,7
It was the biggest one I could like quote chapter and verse on.

983
0:44:51,7 --> 0:44:54,22
It was the, and they came on very early 2005.

984
0:44:54,36 --> 0:44:59,06
1 of our first big institutional users was IGN, the French National

985
0:44:59,06 --> 0:44:59,84
Mapping Agency.

986
0:45:0,48 --> 0:45:5,04
And they ran Postgres and PostGIS through a competitive process

987
0:45:5,2 --> 0:45:10,58
with Db2 Spatial and Oracle Spatial at that time, 2005, and determined

988
0:45:10,84 --> 0:45:14,56
that Postgres and PostGIS was, in other words, just as good as

989
0:45:14,7 --> 0:45:18,76
Db2 Spatial and Oracle Spatial, and most importantly, like way

990
0:45:18,76 --> 0:45:19,26
cheaper.

991
0:45:19,54 --> 0:45:21,02
So, and way easier to deploy.

992
0:45:21,02 --> 0:45:23,22
They didn't have to go through like a full commercial, like commercial

993
0:45:23,22 --> 0:45:23,6
requisition.

994
0:45:23,6 --> 0:45:25,12
They could just say, okay, we're going to use it.

995
0:45:25,12 --> 0:45:25,76
And they did.

996
0:45:25,76 --> 0:45:29,28
So IGN has been on Postgres PostGIS for 20 years.

997
0:45:29,44 --> 0:45:33,3
And that initial database was a 150 million object database.

998
0:45:33,54 --> 0:45:33,88
Michael: At the

999
0:45:33,88 --> 0:45:35,32
Paul: time, very substantial.

1000
0:45:35,5 --> 0:45:37,0
Nowadays, it's routing error.

1001
0:45:37,06 --> 0:45:39,24
But at the time, it was one of the biggest databases.

1002
0:45:39,24 --> 0:45:42,24
Nikolay: You needed to be really brave those days because of

1003
0:45:42,24 --> 0:45:44,44
lack of replication and proper backup.

1004
0:45:44,8 --> 0:45:45,3
Yeah.

1005
0:45:45,92 --> 0:45:46,96
Today, it's much, much better.

1006
0:45:46,96 --> 0:45:47,72
Much better.

1007
0:45:48,62 --> 0:45:51,52
Paul: They wanted to have, one of the requirements was multi-site

1008
0:45:51,66 --> 0:45:52,16
replication.

1009
0:45:52,8 --> 0:45:54,48
Because they wanted to have...

1010
0:45:55,46 --> 0:45:57,18
I think it was Slony to start with.

1011
0:45:57,26 --> 0:45:57,76
Nikolay: Yeah.

1012
0:45:58,84 --> 0:45:59,34
Good.

1013
0:46:0,12 --> 0:46:2,56
I'm glad now it's already solved.

1014
0:46:2,84 --> 0:46:3,34
Relatively.

1015
0:46:3,42 --> 0:46:6,26
We have problems with large setups, but it's already a different

1016
0:46:6,26 --> 0:46:6,76
story.

1017
0:46:7,22 --> 0:46:7,72
Great.

1018
0:46:8,76 --> 0:46:10,48
Michael: Is there anything we haven't talked about that either

1019
0:46:10,48 --> 0:46:12,68
of you wanted to make sure we mentioned?

1020
0:46:13,44 --> 0:46:14,98
Regina: I guess I should mention conferences.

1021
0:46:15,18 --> 0:46:20,92
So there's going to be a PgDay Boston conference, June 9th.

1022
0:46:20,92 --> 0:46:25,28
And then for the conference that Paul was mentioning, the free

1023
0:46:25,28 --> 0:46:29,14
and open source geospatial, that's going to be in Japan this

1024
0:46:29,14 --> 0:46:29,64
year.

1025
0:46:30,64 --> 0:46:31,8
And let's see.

1026
0:46:32,68 --> 0:46:35,58
Paul: That's Hiroshima, August 30th to September 5th.

1027
0:46:35,92 --> 0:46:36,42
Regina: Yeah.

1028
0:46:36,9 --> 0:46:38,98
Oh, and then there's also FOSS4G NA.

1029
0:46:39,4 --> 0:46:40,86
So that's coming up.

1030
0:46:41,18 --> 0:46:44,02
But I don't think we've said we haven't set this to the day yet,

1031
0:46:44,02 --> 0:46:46,22
but it's gonna be probably in November.

1032
0:46:48,08 --> 0:46:48,58
Paul: November.

1033
0:46:48,9 --> 0:46:51,38
Michael: I really appreciate that you wrote in PostGIS day,

1034
0:46:51,38 --> 0:46:52,68
the day after GIS day.

1035
0:46:52,68 --> 0:46:54,52
And I've actually watched a few of them.

1036
0:46:54,52 --> 0:46:56,42
We run them all online, which is great.

1037
0:46:56,72 --> 0:46:59,86
And I, yes, I've seen quite a few, tend to at least watch the

1038
0:46:59,86 --> 0:47:0,9
talks on performance.

1039
0:47:1,26 --> 0:47:5,14
My special interest and anything by Brian Timoney as well.

1040
0:47:5,14 --> 0:47:6,5
I wanted to give him a shout out.

1041
0:47:6,5 --> 0:47:7,54
I've enjoyed.

1042
0:47:9,0 --> 0:47:9,24
Paul: Yeah.

1043
0:47:9,24 --> 0:47:9,96
PostGIS day.

1044
0:47:9,96 --> 0:47:12,08
If you just type it in, you'll get all the video archives.

1045
0:47:12,08 --> 0:47:15,88
So if you're jonesing for PostGIS conference content, that's

1046
0:47:15,88 --> 0:47:17,46
the place to get it for sure.

1047
0:47:18,96 --> 0:47:19,3
Michael: Yeah.

1048
0:47:19,3 --> 0:47:19,94
It's awesome.

1049
0:47:19,94 --> 0:47:21,42
I highly recommend anything.

1050
0:47:21,42 --> 0:47:22,86
Any last things for me, Paul?

1051
0:47:22,94 --> 0:47:25,74
Paul: We haven't talked about AI,
which feels like a huge mess.

1052
0:47:26,44 --> 0:47:26,94
Yeah.

1053
0:47:27,18 --> 0:47:30,14
And, and I don't know what your
experience is with is with this

1054
0:47:30,14 --> 0:47:30,4
is

1055
0:47:30,4 --> 0:47:32,66
Regina: The next company has AI
in its name.

1056
0:47:32,8 --> 0:47:33,42
Paul: Oh yeah.

1057
0:47:33,42 --> 0:47:34,54
So he's really into it.

1058
0:47:34,54 --> 0:47:35,3
That's good.

1059
0:47:35,38 --> 0:47:37,8
And I wanted to point out like
a couple of things that I've seen

1060
0:47:37,8 --> 0:47:38,98
like in the AI space.

1061
0:47:39,0 --> 0:47:41,14
1 is the geospatial AI space.

1062
0:47:41,14 --> 0:47:44,96
There's this whole other field
called foundation models, which

1063
0:47:44,96 --> 0:47:48,58
is basically like RAG, retrieval
log, retrieval log meta generation,

1064
0:47:48,9 --> 0:47:52,32
like RAG for geospatial, where
they build up a characterization

1065
0:47:52,6 --> 0:47:56,64
of an area or the whole globe where
every pixel in that characterization

1066
0:47:57,1 --> 0:48:0,24
has a multi-dimensional and very
high dimensionality vector associated

1067
0:48:0,24 --> 0:48:0,6
with it.

1068
0:48:0,6 --> 0:48:2,14
So it really is a reg process.

1069
0:48:2,36 --> 0:48:6,06
But people have been using pgvector
as like the storage engine

1070
0:48:6,06 --> 0:48:9,92
for their foundation models and
to do like quick gatherings of

1071
0:48:9,92 --> 0:48:10,38
nearest neighbor.

1072
0:48:10,38 --> 0:48:13,62
And when you do nearest neighbor
with a foundation model, what

1073
0:48:13,62 --> 0:48:17,38
you get is like land, which is
similar to this other land.

1074
0:48:17,44 --> 0:48:20,5
So you can draw a circle around,
say, a bunch of solar panels

1075
0:48:20,5 --> 0:48:22,9
and say, show me the other solar
panels, and it will just go

1076
0:48:22,9 --> 0:48:25,52
off and find every solar panel
in the world for you.

1077
0:48:25,52 --> 0:48:26,28
Very cool stuff.

1078
0:48:26,28 --> 0:48:30,1
There's a, and there's a cool talk
on it in the 2025 Postgres

1079
0:48:30,14 --> 0:48:30,64
day.

1080
0:48:30,72 --> 0:48:31,86
Really worth looking at.

1081
0:48:31,86 --> 0:48:35,28
The other thing that came up in
that Postgres day around AI

1082
0:48:35,28 --> 0:48:40,52
was the idea of natural language
to SQL and swing a cat, throw

1083
0:48:40,52 --> 0:48:43,82
a stone, hit someone who's doing
natural language to SQL out

1084
0:48:43,82 --> 0:48:44,32
there.

1085
0:48:44,34 --> 0:48:48,62
But a really cool observation from
Brendan Ashworth from Bunting

1086
0:48:48,62 --> 0:48:53,2
Labs during his talk that a lot
of the code models, like how

1087
0:48:53,2 --> 0:48:56,2
you get a coding model is first
train a generic model and then

1088
0:48:56,2 --> 0:48:57,62
do reinforcement learning against
it.

1089
0:48:57,62 --> 0:49:0,3
And the way you get a code model
which can speak SQL, is you

1090
0:49:0,3 --> 0:49:3,9
do a lot of reinforcement learning
against standard SQL patterns.

1091
0:49:4,2 --> 0:49:6,66
And his observation was that the
standard SQL patterns tend to

1092
0:49:6,66 --> 0:49:8,6
be like Postgres SQL patterns.

1093
0:49:8,6 --> 0:49:11,28
So you tend to get really good
Postgres SQL and maybe not so

1094
0:49:11,28 --> 0:49:12,4
good Oracle SQL.

1095
0:49:12,4 --> 0:49:15,18
And the same thing holds true with
the spatial side because they're

1096
0:49:15,18 --> 0:49:19,46
using Postgres because PostGIS
is the lingua franca for geospatial

1097
0:49:19,64 --> 0:49:20,14
SQL.

1098
0:49:20,42 --> 0:49:24,24
You tend to get good spatial SQL
generated by these models.

1099
0:49:24,24 --> 0:49:29,76
And the real limiting step tends
to be providing the correct

1100
0:49:29,76 --> 0:49:33,6
amount of like semantic metadata
around your data model so that

1101
0:49:33,6 --> 0:49:38,16
the LLM can make sense of the natural
language question you're

1102
0:49:38,16 --> 0:49:40,2
making and spit out the right SQL.

1103
0:49:40,2 --> 0:49:43,18
But you're already ahead of the
game if you're using Postgres

1104
0:49:43,18 --> 0:49:45,76
and Postgres as your engine, because
these models will tend to

1105
0:49:45,76 --> 0:49:47,42
do SQL that matches them.

1106
0:49:47,64 --> 0:49:52,7
Nikolay: It's writing in Python
is easier with like using LLMs

1107
0:49:52,78 --> 0:49:55,6
because it's they know much better
than new languages.

1108
0:49:55,6 --> 0:49:56,98
Paul: Yeah, give me Perl 6.

1109
0:49:57,44 --> 0:50:0,28
Nikolay: Yeah, and that's why Postgres
is very natural because

1110
0:50:0,28 --> 0:50:4,4
so many, so much material it was
trained on it and Postgres the

1111
0:50:4,4 --> 0:50:4,92
same thing.

1112
0:50:4,92 --> 0:50:6,14
This is a great point.

1113
0:50:6,48 --> 0:50:10,74
So it should help to grow popularity
even more because a lot

1114
0:50:10,74 --> 0:50:13,22
of applications are created by
AI these days.

1115
0:50:13,38 --> 0:50:17,16
But I'm curious, if you deal with
a lot of data, you need to

1116
0:50:17,16 --> 0:50:17,66
verify.

1117
0:50:17,72 --> 0:50:23,34
If you don't have a way to verify
all the ideas AI is generating,

1118
0:50:24,52 --> 0:50:26,54
the quality will be not good.

1119
0:50:26,88 --> 0:50:31,62
So this is one thing I always do,
how to give AI tools to verify

1120
0:50:32,02 --> 0:50:34,94
its own ideas very fast without
humans.

1121
0:50:34,94 --> 0:50:38,16
Before humans are involved, it's
already much better.

1122
0:50:38,68 --> 0:50:40,88
I'm curious what you think about
this.

1123
0:50:41,6 --> 0:50:44,9702
How to write proper queries and
don't use this distance function

1124
0:50:44,9702 --> 0:50:45,19
you mentioned.

1125
0:50:45,19 --> 0:50:46,06
Function you mentioned.

1126
0:50:46,56 --> 0:50:49,4
Regina: The good thing is, as Paul
said, it does write the right

1127
0:50:49,4 --> 0:50:50,34
distance function.

1128
0:50:50,38 --> 0:50:55,88
It does do ST_DWithin, if you give
it a text, say, give me something

1129
0:50:55,88 --> 0:50:58,68
within 50 miles, it writes the
right query.

1130
0:50:58,68 --> 0:51:0,04
I have tested that.

1131
0:51:0,52 --> 0:51:3,06
Nikolay: Good, But it still would
be better if it would have

1132
0:51:3,06 --> 0:51:7,46
some tooling to test and see actual
results, not just guess.

1134
0:51:7,66 --> 0:51:11,68
Paul: Yeah, and we've started to
see that discipline arrive in

1135
0:51:11,68 --> 0:51:12,72
the PostGIS codebase.

1136
0:51:12,72 --> 0:51:14,54
We have our first agents file.

1138
0:51:15,06 --> 0:51:16,82
Regina: Oh yeah, we have that SKILL.md.

1139
0:51:17,32 --> 0:51:18,46
Paul: Or SKILL, sorry.

1140
0:51:18,6 --> 0:51:22,28
So yeah, we have our first skills
file, which does touch on that

1141
0:51:22,28 --> 0:51:23,8
kind of stuff you're talking about,
Nik.

1142
0:51:23,8 --> 0:51:27,24
Like these are best practices for
generating spatial SQL.

1143
0:51:27,7 --> 0:51:30,7
Nikolay: Yeah, in my opinion, copy
on write and database branching

1144
0:51:30,72 --> 0:51:35,38
should grow because we need to
be able to test on large data

1145
0:51:35,38 --> 0:51:38,94
sets for very low money and very
fast.

1146
0:51:39,06 --> 0:51:42,28
It can be as in S3, like some company
implemented, right?

1147
0:51:42,28 --> 0:51:44,12
Neon or Timescale, they implemented
it.

1148
0:51:44,12 --> 0:51:47,54
It can be with ZFS or anything,
but it should be fast and cheap.

1149
0:51:47,68 --> 0:51:51,82
So you give like millions of points
on the map and tell, just

1150
0:51:51,82 --> 0:51:54,18
check, EXPLAIN, ANALYZE
BUFFERS.

1151
0:51:55,38 --> 0:52:0,56
Michael: Or, Nik, did you see friend of the show, Radim posted

1152
0:52:0,94 --> 0:52:4,34
a new blog post recently about just import the stats.

1153
0:52:4,34 --> 0:52:6,22
So you don't even need to load all the data.

1154
0:52:6,22 --> 0:52:7,44
Nikolay: Yeah, I saw it.

1155
0:52:7,54 --> 0:52:10,76
Just test planning and behavior without data actually, just with

1156
0:52:10,76 --> 0:52:11,26
stats.

1157
0:52:11,46 --> 0:52:14,38
I think it's a valid approach, but at the same time I think it's

1158
0:52:14,38 --> 0:52:16,54
limited because you don't see actual execution.

1159
0:52:17,44 --> 0:52:19,92
Michael: Of course, but you can at least check like index use

1160
0:52:19,92 --> 0:52:21,1
and things, which is cool.

1161
0:52:21,1 --> 0:52:21,84
Nikolay: Yeah, definitely.

1162
0:52:22,2 --> 0:52:23,76
It's a super lightweight approach.

1163
0:52:23,86 --> 0:52:24,52
I like that.

1164
0:52:24,52 --> 0:52:27,28
And I think more and more should be developed because of AI.

1165
0:52:27,28 --> 0:52:28,86
Otherwise, so many mistakes.

1166
0:52:29,18 --> 0:52:33,28
But if you have this testing pipeline and you're even not involved.

1167
0:52:34,02 --> 0:52:39,0
AI has some ideas how to improve query and tooling to verify

1168
0:52:39,0 --> 0:52:42,24
this is the way and put it to test to CI and so on.

1169
0:52:42,24 --> 0:52:43,14
And it's great.

1170
0:52:44,06 --> 0:52:44,44
Yeah.

1171
0:52:44,44 --> 0:52:48,56
So that's why we created branching for many years ago with DBLab.

1172
0:52:48,56 --> 0:52:52,9
We like We were sitting and waiting for these very days when

1173
0:52:52,9 --> 0:52:54,18
it's needed so much.

1174
0:52:56,38 --> 0:52:56,88
Great.

1175
0:52:57,72 --> 0:52:59,76
Michael: Well, it was a pleasure to meet you both.

1176
0:52:59,76 --> 0:53:0,88
Thank you so much for coming.

1177
0:53:0,88 --> 0:53:2,26
Thank you for joining us.

1178
0:53:2,56 --> 0:53:3,58
Nikolay: Thank you for coming.

1179
0:53:3,84 --> 0:53:5,04
Regina: It was a nice talk.

1180
0:53:5,08 --> 0:53:5,78
Paul: Thanks Nik.

1181
0:53:6,0 --> 0:53:6,8
Thanks Michael.

1182
0:53:7,12 --> 0:53:7,62
Nikolay: Yeah.

1183
0:53:7,66 --> 0:53:8,72
Have a great week.