1
00:00:00,018 --> 00:00:03,708
Michael: Hello and welcome to Postgres f a
weekly show about all things Postgres qr.

2
00:00:04,008 --> 00:00:05,538
I'm Michael, founder of PG Mustard.

3
00:00:05,598 --> 00:00:08,028
This is my cohost Nicola, founder of Postgres ai.

4
00:00:08,388 --> 00:00:09,888
Hey, Nala, what are we talking about today?

5
00:00:10,618 --> 00:00:11,458
Nikolay: Hi Michael.

6
00:00:11,463 --> 00:00:17,068
Let's talk about motorized views, skipping regular
views, or you want to talk about them both.

7
00:00:17,607 --> 00:00:23,140
Michael: Well, I was most keen to talk about materialized views, but
it's difficult, to define them, I guess, without mentioning views at.

8
00:00:23,335 --> 00:00:23,665
Nikolay: Right.

9
00:00:24,025 --> 00:00:25,105
And the, and the rule system,

10
00:00:25,612 --> 00:00:26,002
Michael: Yeah,

11
00:00:26,032 --> 00:00:26,272
Nikolay: right?

12
00:00:26,743 --> 00:00:28,233
regular views also interesting.

13
00:00:28,233 --> 00:00:31,063
They have also interesting things to discuss, right?

14
00:00:31,250 --> 00:00:36,320
so yeah, let's, let's, uh, focus on meteorite
views, but briefly touch regular views.

15
00:00:36,320 --> 00:00:46,040
And as usual, let's start from very simple things, uh, and definitions
for those who are not really familiar with views, view system in pos.

16
00:00:46,460 --> 00:00:51,470
And, uh, then try to dive into deeper topics and future maybe a little.

17
00:00:52,058 --> 00:00:52,628
Michael: Sounds good.

18
00:00:53,402 --> 00:00:58,255
so I, I guess let's can go back, let's
think about tables relations in general.

19
00:00:58,390 --> 00:00:58,780
Nikolay: Yeah.

20
00:00:58,780 --> 00:00:58,930
Yeah.

21
00:00:59,065 --> 00:01:02,290
Michael: Um, We have ways of storing data in Postgres.

22
00:01:02,290 --> 00:01:10,450
We have tables, uh, which hopefully everybody's familiar
with, and then views, which are kind of a virtual table.

23
00:01:10,627 --> 00:01:20,677
so we can set up a query that, uh, queries tables or yeah, let's
stick to that simple version and call that a view, but that,

24
00:01:20,677 --> 00:01:24,447
that doesn't, store anything that's like, it's a view into the.

25
00:01:25,647 --> 00:01:30,327
But our queries, if we, if we're querying a lot of data
in the tables, we're still gonna be querying a lot of data

26
00:01:30,327 --> 00:01:34,767
via the views, whereas materialized views are, or they

27
00:01:35,367 --> 00:01:42,957
Nikolay: let's, let me little bit rephrase about, uh,
like, uh, we, the whole, whole this, uh, area is based

28
00:01:42,962 --> 00:01:46,347
on the fact that any query returns kind of a table.

29
00:01:46,797 --> 00:01:48,777
Uh, what, what is the table?

30
00:01:49,782 --> 00:02:00,102
It's a good question because, for example, in theory, all all roles in the
relation, all, all tops in relation should, should be distinguishable in pos.

31
00:02:00,342 --> 00:02:05,438
It's at, at like primary key is not
required so they can be not distinguishable.

32
00:02:05,905 --> 00:02:13,150
but if you also remember about, hidden Columns, CT
i d Exmark X mean, we discussed a couple of times.

33
00:02:13,230 --> 00:02:18,449
they are distinguishable again, but results, sets
of queries don't have those, invisible columns.

34
00:02:18,449 --> 00:02:20,609
So it's kind of a lot of nuances here.

35
00:02:20,639 --> 00:02:25,210
But, in like in simplified, point of view, any query returns to.

36
00:02:25,895 --> 00:02:33,105
Right, which means we could subs substitute some
table in some query within our query and vice versa.

37
00:02:33,405 --> 00:02:40,445
We could take some subquery and say, we can put a name
for it and store it as another query and just use, uh uh.

38
00:02:41,435 --> 00:02:42,245
Aas here.

39
00:02:42,275 --> 00:02:42,545
Right.

40
00:02:42,545 --> 00:02:51,065
And this is actually a view, and in pogo, uh, views are
implemented using, uh, rule system, which is very, very old

41
00:02:51,065 --> 00:02:56,095
thing, uh, originally POGO and maybe INGOs had, many decades ago.

42
00:02:56,575 --> 00:03:05,043
And this, uh, uh, is rule system allows, users to define,
some rules, saying what to do at planning stage, not

43
00:03:05,048 --> 00:03:08,973
an execution stage as triggers, but in planning stage.

44
00:03:08,978 --> 00:03:10,233
So instead of this, do that.

45
00:03:10,863 --> 00:03:14,379
So we can say, when you see, select from view one.

46
00:03:14,939 --> 00:03:16,576
Just do this.

47
00:03:16,578 --> 00:03:18,768
Sub-select from there and then do what you want.

48
00:03:19,068 --> 00:03:23,388
And this, this rule system allows you to define, uh, views.

49
00:03:23,718 --> 00:03:29,491
And also good thing, by the way, it was my,
bachelor of thesis very long ago, updateable views.

50
00:03:29,491 --> 00:03:35,638
So you can even short if, if views are, quite
simple and, Postgres can propagate change.

51
00:03:36,068 --> 00:03:40,529
so you could not only select from them, but you
can insert them, update, delete, and it'll work.

52
00:03:40,529 --> 00:03:48,592
And by the way, some intermediate software like past Grid and for
example, if you're a super base user, you are using Past Grid.

53
00:03:49,072 --> 00:03:52,822
it encourages you to define API using views.

54
00:03:53,134 --> 00:03:58,916
and it relies on the fact that you can, change data,
not only read data there, and it's quite interesting.

55
00:03:58,921 --> 00:03:59,766
This is views, right?

56
00:04:00,186 --> 00:04:07,826
Based on rule system, not approaching execution, type of query
execution, just doing everything on the planning stage, right?

57
00:04:08,413 --> 00:04:09,553
So quite, quite good.

58
00:04:09,553 --> 00:04:10,913
Quite interesting.

59
00:04:11,219 --> 00:04:13,243
what's the main problem of use?

60
00:04:13,243 --> 00:04:14,159
like dependency.

61
00:04:14,894 --> 00:04:25,334
If you need to change underlying table or underlying views, if you have
hierarchy, multiple layers, you usually need to recreate your views.

62
00:04:25,574 --> 00:04:32,584
So it's quite, can be painful, especially if, uh,
your system is under constant, high load, right?

63
00:04:32,584 --> 00:04:33,104
High load.

64
00:04:33,534 --> 00:04:34,144
Mm-hmm.

65
00:04:34,624 --> 00:04:39,394
Michael: And, and I think the, the benefits of them are
extremely different to the benefits of Materialized.

66
00:04:39,484 --> 00:04:43,024
So you want, um, Like where, when do you see views used most?

67
00:04:44,359 --> 00:04:50,959
Nikolay: Well, originally I remember from my Bachelor
of Leisure, so like 20 plus years ago, originally idea.

68
00:04:50,989 --> 00:04:54,349
One of the key ideas is that this is helpful to control.

69
00:04:55,399 --> 00:05:01,309
so you can, uh, restrict, number of columns you provide, you expose via views.

70
00:05:01,669 --> 00:05:07,009
You can also restrict number of rows so you,
you can do it in both dimensions of table.

71
00:05:07,676 --> 00:05:17,666
Com and, and then you can have a very different, uh, you can provide, for
example, writeable access and view, but no direct access to table at all.

72
00:05:17,876 --> 00:05:19,796
And this is actually what progress is doing.

73
00:05:19,801 --> 00:05:24,116
They, you're supposed to have a separate schema there.

74
00:05:24,116 --> 00:05:25,826
You define, define views.

75
00:05:25,831 --> 00:05:28,286
Nobody is working like users.

76
00:05:28,826 --> 00:05:36,323
If your system don't work directly with table, they
work with it via views, it's helpful to manage success.

77
00:05:36,956 --> 00:05:38,606
also views are helpful.

78
00:05:38,726 --> 00:05:42,386
Sometimes just short shortcuts like others for your queries.

79
00:05:42,896 --> 00:05:43,406
Why not?

80
00:05:43,406 --> 00:05:44,696
Michael: Yeah, usability, right?

81
00:05:45,056 --> 00:05:47,786
Better names, but that you can give objects different names.

82
00:05:48,566 --> 00:05:55,766
To, to make them, you know, if you, if you regret a, a name and don't
want to actually do a big migration, I've seen it used for that as well.

83
00:05:56,516 --> 00:05:58,496
But yeah, the main one seems to be security.

84
00:05:58,826 --> 00:06:00,566
Nikolay: They add complexity to the system.

85
00:06:00,566 --> 00:06:01,226
Definitely.

86
00:06:01,226 --> 00:06:10,790
And especially if, again, if you have, several layers of views, so you
will end up dealing with like, if your schema is changing, often you

87
00:06:10,790 --> 00:06:15,498
will, you'll notice further head of maintaining your views constantly.

88
00:06:15,648 --> 00:06:16,788
But what else?

89
00:06:16,788 --> 00:06:19,338
Like where are, are views helpful?

90
00:06:19,506 --> 00:06:23,346
Actually reporting, like you have reports,
so you just store it and, and that's it.

91
00:06:23,856 --> 00:06:24,666
But it does help with

92
00:06:24,666 --> 00:06:25,536
speed anyhow.

93
00:06:25,578 --> 00:06:26,088
Michael: exactly

94
00:06:26,149 --> 00:06:27,951
Nikolay: yeah, it doesn't story in statistics.

95
00:06:27,951 --> 00:06:28,941
It doesn't help you.

96
00:06:28,971 --> 00:06:31,011
You cannot create index on view, nothing like that.

97
00:06:31,071 --> 00:06:32,331
Cause it's very ural.

98
00:06:32,331 --> 00:06:37,161
It's like just, it's like just a, just a query stored under some name.

99
00:06:37,161 --> 00:06:37,491
That's it.

100
00:06:37,791 --> 00:06:38,361
This is you.

101
00:06:38,896 --> 00:06:44,296
Michael: Can it ever hurt performance in terms of push
down for like pre predicates and things like that?

102
00:06:44,665 --> 00:06:54,671
Nikolay: I'm not so sure, like in some cases, of course, uh, if you, have a
view, you have indirectly you, you have less flexible,  ways to optimize your

103
00:06:54,671 --> 00:07:01,808
query, but, Like, in most cases, post's, planner, it can choose,  what to do.

104
00:07:01,808 --> 00:07:05,557
Like there is this joint collapse and, uh, another collapse,

105
00:07:05,608 --> 00:07:06,538
Michael: From collapse.

106
00:07:06,613 --> 00:07:08,233
Nikolay: yeah, yeah, from collapse setting.

107
00:07:08,233 --> 00:07:16,579
So it can, it basically says that, POGS  compares
different paths easily, and views are just, they're

108
00:07:16,579 --> 00:07:20,773
already substituted at  time, so it's just they're already.

109
00:07:20,773 --> 00:07:23,663
this query, which, defines the view, it's already there.

110
00:07:23,813 --> 00:07:24,083
Right?

111
00:07:24,113 --> 00:07:32,573
So basically I don't see big problem, main problem only about
maintenance of schema changes for me in my, from my experience.

112
00:07:32,873 --> 00:07:39,167
I might be wrong, of course, as usual, like my
experience is it's, it's not, the only one, right?

113
00:07:39,247 --> 00:07:40,867
Michael: Yeah, get in touch if you've had other

114
00:07:40,967 --> 00:07:41,267
Nikolay: Right.

115
00:07:41,267 --> 00:07:43,937
So materialized fuels another idea.

116
00:07:44,717 --> 00:07:47,854
I would say it's opposite idea, instead of just storing.

117
00:07:48,276 --> 00:07:52,474
View definition, let's store the data and view definition as well, right?

118
00:07:53,479 --> 00:07:54,649
Michael: Yeah, importantly.

119
00:07:54,889 --> 00:07:59,089
But it's, so it's showing the data and
at the moment that's at a point in time.

120
00:07:59,149 --> 00:08:02,269
So when, when the materialized view was created.

121
00:08:02,659 --> 00:08:05,309
So it's, , that query one at that time.

122
00:08:05,549 --> 00:08:07,649
So that, and again, let's.

123
00:08:08,474 --> 00:08:12,374
Probably want to catch them before I go into the, uh, benefits too quickly.

124
00:08:12,374 --> 00:08:21,614
But the idea is you can run an expensive, like computation of data
or you can, you can do some work once and then read from it a lot

125
00:08:21,614 --> 00:08:25,454
of times instead of having to do that expensive work multiple times.

126
00:08:25,928 --> 00:08:26,234
Nikolay: Uh, right.

127
00:08:26,414 --> 00:08:30,404
Actually, you can define just you, but
don't, don't have data in it originally.

128
00:08:30,644 --> 00:08:33,134
There is additional comment to populate.

129
00:08:33,434 --> 00:08:34,094
Uh, uh,

130
00:08:34,309 --> 00:08:35,234
Michael: the benefit of that?

131
00:08:35,234 --> 00:08:36,164
I couldn't mark it out.

132
00:08:36,854 --> 00:08:38,044
Nikolay: well, I don't know.

133
00:08:38,178 --> 00:08:42,408
some cases we can invent some cases, uh, synthetically.

134
00:08:42,438 --> 00:08:47,058
For example, you restore from, from damp and you don't need it.

135
00:08:47,826 --> 00:08:51,906
But you, like, you already restored your like, uh, the original data.

136
00:08:51,906 --> 00:09:00,317
This because metal, you, it's the data which is derived You always
can refresh it and have a up to date version so you restore from dump

137
00:09:00,317 --> 00:09:09,047
and you don't want to have, it there yet somehow just to consider
dump as already restored fully because all original data already there

138
00:09:09,653 --> 00:09:09,923
Michael: Yeah.

139
00:09:10,057 --> 00:09:11,057
So you can speed up your.

140
00:09:11,653 --> 00:09:12,553
Nikolay: Right, and later in.

141
00:09:12,717 --> 00:09:24,324
In separate step, it can be done using store pt, dump, um, pair of utilities
and PGE store has dash   uppercase and down case l pair of options.

142
00:09:24,534 --> 00:09:28,014
So one option provides the least of objects, uh, the dump.

143
00:09:28,689 --> 00:09:38,164
Which if, if dump is in custom for directory format and another option allows
you to filter out, everything you don't want  or filter in what you want to

144
00:09:38,169 --> 00:09:43,564
restore so you can restore ize to you, but don't restore data and do it later.

145
00:09:44,224 --> 00:09:45,124
Separately.

146
00:09:45,364 --> 00:09:52,781
This is actually, I did it, but I did it, , because I
had issues with, , by the way, maybe it's still there.

147
00:09:53,051 --> 00:09:58,721
So if you have multiple material to use and also
multiple layers of them here, here are here,

148
00:09:58,781 --> 00:09:59,081
right?

149
00:09:59,591 --> 00:10:01,601
in some cases, , restore doesn't.

150
00:10:01,650 --> 00:10:04,335
because,  uh, it does know the order

151
00:10:04,950 --> 00:10:05,250
Michael: Yeah.

152
00:10:05,475 --> 00:10:11,165
Nikolay: and it says, uh, I cannot, populate data yet for this
mutualized view because, , I depend on another mutualized view.

153
00:10:11,345 --> 00:10:14,515
It's, it's, it's empty or maybe even not create it at all.

154
00:10:14,515 --> 00:10:19,328
I don't remember details there, but so I
remember I wrote a script Like infinite loop

155
00:10:19,585 --> 00:10:25,825
and if a refresh test tube fails, just
skipped and another attempt is done later.

156
00:10:25,885 --> 00:10:31,882
So it was was brute force, reive refresh
of all test use to fill all layers of it.

157
00:10:32,392 --> 00:10:33,502
It worked very well.

158
00:10:34,168 --> 00:10:36,116
So, to stores data.

159
00:10:36,626 --> 00:10:38,546
I agree with you that it, it's helpful.

160
00:10:39,119 --> 00:10:39,942
to avoid.

161
00:10:39,990 --> 00:10:48,171
for example, uh, aggregates one of examples because we all
know the counting is quite slow in post is a raw store.

162
00:10:48,231 --> 00:10:51,831
So it's not cone store, unlike some analytical database systems.

163
00:10:52,281 --> 00:10:55,968
And, of course if you have billion rows, count will be slow.

164
00:10:55,968 --> 00:10:59,298
Even if you have a index on the scan, it'll still be slow.

165
00:10:59,742 --> 00:11:01,092
so you can pre-calculate.

166
00:11:01,797 --> 00:11:08,538
For particular groups, for example, and have
quite small to, and, you can build indexes on it.

167
00:11:08,838 --> 00:11:09,948
Any custom indexes.

168
00:11:09,948 --> 00:11:10,308
Right.

169
00:11:10,983 --> 00:11:14,073
Michael: Yeah, I don't think, I think a lot
of people don't realize that it's really cool.

170
00:11:14,178 --> 00:11:14,478
Nikolay: Yeah.

171
00:11:14,478 --> 00:11:16,128
It's because it's a table Actually.

172
00:11:16,194 --> 00:11:21,714
I would like everyone to understand that instead of
to you, you have very good, powerful tool in pogs

173
00:11:22,044 --> 00:11:27,095
Create table as select it's, consider it as a one time.

174
00:11:28,150 --> 00:11:30,580
without refresh capability, it's the same.

175
00:11:31,930 --> 00:11:35,650
Michael: Yes, but with refresh capability,
I think it becomes even more powerful.

176
00:11:35,655 --> 00:11:35,890
Right?

177
00:11:35,980 --> 00:11:43,030
Nikolay: I, I would argue let's, like, I, I, I mean, I
agree in many cases, but in some cases I don't agree.

178
00:11:43,030 --> 00:11:45,250
And I will tell you why in the moment.

179
00:11:46,670 --> 00:11:55,485
So, create table as select gives you powerful ability to, create some derived
table based on several, one or several other tables or choice to use actually.

180
00:11:55,995 --> 00:11:58,350
And, then create indexes and work with it.

181
00:11:58,650 --> 00:12:00,570
Very good for experimenting.

182
00:12:00,600 --> 00:12:04,037
It's also very good and, , internally it's good.

183
00:12:04,457 --> 00:12:06,420
It doesn't spa, the buffer.

184
00:12:07,590 --> 00:12:09,270
because the in is used there.

185
00:12:09,270 --> 00:12:13,200
So only small, small amount of, buffers relatively small.

186
00:12:13,200 --> 00:12:13,920
I don't remember.

187
00:12:13,920 --> 00:12:15,660
Always forget the size of it.

188
00:12:16,020 --> 00:12:22,543
But, , of course the system, page cash, it doesn't,
have knowledge what data you are getting from disc.

189
00:12:22,573 --> 00:12:24,763
So it'll be spent with this action.

190
00:12:25,033 --> 00:12:27,073
But the buffer pole will remain good.

191
00:12:27,073 --> 00:12:30,680
I mean, it won't evict useful buffers from your system, which is.

192
00:12:31,310 --> 00:12:35,480
Create, select and, uh, you can create indexes.

193
00:12:35,570 --> 00:12:41,725
And I would like to point to very important
thing I realized not very long ago, very simple.

194
00:12:41,725 --> 00:12:43,855
I, I, I like to say silly things.

195
00:12:43,855 --> 00:12:51,522
By the way, I want to apologize for the last episode because
I said, in ses cannot be had heon lead apples and deletes.

196
00:12:51,522 --> 00:12:54,372
Cannot be heard well in ses, cannot be had.

197
00:12:54,552 --> 00:12:56,562
It's like, doesn't make sense, but deletes.

198
00:12:57,012 --> 00:13:02,242
They even don't change, indexes at all
because indexes don't have visibility in four.

199
00:13:02,452 --> 00:13:07,582
So, I, I, it was a wrong idea to even, to raise, delete.

200
00:13:07,635 --> 00:13:11,985
as we also discussed, it just puts, uh, value of transaction ID to x.

201
00:13:12,915 --> 00:13:18,605
In the, he in in table, pages and doesn't do anything with indexes at all.

202
00:13:18,605 --> 00:13:22,295
And, and so it cannot be hard because it's not needed to be hot at all.

203
00:13:22,655 --> 00:13:30,275
It's already hot, actually, like new statement deletes
are always hot already because they don't change indexes.

204
00:13:30,695 --> 00:13:37,205
And of course indexes, uh, uh, I changed later by, by
vacuum activities, but it's synchronously, which is good.

205
00:13:38,165 --> 00:13:39,485
So back to our topic.

206
00:13:39,945 --> 00:13:48,795
The idea that you can create table as select or create marginalized
to which currently we can consider like kind of equivalence.

207
00:13:49,155 --> 00:13:51,555
So we will see differences a little bit later.

208
00:13:51,945 --> 00:13:59,455
You can create indexes, and this is super cool because if joint
is involved, you can create an index on two tables at once.

209
00:13:59,545 --> 00:14:04,312
Index it always belongs to some table, but
sometimes, for example, you want to order.

210
00:14:05,182 --> 00:14:08,332
By two values, which are present in two different tables.

211
00:14:08,362 --> 00:14:16,542
Or you want to order by and filter, and you could do
it using single index scan, both order by limit and.

212
00:14:17,070 --> 00:14:23,587
Order by and, uh, where clothes uses one
column, order by and limit another column.

213
00:14:23,947 --> 00:14:27,577
We could combine it in an index, like two column index, for example.

214
00:14:27,817 --> 00:14:30,907
But we cannot, because these columns are in different tables, right?

215
00:14:30,907 --> 00:14:33,067
So we cannot create an index for two tables.

216
00:14:33,547 --> 00:14:39,427
But when you join and creates a create, select, and join, Create to you.

217
00:14:39,757 --> 00:14:46,437
You finally can do, can have this index, even if you
don't do anything with data, except just joining it

218
00:14:47,157 --> 00:14:47,342
Michael: Yeah.

219
00:14:47,612 --> 00:14:48,602
I mean that, yeah.

220
00:14:48,602 --> 00:14:49,172
That's awesome.

221
00:14:49,172 --> 00:14:51,092
I, um, I've not seen anyone doing that.

222
00:14:51,092 --> 00:14:53,012
That's super, such a cool idea.

223
00:14:53,282 --> 00:14:55,202
Nikolay: I think everyone doing it just implicitly,

224
00:14:55,832 --> 00:14:57,182
Michael: oh, interesting.

225
00:14:57,242 --> 00:15:05,612
So the, so in fact, this actually takes us quite quickly to one of the
down, like, that sounds, that's great, but you are, you, you do, you

226
00:15:05,612 --> 00:15:13,292
are, it's, it's a minor downside, I guess, for most people, but that,
that is taking up space on disk, in cash, in cashier and things like

227
00:15:13,337 --> 00:15:17,147
Nikolay: Which has to take space and create select will will take space.

228
00:15:17,327 --> 00:15:17,417
Right.

229
00:15:17,567 --> 00:15:17,897
Michael: Yeah.

230
00:15:18,437 --> 00:15:23,087
So it's, it is a minor downside for that
benefit, but it's, it's the reason why

231
00:15:23,477 --> 00:15:31,427
Nikolay: But the speed benefit can be many, many orders of market
because when you avoid join, and you can have single index scan

232
00:15:31,432 --> 00:15:40,170
and maybe even index only scan compared to 2 index scans in two
different tables, and then one of three algorithms to, to join it.

233
00:15:40,170 --> 00:15:42,923
Like it's very, it's like game changer.

234
00:15:43,023 --> 00:15:44,483
Could be in terms of performance,

235
00:15:45,083 --> 00:15:45,443
Michael: Yep.

236
00:15:45,623 --> 00:15:46,463
I completely agree.

237
00:15:46,793 --> 00:15:52,333
The, the other like, let's, I think while I'm talking
about downsides, the main one that people normally,

238
00:15:52,373 --> 00:15:53,123
Nikolay: justify size.

239
00:15:53,303 --> 00:16:04,366
Like you can, instead of, , putting query, which will solve, your, like, like
you have some goal to do something with data and you have a query, you want

240
00:16:04,366 --> 00:16:14,671
to improve it, the speed of it and Okay, I will basically, what, what is this
create table effect or create to, it's kind of, instead of considering it

241
00:16:14,671 --> 00:16:27,101
as a query cash, you could, uh, for example, store more, have more indexes
on this, uh, derived table or mutualized you and, uh, have more, for future.

242
00:16:27,211 --> 00:16:31,331
You, you, you can, , support more queries with this cash.

243
00:16:31,421 --> 00:16:33,521
Like cash, not the final result.

244
00:16:33,521 --> 00:16:35,511
Cash,  data, right?

245
00:16:35,931 --> 00:16:42,099
So, as we know, query cash is kind of, it's good until it's not right.

246
00:16:42,789 --> 00:16:46,119
For example, my SQL removed query cash from engine at some point.

247
00:16:46,921 --> 00:16:52,399
Michael: it feels like one of those things where
there are use cases that it can hugely benefit.

248
00:16:52,399 --> 00:16:57,289
Like if your data's not changing much or if you have past data that, um, it's

249
00:16:57,364 --> 00:17:02,224
Nikolay: You, you, you're pushing us too, too
fast to the main problem with this approach.

250
00:17:02,224 --> 00:17:02,854
Uh uh.

251
00:17:03,339 --> 00:17:07,899
Which, which also one of two biggest problems
in computer science as we know, right?

252
00:17:07,929 --> 00:17:11,469
Cash and validation, how to, how to maintain the state of it.

253
00:17:11,499 --> 00:17:17,169
Not to, of course when we say create select or we say  data.

254
00:17:17,169 --> 00:17:19,724
There, of course,  the data is, is frozen, right?

255
00:17:19,724 --> 00:17:19,784
It.

256
00:17:20,709 --> 00:17:21,199
Michael: Yeah.

257
00:17:21,199 --> 00:17:29,159
But I mean, like if we're talking about a, like some people do have
analytics databases from static data that isn't changing at all.

258
00:17:29,829 --> 00:17:30,789
And that for

259
00:17:31,074 --> 00:17:34,614
Nikolay: I doubt, uh, well, maybe some cases, right?

260
00:17:34,614 --> 00:17:43,557
But, uh, I, I mostly observe cases when people want to
maintain the state of analytical database with very small lag.

261
00:17:44,142 --> 00:17:44,472
Michael: Yeah.

262
00:17:44,927 --> 00:17:49,667
Nikolay: or even, even they, they want to have
it in the same system, so in progress and so on.

263
00:17:49,667 --> 00:17:50,747
So like each step.

264
00:17:51,257 --> 00:17:52,367
Uh, approach.

265
00:17:53,447 --> 00:17:54,137
Michael: Hybrid, right?

266
00:17:54,137 --> 00:17:56,177
Hybrid, uh, transactional

267
00:17:56,417 --> 00:18:01,667
Nikolay: the main problem is how to update it and, uh, what pogs offers.

268
00:18:01,697 --> 00:18:10,067
If you forget about my idea of Creative Select, which I still think is
quite good, and, uh, it can beat  in some cases, I will explain why.

269
00:18:10,072 --> 00:18:18,107
Like, I, I, I keep this like intrigue, uh, uh,
state, so we have can say, uh, refreshment.

270
00:18:19,217 --> 00:18:19,577
Right.

271
00:18:20,597 --> 00:18:21,437
Refreshment size.

272
00:18:22,217 --> 00:18:22,637
Michael: Yeah.

273
00:18:22,697 --> 00:18:28,307
And importantly, I didn't realize this was added so
early, but refresh materials you concurrently as well.

274
00:18:28,517 --> 00:18:28,727
Nikolay: Right.

275
00:18:28,877 --> 00:18:29,747
What's the difference?

276
00:18:30,347 --> 00:18:34,048
Difference is  one will lead to blow, another will, will not.

277
00:18:34,108 --> 00:18:34,378
Right.

278
00:18:36,328 --> 00:18:36,508
. Right.

279
00:18:36,628 --> 00:18:39,628
Michael: Um, so the big, the big difference, and I, so I

280
00:18:39,733 --> 00:18:42,553
Nikolay: Like, like developer's point of view, it's so good.

281
00:18:42,553 --> 00:18:48,103
Our, our queries are work's point of view or blo  who, who knew it?

282
00:18:48,313 --> 00:18:49,273
It can happen as well.

283
00:18:50,878 --> 00:18:57,688
, 
Michael: So if we refresh a materialized view without
concurrently, we block selects on the materialized view while it's.

284
00:18:58,228 --> 00:18:58,888
Refreshing.

285
00:18:58,948 --> 00:18:59,338
Right.

286
00:18:59,398 --> 00:19:06,238
And that's offered like it, bear in mind we're normally doing
this on a slightly slow query because that's the, the benefit of

287
00:19:06,463 --> 00:19:08,153
Nikolay: It's like vacuum fool, basically.

288
00:19:08,923 --> 00:19:09,193
Right,

289
00:19:09,913 --> 00:19:13,443
but with data will be fresh, our goal is achieved.

290
00:19:14,443 --> 00:19:20,413
Michael: Yes, but with concurrently it's like a, I
guess the equivalent would be a PG pack or something.

291
00:19:20,473 --> 00:19:26,563
Uh, where, or, or create index concurrently, or
sorry, reindex concurrently would be the equivalent.

292
00:19:27,103 --> 00:19:35,072
So, with concurrently, we don't block selects on the past state of
the material that's viewed and it replaces it once it's finished.

293
00:19:36,512 --> 00:19:36,842
Nikolay: Right.

294
00:19:37,232 --> 00:19:37,352
Michael: is

295
00:19:37,352 --> 00:19:38,252
that, is that right?

296
00:19:38,672 --> 00:19:39,362
Nikolay: yeah, that's right.

297
00:19:39,392 --> 00:19:46,832
But, uh, the main point to remember here is that we can refresh
it only fully, we cannot refresh part of it, unfortunately,

298
00:19:47,522 --> 00:19:55,152
if, if we use regularized fuel, like, original postals
provides because, uh, just refreshment has to concurrently.

299
00:19:55,915 --> 00:19:57,476
there are no like options.

300
00:19:57,476 --> 00:20:01,916
Say that I know 90% of my data hasn't changed because it's like very old data.

301
00:20:01,916 --> 00:20:04,256
So I, I probably could do it much faster.

302
00:20:04,633 --> 00:20:14,299
so you refresh everything and it can take a lot of time but
concurrent select  working, it's good, but, we, accumulate blood of.

303
00:20:15,664 --> 00:20:16,114
Indexes.

304
00:20:16,119 --> 00:20:24,251
If, if we follow my, approach, let's not, use, Let's,
let's define  with some underlying data, maybe raw

305
00:20:24,256 --> 00:20:27,851
data and support more queries, more kinds of queries.

306
00:20:27,851 --> 00:20:30,851
With it, it means that probably you have multiple indexes there.

307
00:20:30,911 --> 00:20:32,261
Oh, we forgot to say that.

308
00:20:32,261 --> 00:20:38,081
For, to support concurrent update refresh, we need, uh, unique index.

309
00:20:38,666 --> 00:20:40,646
Michael: Exactly on one of the columns,

310
00:20:40,826 --> 00:20:41,546
Nikolay: it's mandatory.

311
00:20:42,686 --> 00:20:50,436
So already one index is there, but if you follow my
approach, let's have raw data and uh, support more queries.

312
00:20:50,436 --> 00:20:55,296
You probably have multiple indexes there
and there health will degrade as well.

313
00:20:55,806 --> 00:20:57,516
So all problems we have with.

314
00:20:57,812 --> 00:21:02,007
tables which are, receive significant numbers of updates, deletes.

315
00:21:02,304 --> 00:21:12,324
we'll have this here as well, so repacking or something, and here idea like
why, just not to recreate it if we still need to populate fully it fully.

316
00:21:12,329 --> 00:21:12,614
Right.

317
00:21:12,994 --> 00:21:13,414
Why?

318
00:21:13,874 --> 00:21:15,414
And maybe it can.

319
00:21:15,954 --> 00:21:25,374
Just recreate, ized, view we under a different name, and then quickly
swap it in, in single transaction or just this create table, select.

320
00:21:25,374 --> 00:21:29,364
It's like if it, if it's recreated every time we want to refresh.

321
00:21:29,664 --> 00:21:32,455
What's the point of having ized view?

322
00:21:32,484 --> 00:21:36,744
Just maybe convenience, because definition of it, uh, remembers the.

323
00:21:37,877 --> 00:21:38,207
right?

324
00:21:38,297 --> 00:21:41,657
So it, like a database system stores the definition.

325
00:21:41,747 --> 00:21:43,187
It like, it's better.

326
00:21:43,192 --> 00:21:44,177
Maybe that's it.

327
00:21:44,757 --> 00:21:50,187
In some, in some cases I would like, I would
seriously consider a recreation of it every time.

328
00:21:50,967 --> 00:21:55,495
, just to not to deal with blo in data part of it and in indexes.

329
00:21:55,495 --> 00:21:56,275
So we have.

330
00:21:57,035 --> 00:21:57,335
Right.

331
00:21:57,815 --> 00:22:05,435
But of course if we do it every minute, for example, I would check the
blo of system catalogs as well because if you have a lot of details

332
00:22:05,585 --> 00:22:09,155
you consider the consider DL is very, very often thing to happen.

333
00:22:09,215 --> 00:22:13,914
Uh, you might end up having, very bloated PG class and so on.

334
00:22:14,604 --> 00:22:16,514
So it's also worth remembering.

335
00:22:17,094 --> 00:22:20,694
And, uh, park cannot repak, uh, system catalog.

336
00:22:20,694 --> 00:22:22,494
So you need to do vacuum full.

337
00:22:22,944 --> 00:22:30,384
Fortunately, it's usually very fast, but, uh, you don't
want to, to, to be like, uh, to have 99% of blood or

338
00:22:30,384 --> 00:22:32,983
something because it'll affect basically all queries.

339
00:22:33,392 --> 00:22:36,812
cause planning will be slowed down and so many things.

340
00:22:37,502 --> 00:22:43,738
So, , bottom line refreshment size two
is good, but it's very, very rough tool.

341
00:22:44,668 --> 00:22:55,283
One of, uh, our customers, enjoyed using them until some point then said,
it's like having huge hammer and, uh, just applying it to very small things.

342
00:22:55,313 --> 00:22:56,393
Like it's very rough too.

343
00:22:56,723 --> 00:22:57,413
Very rough.

344
00:22:57,563 --> 00:22:59,843
So it saves you sometimes.

345
00:22:59,843 --> 00:23:07,684
But then, like if, if you only small portions of your data is
changing, maybe you will be not satisfied with this approach.

346
00:23:07,684 --> 00:23:09,244
Refresh it fully always.

347
00:23:09,784 --> 00:23:14,795
And, uh, here also, like by the way, could ized u be update.

348
00:23:15,907 --> 00:23:17,048
Michael: Do you mean increment?

349
00:23:17,053 --> 00:23:17,858
Like what do you mean?

350
00:23:18,533 --> 00:23:21,593
Nikolay: updateable means like in the same, we discussed it for regular views.

351
00:23:21,653 --> 00:23:22,943
Uh, you can insert it.

352
00:23:23,829 --> 00:23:23,919
Michael: I

353
00:23:23,919 --> 00:23:24,159
don't

354
00:23:24,159 --> 00:23:25,779
see how that would make sense.

355
00:23:26,334 --> 00:23:33,625
Nikolay: Well, well, it might, it might make sense at
some point, but, uh, maybe it's too like exotic and I,

356
00:23:33,625 --> 00:23:36,085
I suspect other systems have it, but I don't remember.

357
00:23:36,295 --> 00:23:40,765
Let's, let's keep it just the question to,
to know where, just entertaining question.

358
00:23:41,125 --> 00:23:50,220
But, uh, as for the main topic, what would we need, we would
need to be able to, uh, update on the parts of the modularized.

359
00:23:51,167 --> 00:23:59,657
And, uh, there is a big theory, described in not big, but
some theory in described in, uh, Wiki page, post's wiki page.

360
00:23:59,657 --> 00:24:04,847
And the discussion is happening already many,
many years, maybe decades, at least one decade.

361
00:24:04,857 --> 00:24:11,380
I, I would say, to have, incrementally maintained
views incrementally refreshed or maintained view.

362
00:24:11,385 --> 00:24:14,229
So we want, uh, to avoid full refresh.

363
00:24:14,709 --> 00:24:15,009
Right.

364
00:24:15,069 --> 00:24:17,289
We want to adjust only part of it.

365
00:24:18,069 --> 00:24:26,252
And I, I already brought this, that you, it's like create table select,
but with rule system from views inherited, like we just remember

366
00:24:26,702 --> 00:24:36,032
this, you can use create table as select and then maintain the state
using triggers or some other things and, and have incremental data.

367
00:24:36,032 --> 00:24:39,752
If, if you understand the logic, probably it's the good way to.

368
00:24:40,322 --> 00:24:40,622
Right.

369
00:24:40,832 --> 00:24:43,592
Just to maintain from your application.

370
00:24:44,072 --> 00:24:55,322
The, the question to this approach will be, do you lose data  and,
uh, do you slow down your rights on the, uh, the baseline base tables?

371
00:24:56,282 --> 00:24:58,482
Michael: Exactly what are the, what are the trade off?

372
00:24:59,267 --> 00:25:05,417
. Um, I, it is actually quite an interesting time to talk about,
we could talk about the future and incrementally updating

373
00:25:05,727 --> 00:25:12,827
materials, views in Postgres, but I think it's also worth
talking about a couple of existing tools and systems that have.

374
00:25:13,337 --> 00:25:16,517
Partially tackled it or tackled it for their own specific case.

375
00:25:16,757 --> 00:25:22,457
The reason, um, well this was a requested topic,
so thank you to, uh, the person who requested it.

376
00:25:22,727 --> 00:25:29,267
But there was also an announcement, not in the Postgres space, but
by a company called Planet Scale that we've talked about briefly here

377
00:25:29,267 --> 00:25:40,187
before that announced a, a boost feature that that looks very similar
to a materialized views with, but with incremental updates and,

378
00:25:40,382 --> 00:25:47,132
Nikolay: and, and boost is it to achieve if you had a joint or
multiple joints with a lot of, , various kinds of index scans,

379
00:25:47,137 --> 00:25:50,342
and then suddenly you have index only, I'm, I'm talking OUS terms.

380
00:25:51,062 --> 00:25:54,302
Then suddenly you have single index scan, index only scan.

381
00:25:54,362 --> 00:25:59,172
Of course, it'll be like 10,000,  speed improvement or a hundred thousand even

382
00:25:59,832 --> 00:26:00,222
Michael: Yeah.

383
00:26:00,927 --> 00:26:06,057
They definitely went for a click baity title on it, but
it's, it's kind of what people sometimes want, right?

384
00:26:06,057 --> 00:26:09,897
They've got a slow query and they want a quick fix while they Yeah.

385
00:26:10,257 --> 00:26:15,687
Nikolay: I, I, I'm not concerned about the space,
additional, additional space usually, but sometimes I, I do.

386
00:26:15,777 --> 00:26:19,197
But, uh, the price of maintenance, how maintenance are organized.

387
00:26:19,257 --> 00:26:21,237
Do we have blood issues in pogs if we do this?

388
00:26:22,347 --> 00:26:22,707
So,

389
00:26:23,202 --> 00:26:28,482
Michael: The other, the other one I wanted to talk
about as an existing, there's a well, um, time scale,

390
00:26:28,482 --> 00:26:31,182
have their continuous aggregates feature as well.

391
00:26:31,182 --> 00:26:36,462
And, uh, we don't need to discuss 'em in depth here, but there's a good
blog post that they've done actually covering some of the topics we've

392
00:26:36,467 --> 00:26:40,332
just discussed about views, materialized views and continuous aggregates.

393
00:26:40,362 --> 00:26:48,642
And they have a solution that was, it's designed for time series workloads
where you're probably not gonna have loads of data, uh, in the past changing.

394
00:26:49,107 --> 00:26:52,287
But it can cope with that and it's been, they're being improved.

395
00:26:52,287 --> 00:26:57,788
So that's an interesting take if you, if that's
something you want and need, uh, and you're okay with.

396
00:26:57,793 --> 00:27:05,187
The trade offs now exists and it's kind of, it's
in this area, topic of incrementally updating

397
00:27:05,427 --> 00:27:08,342
Nikolay: you timescale user, you should go and check.

398
00:27:08,402 --> 00:27:10,712
Uh, continuous aggregates a hundred percent.

399
00:27:10,772 --> 00:27:12,932
This is one of key features I would say.

400
00:27:13,592 --> 00:27:13,602
I.

401
00:27:14,809 --> 00:27:21,499
but there is also project, uh, and there is, there is discussion
happening four year, almost four years already in hackers.

402
00:27:21,769 --> 00:27:28,909
Unfortunately, it's, uh, I, I don't know details, but I
just see it in implementing incremental view maintenance.

403
00:27:28,969 --> 00:27:37,429
Uh, started in December, 2018 from Nagata, and the same offer created, uh, pg.

404
00:27:39,004 --> 00:27:40,684
Pro project, which is

405
00:27:41,384 --> 00:27:42,304
it's extension.

406
00:27:42,544 --> 00:27:43,324
Extension, okay.

407
00:27:43,744 --> 00:27:44,734
Open source extension.

408
00:27:45,034 --> 00:27:45,754
It's quite good.

409
00:27:46,154 --> 00:27:46,594
Interesting work.

410
00:27:47,411 --> 00:27:50,081
and this is attempt to follow this approach.

411
00:27:50,086 --> 00:27:51,071
Like let's have

412
00:27:51,171 --> 00:27:53,971
, incremental updates,  automatic, partial updates of you.

413
00:27:54,361 --> 00:27:59,311
But as I understand, like I, I've spent some time understanding how pg ibm.

414
00:27:59,844 --> 00:28:06,134
it works first of all, the jurisdictions, of course, on
queries, uh, that can be used for dig, define such views.

415
00:28:06,704 --> 00:28:09,524
And also, as I understand, there are two approaches.

416
00:28:09,584 --> 00:28:16,334
Uh, first is synchronous propagation of data change, which
will slow down our original insert or update or delete.

417
00:28:16,339 --> 00:28:16,604
Right.

418
00:28:17,599 --> 00:28:18,284
still, it's good.

419
00:28:18,284 --> 00:28:20,684
It's, it's, it's good to think, to have it automated.

420
00:28:21,374 --> 00:28:24,134
Michael: yeah, I think of that as an immediately updating materialized view.

421
00:28:24,139 --> 00:28:26,204
So as soon as there's a change, it's,

422
00:28:26,279 --> 00:28:26,549
Nikolay: right?

423
00:28:26,969 --> 00:28:37,289
It means that, uh, at commit time, like our transactions, which
changed the original table, will slow down of course, but it, it's

424
00:28:37,289 --> 00:28:40,709
good that, uh, we don't need to write a trigger to, to, to do it.

425
00:28:41,369 --> 00:28:44,997
And second, uh, way is to have, refresh.

426
00:28:45,384 --> 00:28:46,514
I, I, I didn't get it.

427
00:28:46,544 --> 00:28:52,207
Maybe like it's still full refresh, like
before or No, like I didn't get this part

428
00:28:52,452 --> 00:28:53,452
Michael: I think they use trigger.

429
00:28:54,377 --> 00:28:54,737
Nikolay: right.

430
00:28:54,767 --> 00:28:55,067
Okay.

431
00:28:55,067 --> 00:29:06,157
So it's, uh, if, if I updated only one row out of billion will, um, second
option will propagate change very fast compared to refreshment last view.

432
00:29:06,797 --> 00:29:09,107
Michael: I didn't realize pg IBM had two options.

433
00:29:09,107 --> 00:29:13,847
I thought they might be talking more philosophically
about their being two options and they.

434
00:29:14,282 --> 00:29:19,652
That was my understanding, but I definitely might have,
there's a good video by, Lucas Fiddle on this as well.

435
00:29:19,652 --> 00:29:21,182
We mentioned his channel last time.

436
00:29:21,182 --> 00:29:22,382
I'll link up that video.

437
00:29:22,442 --> 00:29:22,712
Nikolay: yeah.

438
00:29:22,862 --> 00:29:27,482
Keep mentioning this channel and like us, our podcast.

439
00:29:27,487 --> 00:29:28,442
It's only five minutes.

440
00:29:28,447 --> 00:29:36,272
Always very much faster for, for those poor doc owners who work with docs.

441
00:29:36,272 --> 00:29:37,112
Only five minutes,

442
00:29:38,042 --> 00:29:38,282
Michael: Yeah.

443
00:29:38,282 --> 00:29:39,402
Maybe ones with little legs.

444
00:29:39,402 --> 00:29:39,882
Yeah.

445
00:29:40,052 --> 00:29:40,382
Nikolay: Right.

446
00:29:41,112 --> 00:29:41,672
. Right.

447
00:29:41,762 --> 00:29:49,618
So, why I think, well, well still I, this is great,
but two big questions to this direction as a whole.

448
00:29:49,738 --> 00:29:56,958
First, can we do it a synchronously, but, like propagated
always, but a synchronously, like first option, but

449
00:29:57,438 --> 00:29:59,808
synchronously, I don't want my original transaction.

450
00:30:00,423 --> 00:30:01,023
To slow down.

451
00:30:01,623 --> 00:30:06,273
And second, uh, this is usually needed when we have a lot of data.

452
00:30:06,363 --> 00:30:10,293
And, uh, when we have a lot of data, we always should use partitioning.

453
00:30:10,626 --> 00:30:14,423
I want my you also partition maybe, right?

454
00:30:14,483 --> 00:30:15,728
So, what about this?

455
00:30:16,130 --> 00:30:17,600
Like for example, we partition by.

456
00:30:18,247 --> 00:30:22,447
and we have very, like five years ago
data, January, five years ago for example.

457
00:30:22,713 --> 00:30:25,261
It still can change, but so infrequently.

458
00:30:25,733 --> 00:30:34,433
So I would refresh it very less often, like, right, I don't want my,
my original tables be partition, but ized to you as a single thing.

459
00:30:34,915 --> 00:30:45,137
It means that, uh, if I refresh it fully, it, it's very, Query and will
affect vacuuming and will affect whole database leading to blo in all tables.

460
00:30:45,677 --> 00:30:55,064
If I, uh, updated partially well, I'm fine actually maybe fine
to have, well, vacuum will also need to work here as well.

461
00:30:55,064 --> 00:30:55,304
Right.

462
00:30:55,309 --> 00:31:03,573
And if it's a single huge table, all problems we have with huge
tables, which are not partied, will hit this motorized tissue as.

463
00:31:04,245 --> 00:31:07,222
even if we have partial incremental of human

464
00:31:08,367 --> 00:31:08,697
Michael: yeah.

465
00:31:09,027 --> 00:31:13,827
So if I understand you, we're kind of talking about
two extremes and like whether there's a middle ground.

466
00:31:14,707 --> 00:31:19,387
There's the extreme of having to refresh everything
each time, which we currently have in progress.

467
00:31:19,597 --> 00:31:28,250
There's the other extreme of,  synchronous immediate, , tri,
uh, trigger based updates so that our, our materials view is

468
00:31:28,250 --> 00:31:32,450
updated on each transaction, but that comes with right overhead.

469
00:31:32,880 --> 00:31:35,010
is there a middle ground option where

470
00:31:35,205 --> 00:31:37,515
Nikolay: We, we, there is, there is perfect solution.

471
00:31:37,515 --> 00:31:38,265
It's not milligram.

472
00:31:38,265 --> 00:31:39,195
It's perfect solution.

473
00:31:39,195 --> 00:31:39,855
Perfect solution.

474
00:31:39,855 --> 00:31:45,855
Should, uh, propagate data as synchronously, not
slowing down original transaction and not losing.

475
00:31:46,655 --> 00:31:48,095
With some small leg.

476
00:31:48,725 --> 00:31:50,679
Ideally also, absorbable.

477
00:31:50,679 --> 00:31:53,349
So I would put this leg to some, I dunno, monitoring.

478
00:31:53,649 --> 00:31:56,021
And second, the results would be partied.

479
00:31:56,636 --> 00:31:58,256
Michael: This is where you lost me a little bit.

480
00:31:58,316 --> 00:32:04,256
So my understanding of materialized fees is normally
that they're hugely smaller than the original data.

481
00:32:04,316 --> 00:32:06,956
So I mean, how many, how big are these materialized

482
00:32:07,271 --> 00:32:08,231
Nikolay: I don't agree.

483
00:32:08,231 --> 00:32:10,511
I saw so big use.

484
00:32:10,931 --> 00:32:17,945
And even if, like, again, the idea of big, result, big comes from my idea.

485
00:32:17,945 --> 00:32:18,575
Let's not,

486
00:32:18,605 --> 00:32:24,875
Uh, create a, for each query, let's create it for like a, to avoid joints for.

487
00:32:25,865 --> 00:32:26,225
, 
right?

488
00:32:26,225 --> 00:32:28,625
And then run a lot of various queries on top of it.

489
00:32:29,251 --> 00:32:30,031
Quite good idea.

490
00:32:30,031 --> 00:32:33,211
It's like generalization, automated generalization, basically.

491
00:32:33,615 --> 00:32:37,815
but even if people just use it for a final query, just cash.

492
00:32:37,875 --> 00:32:43,125
Cash, the result of query, I still saw Huges huge.

493
00:32:44,320 --> 00:32:44,560
Michael: Interesting.

494
00:32:44,910 --> 00:32:45,510
Well, great.

495
00:32:46,110 --> 00:32:51,360
Nikolay: And if, if it have a huge, it should
be partition maybe to, to benefit from it.

496
00:32:51,720 --> 00:32:56,370
But the main key, the key, like I would highlight the key point here.

497
00:32:56,580 --> 00:33:06,110
PO unfortunately doesn't have this, what Oracle has, uh, prma autonomous,
uh, the ability to, initiate transaction, which will be detach.

498
00:33:06,980 --> 00:33:08,240
From a trigger, for example.

499
00:33:08,450 --> 00:33:13,100
So I want to, to have insert, but database will guarantee it'll be finished.

500
00:33:13,100 --> 00:33:15,770
But I already, it'll, it's not my problem already.

501
00:33:15,770 --> 00:33:20,900
I, I finished my transaction and another
transaction is happening in different backend.

502
00:33:20,900 --> 00:33:27,063
For example, you can do it with DB link or something
emulated, but it's kind of, it it looks not, clean solution.

503
00:33:27,565 --> 00:33:33,535
and, uh, like currently, I see what, like, ideal
system, what I would build, I would use, uh, tables.

504
00:33:34,043 --> 00:33:39,833
I would store definition, maybe not, not on pause,
maybe inside I would maybe be partition result.

505
00:33:39,863 --> 00:33:47,782
But first thing I would solve, of course, I would propagate, changes
incremental incre, incremental fashion, not, refreshing whole, view.

506
00:33:47,802 --> 00:33:50,422
And by the way, there are articles explaining how to do it.

507
00:33:50,755 --> 00:33:54,642
And there are also tools for various languages which, help you to do it.

508
00:33:55,062 --> 00:34:00,972
But what, first thing I would do, I would propagate changes
and synchronously through probably something like Kafka.

509
00:34:00,972 --> 00:34:01,542
I don't know.

510
00:34:01,712 --> 00:34:05,059
I can implement it in Postgres, but it requires additional knowledge.

511
00:34:05,064 --> 00:34:10,549
You need to know how to deal with blood
in, it's, it's kind of cue inside Postgres.

512
00:34:10,554 --> 00:34:13,219
It's, maybe we should cover this topic separately.

513
00:34:13,639 --> 00:34:18,899
It's possible, but if, if it's in Kafka, it's also
good or some other system which doesn't lose data.

514
00:34:19,360 --> 00:34:30,490
allows me to control the leg and it just delivers, uh, this,
uh, signal of change, uh, separately so I can update my target.

515
00:34:30,940 --> 00:34:35,830
And, uh, the project I, I sent to you, the link,
this materialized company called Materialized.

516
00:34:35,860 --> 00:34:41,287
They solve it as a commercial solution and
as I understand they use Kafka and dbi.

517
00:34:42,464 --> 00:34:45,274
Everyone can do it, but maintaining Kafka.

518
00:34:45,999 --> 00:34:47,649
Requires additional efforts.

519
00:34:48,129 --> 00:34:50,589
So that's why maybe I would do it in pogs itself.

520
00:34:50,589 --> 00:34:51,789
It depends on the system.

521
00:34:51,789 --> 00:34:59,642
If it's, uh, if you have many, many databases, having Kafka is
something like additional to pogs, it would be justified, right?

522
00:34:59,972 --> 00:35:02,582
But in this case, imagine like it's ideal world.

523
00:35:02,582 --> 00:35:03,902
The target is good.

524
00:35:04,202 --> 00:35:13,003
I mean, it's, it doesn't, suffer from having huge, uh,
tables and very, very long vacuuming indexes are usually.

525
00:35:13,003 --> 00:35:14,617
Smaller because it's partition.

526
00:35:15,020 --> 00:35:15,800
index maintenance.

527
00:35:15,800 --> 00:35:19,851
There can be applied in regular fashion, and changes are propagated.

528
00:35:19,851 --> 00:35:24,350
atomically, like every small change is
propagated, guaranteed with small delay.

529
00:35:24,851 --> 00:35:29,555
And, for example, we have a lot of posts, comments
like social media or for example, Twitter.

530
00:35:29,825 --> 00:35:30,455
There's master.

531
00:35:30,455 --> 00:35:32,040
Don't think this, like, people migrated.

532
00:35:32,380 --> 00:35:33,510
It's signing on post as.

533
00:35:33,915 --> 00:35:34,575
Michael: it is.

534
00:35:35,280 --> 00:35:38,840
Nikolay: We have, tweets, for example, answers, replies, likes, everything.

535
00:35:39,200 --> 00:35:43,970
And we can have counters very, very, very good here, like using this approach.

536
00:35:44,270 --> 00:35:52,639
Like, so I wonder why we go, development goes this direction,
which will not work well for very large volumes of data.

537
00:35:52,879 --> 00:35:59,719
Well, I, I see the use cases where it'll be
useful, this incremental human illness, but why?

538
00:36:00,604 --> 00:36:01,894
Two big problems ignored.

539
00:36:01,894 --> 00:36:06,654
And first of all, this, uh, ability to
propagate, uh, changes as synchronously.

540
00:36:06,654 --> 00:36:15,104
It's so like, I think this problem should be solved first to allow people
to define triggers, which will be, a synchronous, for example, or something.

541
00:36:15,764 --> 00:36:19,784
Or maybe even, maybe it's should be done outside of database.

542
00:36:19,784 --> 00:36:21,674
It's, it's not close topic for me.

543
00:36:21,674 --> 00:36:23,354
I don't know the ideal answer here.

544
00:36:23,354 --> 00:36:27,104
Maybe Kafka should be here or maybe it should be done at pogs.

545
00:36:27,134 --> 00:36:27,644
I don't know.

546
00:36:28,679 --> 00:36:34,044
Michael: Well, if anybody out there has done a lot of, , thinking
and reading about this, let us know what, what your thoughts are.

547
00:36:34,314 --> 00:36:38,664
Also, I'm really grateful to the people
that are working on this, on the P G I vm.

548
00:36:39,144 --> 00:36:41,454
Thing because they are looking to contribute this to Postgres.

549
00:36:41,454 --> 00:36:43,434
They are looking to, get it committed upstream.

550
00:36:43,434 --> 00:36:49,434
And I think that's a really nice use of the extension
model where you write an extension as a proof of concept.

551
00:36:49,494 --> 00:36:50,214
Well, what do you think?

552
00:36:50,979 --> 00:36:51,819
Nikolay: Yeah, I, I agree.

553
00:36:51,819 --> 00:36:52,149
I agree.

554
00:36:52,279 --> 00:36:57,789
I, I, I don't want to like to discourage, uh, to, just, to criticize.

555
00:36:57,789 --> 00:37:01,899
I just see that problems that should be solved also,

556
00:37:01,899 --> 00:37:07,239
like I, I solved it a couple of times in different cases, like not.

557
00:37:07,579 --> 00:37:09,439
Not as open source just solved it.

558
00:37:10,189 --> 00:37:11,479
And, uh, it was good.

559
00:37:11,809 --> 00:37:21,660
But, maybe I, I remember also discuss discussions, this implementing this,
autonomous prag autonomous in Postgres, unfortunately also didn't finish.

560
00:37:22,062 --> 00:37:24,812
so like the, the need of.

561
00:37:25,417 --> 00:37:35,287
Is there, uh, incremental human maintenance probably will be good for
mid-size projects, but for larger scale, like dozens of terabytes and,

562
00:37:35,287 --> 00:37:44,730
uh, hundreds of thousands of transactions per second, multiple standby
notes and so on, we do need, additional thing to, to be solved as well.

563
00:37:44,790 --> 00:37:48,293
I, I feel it like, and currently solvable with Kafka or something.

564
00:37:48,413 --> 00:37:51,713
Maybe it should be, maybe this like regular.

565
00:37:52,078 --> 00:37:56,068
Discussion, should it be inside the engine or should be outside?

566
00:37:56,458 --> 00:37:58,348
I don't have exact answer for this.

567
00:37:58,677 --> 00:38:00,339
but it's, it's super interesting topic.

568
00:38:00,572 --> 00:38:09,907
I think, in general, development of  use can, uh, be
helpful for many, many because, uh, data volumes are growing

569
00:38:09,907 --> 00:38:13,094
and, people want, uh, aggregates around faster and so on.

570
00:38:13,094 --> 00:38:13,184
And.

571
00:38:14,649 --> 00:38:15,189
Michael: Yeah, exactly.

572
00:38:15,319 --> 00:38:20,829
More and more products need to, you know, have a dashboard,
have some, you know, a lot of even transactional products are

573
00:38:20,829 --> 00:38:23,649
expected to do a little bit of analytics now as well, aren't they?

574
00:38:23,654 --> 00:38:25,269
So, makes a lot of sense.

575
00:38:25,764 --> 00:38:25,944
Nikolay: Yeah.

576
00:38:26,094 --> 00:38:27,984
Well actually maybe it's not that bad.

577
00:38:28,104 --> 00:38:29,574
If you like, I, I would maybe.

578
00:38:30,309 --> 00:38:33,339
Right now consider P G I V M for a particular project.

579
00:38:33,339 --> 00:38:40,989
I would just measure benchmark how my rights will slow
down because of this synchronous propagation approach.

580
00:38:41,259 --> 00:38:43,089
Also, it requires an index, right?

581
00:38:43,119 --> 00:38:51,219
So additional index, which also will slightly slow down our
rights, but, and if it's, if it's acceptable and if it'll remain.

582
00:38:51,759 --> 00:38:57,009
Acceptable in few years when my data
volumes will grow, then I will just use it.

583
00:38:57,099 --> 00:38:57,579
Why not?

584
00:38:57,579 --> 00:39:02,469
And uh, it's also a problem of, uh, this, uh, like very rough approach.

585
00:39:02,469 --> 00:39:04,299
Like, let's update as a whole.

586
00:39:05,199 --> 00:39:06,339
Updates only part of it.

587
00:39:06,339 --> 00:39:06,729
It's good.

588
00:39:07,449 --> 00:39:07,749
Michael: Yeah.

589
00:39:08,169 --> 00:39:10,659
Keeps everything in Postgres at at the start too.

590
00:39:11,859 --> 00:39:13,179
Well, good luck to them and thank you.

591
00:39:13,847 --> 00:39:15,337
Did we cover everything you wanted to hear?

592
00:39:15,811 --> 00:39:25,508
Nikolay: Yeah, maybe some bottom line, uh, check, benchmark and consider it
building your own solution based on just create, select, some triggers, maybe

593
00:39:25,508 --> 00:39:30,548
some, maybe kaf if you, or some synchronous queue mechanism you have already.

594
00:39:30,608 --> 00:39:33,188
If it, if you, if it's reliable, of course.

595
00:39:33,308 --> 00:39:33,578
And.

596
00:39:33,585 --> 00:39:38,775
You can propagate and support your own data structures and queries maybe.

597
00:39:39,108 --> 00:39:47,163
While, like, I mean, if existing solutions , don't suit well for
your needs, maybe you should, you just need to write your own thing.

598
00:39:47,506 --> 00:39:48,946
it's not that difficult actually.

599
00:39:49,320 --> 00:39:51,780
You have rules, you have triggers, so just do it.

600
00:39:51,780 --> 00:39:52,200
That's it.

601
00:39:52,980 --> 00:39:53,550
Triggers here.

602
00:39:53,710 --> 00:39:53,950
Yeah.

603
00:39:53,955 --> 00:39:54,030
Yeah.

604
00:39:54,030 --> 00:39:56,550
I think about triggers, but, uh, synchron.

605
00:39:57,140 --> 00:39:58,520
Nature would be good to have here.

606
00:39:59,210 --> 00:39:59,450
Yep.

607
00:39:59,540 --> 00:40:07,100
Listen, I, I would just to, just to warm like refresh,
massage you is good as like quick and dirty solution.

608
00:40:07,400 --> 00:40:09,860
It, it'll be dirty because it, it'll hurt.

609
00:40:09,920 --> 00:40:16,716
Uh, when data grows and everything grows and like,
it'll be problematic for large volumes of data, even

610
00:40:16,836 --> 00:40:21,186
Michael: But it, it can get people a long way right at the beginning.

611
00:40:21,845 --> 00:40:28,535
Nikolay: Well, yeah, I'm talking about, again, I'm talking about dozens of
terabytes and dozens or even hundreds of thousands of transactions per second.

612
00:40:28,535 --> 00:40:33,035
It's like really big systems and also motorized, motorized companies.

613
00:40:33,215 --> 00:40:34,295
They do interesting things.

614
00:40:34,295 --> 00:40:36,125
So it's, it's commercial, but interesting.

615
00:40:36,545 --> 00:40:39,445
And there, there is a talk, , there are a couple of talks from them.

616
00:40:39,445 --> 00:40:45,275
I, I've checked them Very interesting, , just to
understand how it's implemented, how the vision and so on.

617
00:40:45,335 --> 00:40:48,047
Uh, let's also link their, their materials

618
00:40:48,147 --> 00:40:49,527
Michael: Will do materials.

619
00:40:50,057 --> 00:40:50,377
Nikolay: materials.

620
00:40:50,797 --> 00:40:51,217
Michael: Yeah.

621
00:40:51,222 --> 00:40:51,337
Yeah.

622
00:40:52,927 --> 00:40:58,102
well, yeah, I hope a few people got some ideas of things
they can use these for as well, even in the current state.

623
00:40:58,222 --> 00:41:00,112
And yeah, look forward to speaking with you next week.

624
00:41:00,899 --> 00:41:01,389
Nikolay: Good.

625
00:41:01,389 --> 00:41:02,128
See you later.

626
00:41:02,128 --> 00:41:02,448
Bye.

627
00:41:02,449 --> 00:41:02,869
Michael: Bye.