1
00:00:00,049 --> 00:00:01,819
Michael: Hello and welcome to Postgres FM.

2
00:00:01,939 --> 00:00:03,530
A weekly show about all things PostgreSQL.

3
00:00:03,799 --> 00:00:07,849
I'm Michael, founder of pgMustard, and this
is my co-host Nikolay, founder of Postgres.ai.

4
00:00:08,179 --> 00:00:09,559
Hey, Nikolay, what are we talking about?

5
00:00:10,441 --> 00:00:11,251
Nikolay: Hi Michael.

6
00:00:11,330 --> 00:00:14,792
Last time we discussed versioning of the database schema, or

7
00:00:15,297 --> 00:00:16,617
database migration management.

8
00:00:16,617 --> 00:00:18,793
I already forgot the proper name of it.

9
00:00:18,816 --> 00:00:32,592
We can continue in this field and discuss database branching today, and how
it is different from, for example, snapshotting, or from schema version

10
00:00:32,622 --> 00:00:37,834
control, because it's an adjacent area which is not yet developed.

11
00:00:37,864 --> 00:00:39,664
We don't have good tools yet, by the way.

12
00:00:39,669 --> 00:00:50,392
So we will probably discuss some ideas and concepts, what to expect
from the future, and what various projects and other companies are doing,

13
00:00:50,392 --> 00:00:56,687
and also what my company does, because it looks like we are going in this direction.

14
00:00:56,687 --> 00:00:59,657
We are developing database branching right now, so...

15
00:01:00,437 --> 00:01:01,277
Michael: Yeah, exactly.

16
00:01:01,277 --> 00:01:03,407
Depending on exactly how you define it.

17
00:01:03,407 --> 00:01:11,177
It seems like there are quite a few database companies at the moment talking
about branching, but each one means something slightly different, or as you

18
00:01:11,177 --> 00:01:19,313
dig into it, that becomes clear. I've seen your conversation on Twitter with a few people, trying
to understand what they mean by it and trying to pin down some definitions.

19
00:01:19,336 --> 00:01:21,856
So I'm looking forward to hearing your thoughts around that.

20
00:01:21,856 --> 00:01:23,266
Is it worth us...

21
00:01:23,296 --> 00:01:23,566
Oh, yeah.

22
00:01:23,571 --> 00:01:30,351
So, Database Lab Engine is worth discussing, maybe, in terms of what it
does, what you are calling branching, and what some others are calling it.

23
00:01:30,861 --> 00:01:32,061
Is that worth doing first?

24
00:01:32,571 --> 00:01:32,931
Nikolay: Yeah.

25
00:01:32,931 --> 00:01:41,152
Well, I think a discussion of Database Lab Engine as a whole is
maybe a separate episode, because there are many things that

26
00:01:41,152 --> 00:01:45,938
it can do and many different use cases where it is useful.

27
00:01:46,328 --> 00:01:49,418
But briefly, I think it's a good idea, yeah.

28
00:01:49,538 --> 00:01:52,418
Let me give an overview of the database

29
00:01:52,908 --> 00:01:54,168
branching topic.

30
00:01:54,569 --> 00:01:58,400
So, Database Lab Engine, which we develop at Postgres.ai,

31
00:01:58,430 --> 00:02:07,117
was born when we needed to experiment very quickly to
check, first of all, SQL optimization ideas,

32
00:02:07,477 --> 00:02:14,865
not on production, but in some non-production environment
where Postgres behaves identically to production.

33
00:02:15,165 --> 00:02:17,235
And we needed to isolate

34
00:02:17,344 --> 00:02:19,633
the experiments of different people.

35
00:02:19,633 --> 00:02:21,913
And we also needed to iterate.

36
00:02:22,123 --> 00:02:28,025
So, to reset quickly, throw out bad ideas, and switch to new ideas quickly.

37
00:02:28,205 --> 00:02:36,342
But when you spend hours building indexes, or you have already changed
your schema heavily, changing data (sometimes you need

38
00:02:36,342 --> 00:02:39,633
to spend many hours converting some column), and then you

39
00:02:40,496 --> 00:02:41,226
hit a dead end.

40
00:02:41,226 --> 00:02:42,546
You need to start from scratch.

41
00:02:42,546 --> 00:02:51,447
It's usually quite difficult to have another environment provisioned
quickly, so we solved this, originally for optimization, using

42
00:02:51,510 --> 00:02:59,590
thin cloning: thin provisioning based on either
ZFS or LVM, although other options are also possible to implement.

43
00:03:00,475 --> 00:03:06,235
Without going into big details: you can run a single server with dozens of

44
00:03:06,342 --> 00:03:13,410
independent, logically independent Postgres instances where
the database is the same everywhere, but it's writable.

45
00:03:13,415 --> 00:03:19,560
So you can deviate, you can create your own index,
and the planner behaves exactly as on production.

46
00:03:19,650 --> 00:03:26,730
This is the trick, and the creation of a clone takes
only a few seconds, regardless of database size.

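To make the copy-on-write idea behind thin cloning concrete, here is a toy Python model, assuming a database is nothing more than a map from block numbers to block contents. The class names `Snapshot` and `ThinClone` are hypothetical; this is not how ZFS, LVM, or Database Lab Engine are implemented. It only illustrates why creating a clone copies nothing and why each clone pays only for the blocks it changes.

```python
# Toy model of copy-on-write thin cloning (in the spirit of ZFS/LVM snapshots).
# Hypothetical names; not the real implementation of any tool.


class Snapshot:
    """Read-only base image: block number -> block contents."""

    def __init__(self, blocks):
        self._blocks = dict(blocks)  # frozen copy, never modified afterwards

    def read(self, block_no):
        return self._blocks[block_no]


class ThinClone:
    """Writable view over a shared snapshot.

    Creating a clone copies no data; only blocks that this clone
    writes are stored privately (copy-on-write)."""

    def __init__(self, base):
        self._base = base
        self._private = {}  # blocks changed by this clone only

    def read(self, block_no):
        # Prefer this clone's own version, else fall back to the shared snapshot.
        if block_no in self._private:
            return self._private[block_no]
        return self._base.read(block_no)

    def write(self, block_no, data):
        self._private[block_no] = data

    def extra_blocks(self):
        """Storage this clone costs on top of the shared snapshot."""
        return len(self._private)


if __name__ == "__main__":
    base = Snapshot({i: b"row-data" for i in range(1_000)})
    a, b = ThinClone(base), ThinClone(base)
    a.write(7, b"new-index-page")               # only clone "a" sees this change
    print(a.read(7), b.read(7))                 # b'new-index-page' b'row-data'
    print(a.extra_blocks(), b.extra_blocks())   # 1 0
```

Two clones over a 1,000-block snapshot cost one extra block in total, which is the intuition behind "a few seconds regardless of database size".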
47
00:03:26,880 --> 00:03:31,660
It sounds like magic, but this magic is going to kill another kind of magic.

48
00:03:31,790 --> 00:03:35,255
Like many people, I call it black magic: Postgres

49
00:03:35,260 --> 00:03:39,525
DBA knowledge and skills are an area of black magic.

50
00:03:39,530 --> 00:03:45,195
You need to spend 15 years, and then you
can quickly say, this will work, this won't work.

51
00:03:45,975 --> 00:03:48,435
And people say, oh, you are the black magic guy.

52
00:03:48,465 --> 00:03:50,805
Yes, black magic means something is hidden.

53
00:03:51,375 --> 00:03:53,475
Our magic is white magic.

54
00:03:53,685 --> 00:03:56,455
No, nothing is hidden. Any developer, any engineer

55
00:03:57,090 --> 00:04:01,680
can see the behavior without disturbing others, experiment, fail, and so on.

56
00:04:02,010 --> 00:04:03,210
So this is what we did.

57
00:04:03,960 --> 00:04:06,180
Many clones are running on a single machine.

58
00:04:06,180 --> 00:04:09,352
So you pay for one machine and have dozens of clones.

59
00:04:09,377 --> 00:04:13,191
And then we moved into the area of testing in CI/CD pipelines.

60
00:04:13,401 --> 00:04:18,675
It's a whole new world as well and, again, a whole big topic:

61
00:04:18,675 --> 00:04:23,188
what can be tested in CI/CD pipelines
in terms of the database, in terms of Postgres?

62
00:04:23,200 --> 00:04:31,883
Our idea, obviously, is that we finally have a database that's not tiny, like one
gigabyte of something just generated or, I don't know, brought to you by

63
00:04:31,961 --> 00:04:34,571
docker pull; we have the whole database.

64
00:04:34,571 --> 00:04:37,421
It can be hundreds of gigabytes or terabytes.

65
00:04:37,451 --> 00:04:39,281
It doesn't matter for us.

66
00:04:39,311 --> 00:04:44,664
We can just set it up and make pipelines
work, provisioned very quickly, in a few seconds.

67
00:04:44,904 --> 00:04:46,704
So testing is another area.

68
00:04:46,914 --> 00:04:48,234
Some people do various things.

69
00:04:48,234 --> 00:04:52,608
For example, some people just test pg_upgrade
inside our containers; that's also possible.

70
00:04:52,608 --> 00:04:58,341
And so the key here is that we have already been doing it
for a couple of years, or maybe three years.

71
00:04:58,641 --> 00:05:01,791
We always said this is thin cloning.

72
00:05:01,971 --> 00:05:04,611
We use the term cloning, clone.

73
00:05:04,821 --> 00:05:12,630
It's very natural for cloud engineers, for DBAs, for
SREs, because this cloning term is used in the cloud.

74
00:05:13,185 --> 00:05:13,455
Right.

75
00:05:13,485 --> 00:05:20,556
You can also clone your EBS volume from snapshots, so there is copy-on-write there as well.

76
00:05:20,556 --> 00:05:26,886
It's also thin provisioning, but it's
kind of different, because you pay for each volume separately.

77
00:05:27,336 --> 00:05:30,606
But the term cloning is still used there, and for RDS clones.

78
00:05:30,936 --> 00:05:36,823
Aurora also has thin clones, so you can have
a single storage but multiple instances.

79
00:05:37,723 --> 00:05:43,661
All of them use the same storage, and you can have multiple writable

80
00:05:43,729 --> 00:05:44,329
instances.

81
00:05:44,509 --> 00:05:50,569
So you can do writes independently, similarly, but again,
you need to pay for each compute node separately.

82
00:05:51,079 --> 00:06:00,006
That's why neither RDS clones nor Aurora clones are
good for testing in CI/CD pipelines, because you want

83
00:06:00,296 --> 00:06:04,924
a constant price, a constant cost, for what you pay.

84
00:06:05,284 --> 00:06:08,554
Also, by the way, you need to wait many minutes to provision either of them.

85
00:06:08,912 --> 00:06:15,961
Michael: I guess thin cloning is specifically named as opposed
to thick cloning, where you take a full copy of it.

86
00:06:16,021 --> 00:06:19,951
And that's what a lot of systems have offered for a long time.

87
00:06:20,028 --> 00:06:26,628
And this is obviously a step above that in terms of performance, in
terms of speed, but also in terms of not having to have that extra...

88
00:06:26,943 --> 00:06:27,873
Nikolay: Thick cloning.

89
00:06:27,873 --> 00:06:35,013
Actually, we could devote a whole episode to it, and maybe we
should, because it's also an interesting question:

90
00:06:35,013 --> 00:06:39,143
how to clone a large database using regular tools, for example.

91
00:06:39,393 --> 00:06:40,183
Do you

92
00:06:40,713 --> 00:06:45,319
clone the PGDATA directory or use pg_basebackup at the physical level, right?

93
00:06:45,619 --> 00:06:47,809
How to do it live, without interruptions.

94
00:06:47,839 --> 00:06:51,153
Of course, any experienced Postgres DBA knows the answer.

95
00:06:51,363 --> 00:06:58,653
pg_start_backup, or just use pg_basebackup; by
default it will be okay, and you can do it live. Or you

96
00:06:58,683 --> 00:07:02,865
can do it at the logical level using pg_dump and pg_restore.

97
00:07:03,630 --> 00:07:07,027
That raises questions of how to speed it up and so on.

98
00:07:07,177 --> 00:07:15,397
But roughly, we consider both of them, by the way,
thick cloning, but we distinguish the physical and logical levels.

99
00:07:15,787 --> 00:07:18,577
So dump/restore is also cloning, but at the logical level.

100
00:07:18,667 --> 00:07:21,665
But you can choose which objects to clone there, right?

101
00:07:22,202 --> 00:07:24,602
And you can speed it up using -j.

102
00:07:25,558 --> 00:07:30,568
But in this case, you need additional space,
because you cannot use -j and do it on the fly.

103
00:07:31,389 --> 00:07:36,689
That problem is solved by another tool:
Dimitri Fontaine is developing pgcopydb.

104
00:07:36,689 --> 00:07:45,119
Maybe I'm wrong about the name of the tool, but it's quite new, and it provides
exactly the ability, at the logical level, to use multiple threads and avoid

105
00:07:45,389 --> 00:07:49,589
an intermediate backup file, so you can do it on the fly.

106
00:07:49,589 --> 00:07:50,949
That's interesting.

107
00:07:51,089 --> 00:07:55,769
But it also raises the question of long-running transactions on the source.

108
00:07:55,859 --> 00:07:59,189
There are many, many things in the area of thick cloning.

109
00:07:59,609 --> 00:07:59,939
Why

110
00:08:00,359 --> 00:08:01,469
do I know it so well?

111
00:08:01,474 --> 00:08:07,549
Because to provision Database Lab Engine, we first need
to get the data in a regular way, either logical or physical.

112
00:08:08,129 --> 00:08:08,969
So right

113
00:08:09,189 --> 00:08:13,279
Michael: Like one thick clone that you can
base the thin clones off, but that's where I

114
00:08:13,334 --> 00:08:20,302
Nikolay: And we also need to maintain it,
either continuously or with a full refresh on schedule.

115
00:08:20,572 --> 00:08:20,992
Everything

116
00:08:21,112 --> 00:08:22,072
Michael: Nightly or something.

117
00:08:22,402 --> 00:08:22,732
Yeah.

118
00:08:23,152 --> 00:08:30,472
So, but that's where this becomes really useful, I think, for the
branching discussion, because suddenly, if we can do thin clones

119
00:08:30,472 --> 00:08:38,186
or something like them, we get the idea that maybe you can have
branches that aren't just empty, that aren't just the schema.

120
00:08:38,186 --> 00:08:41,396
They can have real data behind them as well.

121
00:08:41,766 --> 00:08:44,346
Nikolay: So what happened with branching, the branching term?

122
00:08:44,766 --> 00:08:49,596
First of all, I didn't realize it in the past, but now I see it very well.

123
00:08:49,709 --> 00:08:52,970
Cloning is very much infrastructure language.

124
00:08:53,037 --> 00:08:59,210
It's not friendly to developers, because in
Git there is clone, but it's kind of different.

125
00:08:59,210 --> 00:09:00,500
You clone the whole repository.

126
00:09:01,070 --> 00:09:03,350
There you have revisions,

127
00:09:04,335 --> 00:09:05,435
or commit numbers.

128
00:09:05,915 --> 00:09:07,745
Commit numbers and branches.

129
00:09:07,798 --> 00:09:08,488
And clones.

130
00:09:08,488 --> 00:09:19,137
It's the language of SREs, DBAs, infrastructure people.
Of course, any engineer knows cloning in various aspects, but

131
00:09:19,137 --> 00:09:29,515
developers still prefer branching. And some time ago PlanetScale,
which originally provides sharding for MySQL with Vitess (they develop

132
00:09:29,545 --> 00:09:39,934
Vitess, and the founders are the same people who created Vitess),
suddenly, from my perspective, turned to a hard problem.

133
00:09:39,939 --> 00:09:44,595
They added schema management capabilities, and they said, okay,

134
00:09:44,595 --> 00:09:52,009
we now have database branching, and we have zero-downtime
deployments for schema changes, hassle-free.

135
00:09:52,009 --> 00:09:53,119
So, like, great.

136
00:09:53,922 --> 00:10:00,702
And on the front page, this was maybe last year,
in 2021, on the front page I saw that it was branching.

137
00:10:00,702 --> 00:10:04,452
But I clicked into the documentation, because I was curious; it's our area.

138
00:10:04,452 --> 00:10:06,012
I thought, okay, okay.

139
00:10:06,282 --> 00:10:10,252
Do they play with thin clones, or with thin provisioning?

140
00:10:10,252 --> 00:10:14,605
But inside you can already see "database schema branching".

141
00:10:14,665 --> 00:10:17,185
So, a slightly different term.

142
00:10:17,185 --> 00:10:17,425
Right.

143
00:10:17,425 --> 00:10:20,005
And you realize that they clone only the schema.

144
00:10:20,605 --> 00:10:21,745
Then you can change it.

145
00:10:21,745 --> 00:10:27,973
Then they produce a diff, which was one of the topics of our previous episode.

146
00:10:28,843 --> 00:10:33,429
And this diff, you can see it, you can approve it, other people can approve it.

147
00:10:33,429 --> 00:10:34,678
So there is some flow.

148
00:10:34,680 --> 00:10:37,710
And then it's deployed in a zero-downtime fashion,

149
00:10:39,135 --> 00:10:39,345
Michael: Yeah.

150
00:10:39,420 --> 00:10:40,710
Nikolay: And nothing about data.

151
00:10:41,498 --> 00:10:46,718
Michael: So that raises the question about testing:
how do you test the performance side of things? You

152
00:10:46,718 --> 00:10:49,898
know, is this all? And how do multiple people work together?

153
00:10:49,898 --> 00:10:50,558
That kind of thing.

154
00:10:51,188 --> 00:10:51,698
Nikolay: Right.

155
00:10:51,758 --> 00:10:59,412
Well, first of all, this is also a viable approach, I should
admit, because zero-downtime migrations are a hard problem.

156
00:10:59,417 --> 00:11:06,834
By the way, last time we didn't mention my article about, I don't know,
something like 18 mistakes people make with schema changes in Postgres. For

157
00:11:06,834 --> 00:11:13,014
this article, I selected... well, there are many types of mistakes
you can make, but I selected some and discussed them in detail.

158
00:11:13,075 --> 00:11:15,687
It's on our website, Postgres.ai.

159
00:11:16,137 --> 00:11:18,663
So it's a good problem to solve.

160
00:11:18,663 --> 00:11:25,503
Actually very hard, because most of the diff tools we see,
and maybe I'm wrong, but Liquibase has a diff tool,

161
00:11:25,508 --> 00:11:28,476
as we mentioned, and somebody in the comments,

162
00:11:28,476 --> 00:11:29,316
thank you so much,

163
00:11:29,338 --> 00:11:38,732
on YouTube raised this: Liquibase also has diff, and Jein has
diff. There are separate projects like migra, but all of them show

164
00:11:38,832 --> 00:11:45,792
a simple diff, like CREATE INDEX without the word CONCURRENTLY,
not addressing the problem of how to change a data type

165
00:11:45,792 --> 00:11:50,042
in a 1-billion-row table. So they solve a hard problem.

166
00:11:50,047 --> 00:11:54,002
But there is the hardest problem: how to generate a diff

167
00:11:54,437 --> 00:12:02,555
that applies in a zero-downtime fashion. As far as I understand, PlanetScale
shows the diff in regular form, but when they apply

168
00:12:02,555 --> 00:12:06,722
changes, they do something similar to the pg_repack approach.

169
00:12:06,722 --> 00:12:11,242
You create a full copy of the table, recording all changes

170
00:12:12,032 --> 00:12:15,302
in some delta table, like a changelog, right?

171
00:12:15,572 --> 00:12:21,539
And then, in a single transaction or in multiple transactions
(that's also an interesting topic), in steps, you apply

172
00:12:21,544 --> 00:12:25,139
all the changes, and then you switch to the new table.

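The copy, log, replay, and swap sequence can be simulated in memory. This is a toy sketch under stated assumptions: a table is a Python dict keyed by primary key, and every concurrent write goes through `upsert`/`delete` so it lands in the change log. `OnlineRewrite` is a hypothetical name; real tools such as pg_repack rely on triggers, locks, and catalog swaps, none of which is modelled here.

```python
# Toy simulation of an online table rewrite: copy the table, record
# concurrent changes in a log ("delta table"), replay them, then swap.
# Hypothetical names; triggers, locking and catalog swaps are not modelled.
from typing import Callable, Dict, List, Optional, Tuple

Row = dict
Table = Dict[int, Row]  # primary key -> row


class OnlineRewrite:
    def __init__(self, table: Table, transform: Callable[[Row], Row]):
        self.old = table
        self.transform = transform  # e.g. convert a column to a new type
        self.log: List[Tuple[str, int, Optional[Row]]] = []
        self.new: Table = {}

    def upsert(self, pk: int, row: Row) -> None:
        # Application write: goes to the old table and into the change log
        # (the job a trigger would do in a real implementation).
        self.old[pk] = row
        self.log.append(("upsert", pk, row))

    def delete(self, pk: int) -> None:
        self.old.pop(pk, None)
        self.log.append(("delete", pk, None))

    def copy(self) -> None:
        # The long bulk copy; this is the step that needs extra disk space.
        self.new = {pk: self.transform(row) for pk, row in self.old.items()}

    def replay(self) -> None:
        # Apply logged changes to the new table in order, then clear the log.
        for op, pk, row in self.log:
            if op == "upsert":
                self.new[pk] = self.transform(row)
            else:
                self.new.pop(pk, None)
        self.log.clear()

    def swap(self) -> Table:
        # Final catch-up, then the new table replaces the old one
        # (in practice this last step runs under a short lock).
        self.replay()
        self.old = self.new
        return self.new


if __name__ == "__main__":
    orders = {1: {"id": 1, "amount": "10"}, 2: {"id": 2, "amount": "20"}}
    rw = OnlineRewrite(orders, lambda r: {**r, "amount": int(r["amount"])})
    rw.copy()
    rw.upsert(3, {"id": 3, "amount": "30"})  # write arriving during the copy
    rw.delete(1)
    print(rw.swap())  # {2: {'id': 2, 'amount': 20}, 3: {'id': 3, 'amount': 30}}
```

Replaying the log in order is what makes writes that arrive during the long copy safe: the last logged operation per key wins, so the new table ends up matching the old one.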
173
00:12:25,769 --> 00:12:31,973
Of course, this approach requires some disk space,
and it's kind of too heavy for small changes.

174
00:12:31,973 --> 00:12:36,983
Sometimes, it depends. But it's
interesting that they have full automation.

175
00:12:38,143 --> 00:12:46,033
But again, they don't care about data in this case, though their CEO
said in a Twitter discussion that they are working on data branching.

176
00:12:46,783 --> 00:12:50,413
I'm very curious how they will solve the terminology problem.

177
00:12:50,413 --> 00:12:53,563
You know the two biggest problems in computer science, right?

178
00:12:53,803 --> 00:12:55,843
Naming and cache...

179
00:12:55,933 --> 00:12:56,983
cache invalidation.

180
00:12:56,983 --> 00:12:57,283
Right?

181
00:12:57,553 --> 00:12:57,913
So

182
00:12:57,928 --> 00:12:59,158
Michael: And off-by-one errors.

183
00:13:00,253 --> 00:13:00,613
Nikolay: Yeah.

184
00:13:00,673 --> 00:13:00,853
Yeah.

185
00:13:01,663 --> 00:13:06,693
So obviously they have a naming issue, because they
already use "database branching" for the schema only.

186
00:13:07,878 --> 00:13:14,356
Then, earlier this year, I saw that Supabase has database
branching in their roadmap, and also Neon appeared.

187
00:13:14,356 --> 00:13:17,934
And Neon said, we are open-source Aurora, right?

188
00:13:18,334 --> 00:13:18,754
Michael: Yep.

189
00:13:19,044 --> 00:13:23,274
Nikolay: Aurora has thin cloning, which
is, in my opinion, not good for testing.

190
00:13:23,484 --> 00:13:24,915
It's too expensive, and it's too...

191
00:13:26,109 --> 00:13:28,659
It is thin cloning.

192
00:13:28,659 --> 00:13:33,399
Yes, but you need to wait minutes and you need
to pay for compute for each clone separately.

193
00:13:33,399 --> 00:13:41,604
So if you use big O notation in terms of money,
it's O(n) in the number of clones for compute power.

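The big O argument can be sketched as a rough cost model. All hourly prices below are made-up placeholders, not real AWS or Postgres.ai pricing, and the function names and the `max_clones` capacity cap are hypothetical. The point is only the shape: per-clone compute grows linearly with the number of clones, while thin clones on one host cost roughly the same until the host runs out of capacity.

```python
# Rough cost model for giving N developers or CI jobs a full-size database.
# All prices are made-up placeholders, not real AWS or Postgres.ai prices.


def per_clone_compute_cost(n_clones: int, instance_hourly: float, hours: float) -> float:
    """Clones with shared storage but a separate compute node each
    (the Aurora-style model): cost grows linearly, O(n_clones)."""
    return n_clones * instance_hourly * hours


def shared_host_cost(n_clones: int, host_hourly: float, hours: float, max_clones: int) -> float:
    """Thin clones on a single host: you pay for the host, not per clone,
    so cost stays flat until the host's capacity (max_clones) is reached."""
    if n_clones > max_clones:
        raise ValueError("more clones than this host can run")
    return host_hourly * hours


if __name__ == "__main__":
    for n in (1, 10, 30):
        print(n,
              per_clone_compute_cost(n, instance_hourly=1.0, hours=8),
              shared_host_cost(n, host_hourly=2.0, hours=8, max_clones=50))
```

With these placeholder numbers, one clone is cheaper as a dedicated instance, but by ten clones the shared host is already far cheaper, which is the situation of a CI pipeline that needs a database for every change.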
194
00:13:41,784 --> 00:13:47,884
Fortunately not for storage, but of course Aurora also charges you for I/O.

195
00:13:48,384 --> 00:13:54,936
And for testing, that's also not very pleasant, but
I guess big enterprises are okay with this.

196
00:13:55,086 --> 00:13:56,916
Michael: It's better than nothing as well, right?

197
00:13:57,006 --> 00:13:58,246
It's better than not having it.

198
00:13:58,606 --> 00:13:59,466
Nikolay: Of course you can.

199
00:14:00,036 --> 00:14:07,852
Yeah, you can test some heavy changes this way, but
this is not something you will use for each pull

200
00:14:07,852 --> 00:14:10,852
request or merge request with backend code changes.

201
00:14:10,852 --> 00:14:11,985
It's too much.

202
00:14:12,975 --> 00:14:16,947
But by the way, observing this area, I found that

203
00:14:18,177 --> 00:14:21,721
heavy clones, where we can use all CPUs and so on,

204
00:14:21,721 --> 00:14:25,681
are needed only infrequently, by infrastructure teams.

205
00:14:25,686 --> 00:14:30,541
For example, upgrades, big migrations to
some new hardware or operating system.

206
00:14:30,991 --> 00:14:35,370
But developers these days make changes
many times per day, sometimes, right?

207
00:14:35,370 --> 00:14:36,780
So it's very frequent.

208
00:14:37,050 --> 00:14:41,040
And there we need very, very cheap and fast cloning.

209
00:14:41,769 --> 00:14:43,619
So, okay, back to database branching.

210
00:14:43,859 --> 00:14:53,763
So Neon, at the very beginning, said we are also going to be a very
good database for CI/CD pipelines, and we have database branching, without

211
00:14:53,857 --> 00:15:02,707
discussing what it actually means in detail, how
it is different from cloning, for example, or snapshotting, or these

212
00:15:02,917 --> 00:15:13,744
infrastructure terms. And someone else, some other
projects, also said we have a Git-like approach for databases, for Postgres.

213
00:15:14,554 --> 00:15:26,394
And then I spent some time trying to figure out how branching could behave
for databases, for Postgres, to solve the problems of development and testing.

214
00:15:26,694 --> 00:15:30,694
And finally I realized that branches are very different from our clones,

215
00:15:31,534 --> 00:15:36,860
because clones take some memory; they consume memory.

216
00:15:37,670 --> 00:15:45,853
For example, in Git I can have thousands of
branches, and I don't pay extra for it.

217
00:15:45,943 --> 00:15:46,183
Okay?

218
00:15:46,183 --> 00:15:49,498
There's some small storage penalty.

219
00:15:50,353 --> 00:15:50,593
Right.

220
00:15:50,713 --> 00:15:54,804
But if all branches are identical, I don't pay at all.

221
00:15:55,404 --> 00:15:55,974
No, nothing.

222
00:15:56,124 --> 00:15:56,484
Nothing.

223
00:15:56,484 --> 00:15:56,754
Right.

224
00:15:57,114 --> 00:16:04,175
But when you run a thin clone in Database Lab Engine,
it consumes some memory, because it has shared_buffers.

225
00:16:04,235 --> 00:16:06,845
It has a Postgres instance running.

226
00:16:06,845 --> 00:16:08,165
So it has some cost.

227
00:16:08,195 --> 00:16:15,385
We have some limit, of course, defined by the size of the memory
on the server and by shared_buffers, for example, right?

228
00:16:15,385 --> 00:16:20,495
So we can adjust and run more clones, but
we still have some limit. For branches,

229
00:16:20,495 --> 00:16:22,415
we don't want to have a limit, right?

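The memory limit on running clones comes down to simple arithmetic. This sketch assumes each running clone needs its own `shared_buffers` plus a fixed per-instance overhead, and that some host RAM is reserved for the OS and filesystem cache (for example, the ZFS ARC); the function name and all numbers are hypothetical.

```python
# Back-of-the-envelope estimate of how many thin clones can run on one host.
# Every running clone is a real Postgres instance, so it needs its own
# shared_buffers plus some per-instance overhead. Numbers are hypothetical.


def max_running_clones(host_ram_gib: float, reserved_gib: float,
                       shared_buffers_gib: float, overhead_gib: float) -> int:
    usable = host_ram_gib - reserved_gib           # RAM left after OS / fs cache
    per_clone = shared_buffers_gib + overhead_gib  # memory one clone needs
    if per_clone <= 0:
        raise ValueError("per-clone memory must be positive")
    return max(0, int(usable // per_clone))


if __name__ == "__main__":
    print(max_running_clones(128, 32, 1.0, 0.5))  # 64
    print(max_running_clones(128, 32, 0.5, 0.5))  # 96, smaller shared_buffers
```

Shrinking `shared_buffers` is the adjustment mentioned above for running more clones. Branches, being just named pointers, do not appear in this formula at all, which is why they need no such limit.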
230
00:16:22,838 --> 00:16:28,935
This is one thing. Of course, naming is
also a thing, but there's another thing, about Git.

231
00:16:29,893 --> 00:16:30,643
It's very good.

232
00:16:30,643 --> 00:16:33,970
We discussed that it's decentralized, and one of the properties of Git

233
00:16:33,970 --> 00:16:36,760
is that it allows several stages of review.

234
00:16:36,880 --> 00:16:40,420
You can review yourself before you push your commits.

235
00:16:40,470 --> 00:16:45,183
You can ask your colleagues to review if you
have a pull request or merge request in GitHub or GitLab.

236
00:16:45,313 --> 00:16:51,373
So you see the difference between branches before you
merge your development branch into your main branch.

237
00:16:52,004 --> 00:16:54,140
But if we call our clones

238
00:16:55,056 --> 00:16:55,776
branches,

239
00:16:55,776 --> 00:16:59,406
we cannot do that, and we do want to do this.

240
00:16:59,406 --> 00:17:09,410
We want to say, this is our state, and then have multiple colleagues or
multiple CI pipelines test it, check it, and continue working with it.

241
00:17:09,410 --> 00:17:16,375
So obviously we realized we need it. It
was in our roadmap for a while, but we realized that branching

242
00:17:16,525 --> 00:17:20,094
looks like on-demand snapshotting for your clones.

243
00:17:20,214 --> 00:17:21,984
So you have your clone, you change something.

244
00:17:22,494 --> 00:17:26,589
You create a snapshot via the API, CLI, or UI; we have all of them.

245
00:17:27,052 --> 00:17:31,372
And then you say, okay, this is
a snapshot, this is a commit, or this is it.

246
00:17:31,712 --> 00:17:32,732
Continue working with it.

247
00:17:33,247 --> 00:17:33,517
Right?

248
00:17:33,997 --> 00:17:37,177
So a snapshot started to look like a branch, right.

249
00:17:37,327 --> 00:17:39,157
And, kind of, a named snapshot.

250
00:17:39,277 --> 00:17:40,987
And then, you know what I did?

251
00:17:41,137 --> 00:17:42,307
I just opened the Git

252
00:17:42,333 --> 00:17:45,506
documentation and started to read it from scratch.

253
00:17:45,836 --> 00:17:50,786
And, by the way, they have slightly
conflicting definitions of branching.

254
00:17:50,810 --> 00:18:00,143
There is no good definition right at the beginning; you need to go deeper, but
I found a good one: a branch is a pointer to a commit, a named pointer to a commit.

255
00:18:00,592 --> 00:18:06,232
Well, there are issues with this term, but it kind of works for us too.

256
00:18:06,262 --> 00:18:08,902
We say, okay, we have a named pointer to a commit.

257
00:18:09,202 --> 00:18:10,042
It can shift.

258
00:18:10,252 --> 00:18:12,772
If a new commit is created, it shifts automatically.

259
00:18:13,429 --> 00:18:15,619
That's this kind of branch, and that's it.

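The "a branch is a named pointer to a commit" definition maps onto database snapshots quite directly. Below is a minimal Python model of that idea; `SnapshotRepo`, `create_branch`, `commit`, and `log` are hypothetical names for illustration and do not reflect Database Lab Engine's actual CLI or API.

```python
# Minimal model of "a branch is a named pointer to a commit", applied to
# database snapshots. All names are hypothetical, not Database Lab Engine's API.
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Commit:
    id: int
    parent: Optional[int]  # None for the initial full copy
    message: str


@dataclass
class SnapshotRepo:
    commits: dict = field(default_factory=dict)   # commit id -> Commit
    branches: dict = field(default_factory=dict)  # branch name -> commit id
    _ids: itertools.count = field(default_factory=itertools.count)

    def __post_init__(self):
        root = Commit(next(self._ids), None, "initial full copy")
        self.commits[root.id] = root
        self.branches["main"] = root.id

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # A new branch is just another name for an existing commit:
        # no data is copied, which is why branches are (almost) free.
        self.branches[name] = self.branches[from_branch]

    def commit(self, branch: str, message: str) -> int:
        # Snapshot a clone's state; the branch pointer then moves
        # forward automatically to the new commit.
        c = Commit(next(self._ids), self.branches[branch], message)
        self.commits[c.id] = c
        self.branches[branch] = c.id
        return c.id

    def log(self, branch: str) -> list:
        # Walk parent links from the branch head back to the root.
        out, cur = [], self.branches[branch]
        while cur is not None:
            out.append(self.commits[cur].message)
            cur = self.commits[cur].parent
        return out


if __name__ == "__main__":
    repo = SnapshotRepo()
    repo.create_branch("feature-x")
    repo.commit("feature-x", "add index on orders(created_at)")
    print(repo.log("feature-x"))  # ['add index on orders(created_at)', 'initial full copy']
    print(repo.log("main"))       # ['initial full copy']
```

In this model a clone would be started from whatever commit a branch points to, and committing from that clone advances only its own branch, leaving `main` untouched.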
260
00:18:15,679 --> 00:18:22,921
And we have already developed a prototype; I think it'll
be Database Lab Engine 4.0 when it's released.

261
00:18:22,926 --> 00:18:28,969
We are not in a hurry; we want everything to work
very smoothly and be tested properly by many teams.

262
00:18:29,209 --> 00:18:31,240
But we already have a prototype, and it's working.

263
00:18:31,240 --> 00:18:32,941
At the CLI level so far, not the UI.

264
00:18:33,631 --> 00:18:40,836
So maybe if you listen one month
later, we should already have the UI as well, and so on.

265
00:18:41,196 --> 00:18:48,481
But you say, I want a branch, so you
start to deviate, and you run a clone for this branch.

266
00:18:48,901 --> 00:18:50,251
Others can run their clones.

267
00:18:50,901 --> 00:18:53,031
A clone is like your working directory.

268
00:18:53,211 --> 00:19:01,245
You just grabbed the contents of your code base,
opened some IDE or editor, and started to change it.

269
00:19:01,605 --> 00:19:06,525
So clones are a means to change the state.

270
00:19:07,950 --> 00:19:09,840
Michael: Well, it's like a running application, isn't it?

271
00:19:09,840 --> 00:19:12,840
Like, the source code isn't a running application.

272
00:19:13,245 --> 00:19:13,635
Nikolay: Right.

273
00:19:13,635 --> 00:19:19,402
But usually a running application doesn't mean
changing the schema, and I usually avoid it.

274
00:19:19,428 --> 00:19:21,861
Schema changes should happen during deployment, not

275
00:19:22,171 --> 00:19:23,431
during normal work.

276
00:19:23,641 --> 00:19:28,801
Some people actually do it: some evolution of the schema initiated by users.

277
00:19:28,848 --> 00:19:35,778
Temporary tables are also part of it, but I consider
it a very questionable practice, leading to

278
00:19:35,808 --> 00:19:39,153
issues with management from a DBA point of view.

279
00:19:39,363 --> 00:19:44,813
So I would say the normal run of a running
application should not change your schema.

280
00:19:45,217 --> 00:19:46,477
You should try to avoid it,

281
00:19:47,107 --> 00:19:47,407
but

282
00:19:47,788 --> 00:19:51,193
a developer opens an IDE, changes it, and then commits.

283
00:19:51,433 --> 00:19:53,083
git push.

284
00:19:53,419 --> 00:19:53,749
Right?

285
00:19:54,404 --> 00:19:58,406
Michael: What I meant more is, I really like the snapshot analogy, and I think

286
00:19:58,647 --> 00:20:04,797
the code at a specific commit is kind of like a snapshot of
the application, but it's not the application running,

287
00:20:04,797 --> 00:20:08,727
and in the same way, a clone is a running database, right?

288
00:20:08,727 --> 00:20:14,637
That you can create from a snapshot, maybe,
or from... I don't know quite how you're defining these

289
00:20:14,637 --> 00:20:17,337
things, but yeah, we don't need them running all the time.

290
00:20:17,367 --> 00:20:22,107
Just like a developer doesn't need a version
of the application running all the time.

291
00:20:22,957 --> 00:20:27,367
Just while we are debugging something,
just while we are actually trying to test it.

292
00:20:27,727 --> 00:20:28,027
So yeah.

293
00:20:28,057 --> 00:20:29,137
Makes a load of sense.

294
00:20:29,377 --> 00:20:29,947
In theory,

295
00:20:30,782 --> 00:20:31,022
Nikolay: Yeah.

296
00:20:31,027 --> 00:20:37,311
I also wanted to emphasize that we
consider these snapshots as a whole, with data.

297
00:20:37,695 --> 00:20:40,035
It can be production data, if you can afford it.

298
00:20:40,365 --> 00:20:46,045
If there are no issues with PII, GDPR, and so on. But we focus on schema changes.

299
00:20:46,045 --> 00:20:54,655
Data, we also snapshot it, and we
provide it to clones, branches, snapshots, and so on.

300
00:20:54,685 --> 00:20:58,975
But what is most meaningful is schema changes, right?

301
00:20:58,975 --> 00:21:01,960
Because these need to be deployed at some point.

302
00:21:02,011 --> 00:21:06,871
Of course, data sometimes needs to be deployed too, but we want a Git-like approach

303
00:21:06,924 --> 00:21:10,067
with data, but applied to the schema only.

304
00:21:10,127 --> 00:21:17,518
We don't want data comparison and deploying that; well,
maybe we will want it as well, because we have the data here, right?

305
00:21:17,668 --> 00:21:19,198
We can do something here as well.

306
00:21:19,498 --> 00:21:22,496
But so far, I'm looking at the problems we have.

307
00:21:22,676 --> 00:21:26,557
We just want to mirror the capabilities of Git and bring

308
00:22:26,621 --> 00:22:35,115
branches, database branches, to effortlessly
build non-production environments matching your code.

309
00:21:35,115 --> 00:21:39,273
So you have a development branch in your code, and
you have a development branch in the database.

310
00:21:39,483 --> 00:21:48,442
So you can quickly take this code somewhere, on your laptop or
on some, I don't know, non-production machine in the cloud.

311
00:21:48,682 --> 00:21:53,226
And then you can request a clone for this branch, from its latest snapshot.

312
00:21:53,226 --> 00:21:59,736
This branch will be used, Postgres will be running,
and they go together, so you can start testing, developing

313
00:21:59,741 --> 00:22:02,628
your application and see how it works with a lot of data.

314
00:22:03,107 --> 00:22:04,757
Similar to production. This is what we do.

315
00:22:05,027 --> 00:22:14,487
But if you make data changes, we think they are not
so relevant, because a tester can do many weird things

316
00:22:14,487 --> 00:22:18,027
with data there, and then we just need to throw it away.

317
00:22:18,057 --> 00:22:22,507
So when we merge, we merge fully, but we look mostly at schema.

318
00:22:22,636 --> 00:22:31,968
And so far, we rely on people using one of the tools we criticized
last week: Sqitch, Liquibase, Flyway, Rails migrations.

319
00:22:32,178 --> 00:22:37,398
We see that people already use them, so we are
not going to replace them, but there is something to help with there.

320
00:22:37,403 --> 00:22:38,868
We discussed the problems they have.

321
00:22:38,988 --> 00:22:41,945
A new generation of tools, I'm sure, will be born

322
00:22:42,365 --> 00:22:43,535
in the near future, I think.

323
00:22:43,865 --> 00:22:48,714
But what's really not solved is how to test it properly with a lot of data.

324
00:22:49,251 --> 00:22:50,991
This is where branching comes in.

325
00:22:51,171 --> 00:22:54,533
So I just described it, maybe not very well.

326
00:22:54,533 --> 00:22:58,203
I say that because it's still not very precise,

327
00:22:58,208 --> 00:22:58,833
this concept.

328
00:22:58,833 --> 00:23:01,297
It's already clear, but not super clear.

329
00:23:01,777 --> 00:23:05,772
But what I'm trying to do here is define what the best branching is.

330
00:23:06,534 --> 00:23:08,004
And this is how we see it.

331
00:23:08,004 --> 00:23:10,148
We're already developing in this direction.

332
00:23:10,154 --> 00:23:13,484
I'm curious what other companies think, actually.

333
00:23:13,814 --> 00:23:21,582
But I think it would be good to synchronize our thinking and
move in similar directions, because in that case everyone wins,

334
00:23:21,582 --> 00:23:24,665
and developers have similar concepts across various products, right?

335
00:23:24,905 --> 00:23:30,571
Actually, in source control management systems, snapshots,
clones, branches, and so on also

336
00:23:31,335 --> 00:23:34,569
have different meanings if you compare them in detail.

337
00:23:34,629 --> 00:23:35,987
They have differences.

338
00:23:36,257 --> 00:23:39,557
So probably the same will happen here.

339
00:23:40,007 --> 00:23:43,247
Database branching can have different meanings in different tools.

340
00:23:43,405 --> 00:23:44,285
Obviously, right?

341
00:23:44,823 --> 00:23:46,113
Sorry, I'm talking too much.

342
00:23:46,113 --> 00:23:46,523
Yeah.

343
00:23:46,950 --> 00:23:49,920
Michael: This is great, and I think you're working on a document, right?

344
00:23:49,950 --> 00:23:52,650
Are you going to share that on Twitter when you're ready, or...?

345
00:23:53,150 --> 00:23:56,297
Nikolay: Yeah, I have a draft RFC in this area.

346
00:23:56,329 --> 00:24:04,270
It discusses goals and anti-goals, because, for example,
we don't want to deal with data, data diffs, and so on.

347
00:24:04,510 --> 00:24:09,084
We focus mostly on schema changes, because
production probably has different data.

348
00:24:09,174 --> 00:24:11,743
For example, we don't want to release data patches.

349
00:24:11,743 --> 00:24:12,581
And also, there is

350
00:24:13,029 --> 00:24:13,779
a problem,

351
00:24:14,089 --> 00:24:14,349
an interesting problem.

352
00:24:14,349 --> 00:24:15,779
You created a column.

353
00:24:16,289 --> 00:24:16,859
I'm very sorry.

354
00:24:16,859 --> 00:24:23,601
I feel sorry all the time, because we probably have some delay,
and you try to interrupt me, but I've already moved on.

355
00:24:23,715 --> 00:24:25,455
You wanted to ask something, sorry.

356
00:24:25,881 --> 00:24:27,381
Michael: I was just gonna add on the data front.

357
00:24:27,381 --> 00:24:35,031
I think I mentioned it last week, but when I was working on this
before, we found that sometimes data is schema, like lookup tables.

358
00:24:35,031 --> 00:24:38,151
So you might need to worry about that, but yeah, exactly.

359
00:24:38,541 --> 00:24:44,182
So you might have to worry about that at some point, but it
wasn't in the first version; I think we added it in version two or something.

360
00:24:44,182 --> 00:24:48,896
So you can definitely get away without having it in there, at least at first.

361
00:24:49,566 --> 00:24:51,526
But it is there.

362
00:24:51,526 --> 00:24:52,246
It is important.

363
00:24:52,657 --> 00:24:53,188
Nikolay: Right.

364
00:24:53,293 --> 00:25:01,359
I also spent some time thinking about how to merge. Merge
is basically deploy, if you consider main to be what should

365
00:25:01,359 --> 00:25:04,389
be done and should be present on our production environments.

366
00:25:04,629 --> 00:25:08,497
So for a merge, you need a diff, and then you need to

367
00:25:08,497 --> 00:25:10,867
apply this diff to production, roll it out.

368
00:25:11,587 --> 00:25:13,397
But, but, but... Uh-huh.

369
00:25:13,582 --> 00:25:14,842
Michael: A diff is slightly different, isn't it?

370
00:25:14,842 --> 00:25:21,442
A diff compares these two things, but then I need a script
in order to actually make one of them the same as the other.

371
00:25:21,447 --> 00:25:23,902
And that's not the same as a diff, but yeah, I agree.

372
00:25:23,902 --> 00:25:26,152
So the first step is the diff, and then the second step is a

373
00:25:26,452 --> 00:25:30,521
Nikolay: Well, a diff can be seen as a series of DDL commands.

374
00:25:31,001 --> 00:25:40,728
In that case, it's the same. But the problem is that
diff tools for Postgres schemas show ALTER commands,

375
00:25:41,093 --> 00:25:42,683
CREATE commands.

376
00:25:42,743 --> 00:25:45,383
Michael: Yeah, most of them do, but I consider that a second feature, right?

377
00:25:45,383 --> 00:25:50,113
The first feature is comparing these two schemas,
and they'll highlight the differences, and

378
00:25:50,228 --> 00:25:50,738
Nikolay: Well, maybe.

379
00:25:50,798 --> 00:25:51,038
Yeah.

380
00:25:51,043 --> 00:25:51,248
Yeah.

381
00:25:51,248 --> 00:25:53,588
So we have two approaches to diff, I agree.

382
00:25:53,588 --> 00:26:00,954
But the problem with deployment is that if you
apply these commands to production, you will block people.

383
00:26:01,194 --> 00:26:05,764
So you need an advanced diff, and we
spent some time prototyping this as well, and

384
00:26:05,814 --> 00:26:10,464
then I realized it's already kind of solved, though not solved well.

385
00:26:10,764 --> 00:26:19,131
We have a zoo of various tools for deployment
management, like Sqitch and Rails migrations, and they all

386
00:26:19,131 --> 00:26:23,300
ignore the fact that data should be changed in batches.

387
00:26:23,510 --> 00:26:26,889
Again, I'm advertising GitLab's migration helpers,

388
00:26:28,039 --> 00:26:37,718
which solve this very well for Ruby. And it's hard for us:
either we need to choose something, or we need to somehow build an abstraction.

389
00:26:37,748 --> 00:26:40,748
And then I realized, okay, people already solve this somehow.

390
00:26:41,078 --> 00:26:42,712
Let's just avoid this problem.

391
00:26:42,712 --> 00:26:49,261
I currently consider this
an anti-goal, and we just take care of conflicts.

392
00:26:49,266 --> 00:26:53,311
So if someone has already changed the schema in this branch and you're trying to...

393
00:26:54,286 --> 00:26:59,717
So in other words, we have something like CVS or Subversion, very centralized.

394
00:27:00,118 --> 00:27:06,805
So before you push your changes, you need to update,
and then you can push the changes.

395
00:27:06,805 --> 00:27:13,315
So you resolve conflicts and maybe replay
your changes on top of the other person's changes.

396
00:27:13,645 --> 00:27:16,675
So we just take care of conflicts in a quite simple way.

397
00:27:17,498 --> 00:27:26,152
And we don't solve the problem of merge fully; maybe
we're postponing it. But here is what we have among our goals.

398
00:27:26,152 --> 00:27:29,812
For example, imagine you created a column, which is empty.

399
00:27:30,292 --> 00:27:31,702
Everything is filled.

400
00:27:31,702 --> 00:27:35,152
You have a full database, but one column is new and empty.

401
00:27:35,602 --> 00:27:38,812
How do you test it? You need something there to test it.

402
00:27:39,082 --> 00:27:43,322
So it looks like we need to think about the ability to provide some data.

403
00:27:44,251 --> 00:27:50,768
It's what I usually don't like, fixtures and seed databases
where we have some fake data, but here we need it.

404
00:27:51,188 --> 00:28:00,184
We need to fill new columns, new tables, and
developers should decide how to do it and provide some means for testing.

405
00:28:00,694 --> 00:28:03,934
So we have everything, but somehow we need to fill new columns.

406
00:28:04,464 --> 00:28:05,804
Michael: Yeah, like data generation.

407
00:28:05,956 --> 00:28:06,286
Nikolay: Right.

408
00:28:06,286 --> 00:28:09,556
Maybe snapshot it and consider it our test data.

409
00:28:09,766 --> 00:28:15,300
We have a deviation from production, from
the main branch, but we already have test data.

410
00:28:15,330 --> 00:28:15,720
Let's go.

411
00:28:15,720 --> 00:28:16,380
Like, it's good.

412
00:28:16,503 --> 00:28:23,730
Other engineers can work with it, test it, play with it,
and explore how this feature behaves with many rows.

413
00:28:23,735 --> 00:28:24,160
Right?

414
00:28:24,718 --> 00:28:25,018
Michael: Yeah.

415
00:28:25,018 --> 00:28:26,408
Feels like a whole other topic.

416
00:28:26,998 --> 00:28:27,388
Nikolay: Yeah.

417
00:28:27,448 --> 00:28:29,548
Well, testing is a whole other topic.

418
00:28:29,668 --> 00:28:29,818
Definitely.

419
00:28:30,552 --> 00:28:33,102
There are major areas we can discuss there as well.

420
00:28:33,462 --> 00:28:41,652
So what excites me here is that whichever direction I
go, I think: how come this is still not developed?

421
00:28:41,652 --> 00:28:45,972
How can we live without it? Well, I see how we live.

422
00:28:45,977 --> 00:28:57,474
We test on production. Every time, I have the same question. It's about
this not very beautiful term, but it's called shift-left testing.

423
00:28:57,954 --> 00:29:06,324
We want developers to test first;
testing should be done at the very beginning.

424
00:29:06,474 --> 00:29:12,089
It should be shifted to the very left of this
DevOps infinity sign, you know, like this.

425
00:29:12,404 --> 00:29:17,594
Michael: Yeah, I know what you mean, but even if it's
not done in production, it's often done in staging, like

426
00:29:17,954 --> 00:29:18,404
Nikolay: Right.

427
00:29:18,684 --> 00:29:22,034
But staging often is very different from production.

428
00:29:22,034 --> 00:29:24,104
So we end up testing it on production,

429
00:29:24,294 --> 00:29:25,034
Michael: I can see what you mean.

430
00:29:25,664 --> 00:29:27,224
Nikolay: Really testing, really testing.

431
00:29:27,644 --> 00:29:33,614
We sometimes pretend; we tick checkboxes saying it was
tested in lower environments, in staging, everywhere.

432
00:29:33,614 --> 00:29:37,153
But if you think about it, was this testing real?

433
00:29:37,423 --> 00:29:39,673
It was fake testing.

434
00:29:40,014 --> 00:29:44,874
And this is what we want to fix in our development processes.

435
00:29:44,999 --> 00:29:45,489
Michael: Yeah.

436
00:29:45,489 --> 00:29:46,419
Yeah, that'd be great.

437
00:29:47,859 --> 00:29:50,649
Any last thoughts or things you wanted to share with people?

438
00:29:51,924 --> 00:29:54,596
Nikolay: Well, just keep an eye on what we are doing.

439
00:29:54,601 --> 00:29:58,206
Stay tuned, and any feedback or ideas are welcome.

440
00:29:58,206 --> 00:30:01,026
I'm always ready to discuss this topic with everyone.

441
00:30:01,026 --> 00:30:06,186
I think it's one of the
hottest topics in databases right now.

442
00:30:06,216 --> 00:30:14,276
I mean, many problems are being solved, Kubernetes and so on, but
this problem needs to be solved as well, and the majority of

443
00:30:14,348 --> 00:30:22,691
development teams will benefit immediately from better,
next-generation tooling to build non-production environments.

444
00:30:22,691 --> 00:30:30,774
So I think we spend too much time thinking about production, but to solve
problems in production, we need to start with non-production.

445
00:30:30,804 --> 00:30:33,704
And this is interesting.

446
00:30:33,714 --> 00:30:35,754
So I'm ready to talk with everyone.

447
00:30:35,784 --> 00:30:40,044
Just reach out to me on Twitter, by email, anywhere. And our regular mantra.

448
00:30:40,819 --> 00:30:47,934
Thank you, everyone, for feedback, subscriptions, likes, and topic ideas.

449
00:30:48,114 --> 00:30:55,824
I think next time we should again choose
one of the topics proposed by our audience.

450
00:30:56,094 --> 00:30:56,424
We really

451
00:30:56,424 --> 00:31:00,024
appreciate ideas, and that's it.

452
00:31:00,174 --> 00:31:03,534
Please share in your social networks, working groups,

453
00:31:03,534 --> 00:31:06,894
Slack, Discord, Mastodon, right,

454
00:31:07,494 --> 00:31:08,064
everywhere.

455
00:31:08,099 --> 00:31:09,174
Michael: Yeah, of course.

456
00:31:09,186 --> 00:31:10,036
Absolutely.

457
00:31:10,466 --> 00:31:11,516
Well, thank you, Nikolay.

458
00:31:11,516 --> 00:31:12,296
Thanks everybody.

459
00:31:12,386 --> 00:31:12,806
Take care.

460
00:31:12,986 --> 00:31:13,676
Nikolay: Thank you, Michael.

461
00:31:13,676 --> 00:31:13,976
Bye bye.

462
00:31:14,182 --> 00:31:14,602
Michael: Bye.