1
00:00:04,637 --> 00:00:07,307
What does it actually take to
build a backup system from scratch?

2
00:00:07,517 --> 00:00:11,597
Not just slap together some rsync
scripts and call it a day, but engineering

3
00:00:11,597 --> 00:00:16,307
something that handles deduplication,
encryption, compression, indexing,

4
00:00:16,307 --> 00:00:19,217
and restoring in one cohesive tool.

5
00:00:19,637 --> 00:00:23,907
Well, today I'm joined by Julian and
Gilles, the CEO and CTO of Plakar,

6
00:00:24,257 --> 00:00:29,669
a new company, but built on nearly
a decade of R&D in building open

7
00:00:29,669 --> 00:00:31,919
source backup systems from scratch.

8
00:00:32,329 --> 00:00:35,809
I am excited about this one because
Plakar goes beyond just backing

9
00:00:35,809 --> 00:00:39,979
up your typical workloads like
databases, file systems, and S3.

10
00:00:40,309 --> 00:00:45,752
It can also back up connectors like Google
Drive, iCloud Drive, OneDrive, and even

11
00:00:45,769 --> 00:00:48,229
things like Notion, Dropbox, and IMAP.

12
00:00:48,859 --> 00:00:49,099
Yeah.

13
00:00:49,099 --> 00:00:49,729
You remember IMAP?

14
00:00:50,479 --> 00:00:53,779
Well, they've recently joined
the CNCF, so we talk about their

15
00:00:53,779 --> 00:00:57,349
upcoming Kubernetes integration,
obviously a part of this channel.

16
00:00:57,589 --> 00:01:03,049
And on the show I suggested that they
build the ability to back up to Docker

17
00:01:03,049 --> 00:01:06,409
images, basically to OCI registry storage
really is what I was asking about.

18
00:01:06,409 --> 00:01:08,959
And then like within a couple
of weeks they went off and built

19
00:01:08,959 --> 00:01:10,399
that, uh, as a new integration.

20
00:01:11,029 --> 00:01:11,659
That's pretty dope.

21
00:01:12,199 --> 00:01:16,939
This is the one backup tool that
I've seen in really, maybe ever, but

22
00:01:16,969 --> 00:01:21,619
definitely recently that not only looks
useful for work and server workloads and

23
00:01:21,619 --> 00:01:27,499
clusters and clouds, but is something
that I think I want to try to back

24
00:01:27,499 --> 00:01:31,729
up my own personal iCloud and Google
Drive and Notion and all those things.

25
00:01:32,209 --> 00:01:36,379
Which, I've found historically, is
actually very problematic; it's challenging to

26
00:01:36,379 --> 00:01:40,279
find a free tool that isn't just cobbled
together with a bunch of different things

27
00:01:40,279 --> 00:01:44,959
and something that's reliably able to be
restored in a reasonable amount of time.

28
00:01:45,289 --> 00:01:50,179
So I'm, I'm excited about this tool 'cause
I feel like it, it affects multiple parts

29
00:01:50,179 --> 00:01:54,559
of my job and life and it's open source.

30
00:01:54,559 --> 00:01:55,699
So let's get into it.

31
00:01:57,767 --> 00:01:58,987
We're going to talk about backups.

32
00:01:59,287 --> 00:02:02,607
We've talked about that before, but
there's so much more to backups.

33
00:02:02,827 --> 00:02:03,557
We're going to get into it.

34
00:02:03,557 --> 00:02:03,967
I'm excited.

35
00:02:03,987 --> 00:02:05,567
You all started a company.

36
00:02:05,577 --> 00:02:06,707
How long ago?

37
00:02:07,197 --> 00:02:11,137
We can give two answers on this one,
the first one is that the company

38
00:02:11,137 --> 00:02:13,747
was incorporated in 2024.

39
00:02:14,247 --> 00:02:14,547
Yeah.

40
00:02:14,597 --> 00:02:19,457
It's a fairly new company, created to support this
project, but the project itself is quite old

41
00:02:19,457 --> 00:02:23,967
in the sense that Gilles did a lot of
R&D on it over the past 10 years, so.

42
00:02:24,467 --> 00:02:24,917
Yeah.

43
00:02:25,417 --> 00:02:29,677
What I was noticing as I was digging
through the project was there are a lot of

44
00:02:29,687 --> 00:02:33,727
foundational things when you're creating
a backup product that you have to define

45
00:02:33,727 --> 00:02:35,077
that a lot of us don't think about.

46
00:02:35,447 --> 00:02:39,197
Not being a developer of a backup
product, the only thing I can

47
00:02:39,197 --> 00:02:42,597
equate it to is creating
a new database product, because

48
00:02:42,597 --> 00:02:43,857
you had to create a file format.

49
00:02:44,087 --> 00:02:46,147
You had to create a
streaming backup format.

50
00:02:46,207 --> 00:02:50,407
You had to go, I would imagine, much more
low level than the typical application

51
00:02:50,407 --> 00:02:54,027
developer has to go because you had
all these underlying fundamental

52
00:02:54,037 --> 00:02:57,847
concepts of, you know, things like
the backup file, the caching of the

53
00:02:57,857 --> 00:03:00,387
backups, all that stuff that, yeah,

54
00:03:00,887 --> 00:03:03,637
Yesterday I was at a meetup and
I was presenting the product.

55
00:03:03,667 --> 00:03:05,997
Someone asked me, like,
why did you go into backup?

56
00:03:05,997 --> 00:03:07,767
That seems like a very boring area.

57
00:03:08,177 --> 00:03:11,642
Yeah, and if you take it from
a developer perspective, it goes from

58
00:03:11,642 --> 00:03:13,202
very low level to very high level.

59
00:03:13,382 --> 00:03:16,622
It touches on, like, many fields
of computer science that

60
00:03:16,622 --> 00:03:17,642
you might be interested in.

61
00:03:17,912 --> 00:03:20,252
If you have, like a
high appetite for tech.

62
00:03:20,752 --> 00:03:22,422
So you have to know about
how a file system works.

63
00:03:22,422 --> 00:03:26,262
You have to know how to manage your
memory, how to manage high concurrency,

64
00:03:26,282 --> 00:03:27,902
how to manage, like file formats.

65
00:03:28,512 --> 00:03:32,222
In our case, we kind of
developed a database in a sense

66
00:03:32,282 --> 00:03:33,162
because you have a B-tree.

67
00:03:33,162 --> 00:03:37,162
You have to figure out how to
map your B-tree to something.

68
00:03:37,522 --> 00:03:42,672
So yeah, it's a very complete project
for diving into technical topics.

69
00:03:43,172 --> 00:03:46,602
I was like, Oh, this is going to be
a small project, small side project.

70
00:03:46,602 --> 00:03:49,612
And then you realize that, Oh, you
end up doing cryptography, you end up

71
00:03:49,612 --> 00:03:51,032
doing compression and stuff like that.

72
00:03:51,052 --> 00:03:54,372
And any area you look
at, you're going to find ways to

73
00:03:54,372 --> 00:03:56,172
improve it and go further into tech.

74
00:03:57,048 --> 00:04:01,068
Yeah, I can only imagine how much
time is spent on the engineering

75
00:04:01,068 --> 00:04:05,548
fundamentals of a giant file that
you need to do various things with,

76
00:04:05,558 --> 00:04:09,818
because most of us don't deal with
terabyte sized files on a daily basis.

77
00:04:09,868 --> 00:04:12,148
To me, the biggest files I have
to deal with are model files,

78
00:04:12,158 --> 00:04:14,338
like open source model downloading
and uploading, like that's the

79
00:04:14,338 --> 00:04:15,438
biggest thing I have to deal with.

80
00:04:15,488 --> 00:04:19,058
Maybe if I were in an enterprise, I'd
have big backups and stuff like that.

81
00:04:19,108 --> 00:04:23,863
I used to manage backups at a government
enterprise, about 7,000 users.

82
00:04:23,903 --> 00:04:25,423
That was 15 years ago.

83
00:04:25,783 --> 00:04:28,583
And I had two dedicated
staff that worked for me.

84
00:04:28,963 --> 00:04:31,783
All they did was manage
the storage and backups.

85
00:04:31,783 --> 00:04:36,973
Their entire job was ArcServe, I
think we were either using ArcServe

86
00:04:36,973 --> 00:04:40,823
or NetBackup, but we had, you know,
Windows machines, Macs, we had, Linux

87
00:04:40,833 --> 00:04:44,153
machines, we had mainframes, and
it had to handle all of that stuff.

88
00:04:44,153 --> 00:04:47,258
And this was pre-cloud, so we didn't
even have to worry about how do I

89
00:04:47,268 --> 00:04:50,608
back up cloud storage. I mean, we
didn't even have S3 at the time.

90
00:04:50,608 --> 00:04:54,828
That wasn't a thing that much in
the early 2000s, but it was so time

91
00:04:54,828 --> 00:05:00,128
consuming and such a nerve-wracking
effort to deal with recovery,

92
00:05:00,468 --> 00:05:01,878
which most people don't talk about.

93
00:05:01,918 --> 00:05:03,338
Like we don't spend a lot of time.

94
00:05:03,778 --> 00:05:05,738
When you're talking about
backups, everyone's concerned

95
00:05:05,738 --> 00:05:06,718
about the backup part.

96
00:05:07,158 --> 00:05:09,398
And I always focus more
on the recovery part.

97
00:05:09,428 --> 00:05:11,278
And I get more excited about the recovery.

98
00:05:11,278 --> 00:05:12,478
Like how easy is it?

99
00:05:12,478 --> 00:05:13,368
How fast is it?

100
00:05:13,498 --> 00:05:15,958
How fast can I discover the
thing that I need to recover?

101
00:05:15,958 --> 00:05:17,568
Because often that's the trick.

102
00:05:17,568 --> 00:05:21,538
If you're backing up hourly and daily
and monthly and weekly, and you've

103
00:05:21,538 --> 00:05:25,978
got all these incrementals, all the
traditional backup terminology, like

104
00:05:26,248 --> 00:05:29,308
sometimes you're like, well, that
person needs to recover that file.

105
00:05:29,808 --> 00:05:32,068
But it needs to be the one
that's not today, because that

106
00:05:32,068 --> 00:05:33,318
one was corrupted or whatever.

107
00:05:33,358 --> 00:05:38,143
So then you end up like sleuthing through
a giant caching system trying to find

108
00:05:38,143 --> 00:05:43,693
the one file or the one directory on
one server somewhere amongst a thousand

109
00:05:43,693 --> 00:05:45,713
servers that you had to back up that day.

110
00:05:45,843 --> 00:05:50,958
And how do you do all of that
reliably, and in a way that

111
00:05:51,458 --> 00:05:53,128
two people can handle the data.

112
00:05:53,128 --> 00:05:56,748
And now I don't
know any customers of mine or anyone

113
00:05:56,748 --> 00:05:58,528
who has two people managing backups.

114
00:05:58,528 --> 00:06:00,418
It's like a part time job for one person.

115
00:06:00,838 --> 00:06:02,438
So, something has changed.

116
00:06:03,138 --> 00:06:07,528
When you tackle that issue of how do you
find the proper thing to restore in a

117
00:06:07,528 --> 00:06:11,198
fast way, you end up realizing that you
have to develop some kind of database.

118
00:06:11,288 --> 00:06:15,848
It's not just a backup, it's not just
like gluing files together into some kind

119
00:06:15,848 --> 00:06:17,798
of archive that's going to be efficient.

120
00:06:18,188 --> 00:06:22,848
You have to actually have indexes, find
things in an efficient way, be able

121
00:06:22,848 --> 00:06:29,758
to generate diffs between versions of
files and do it in a way that can scale.

122
00:06:29,888 --> 00:06:31,718
Because it's not really
just a volume thing.

123
00:06:31,953 --> 00:06:34,633
It's more, how many files am
I going to have to look into?

124
00:06:34,793 --> 00:06:38,673
And, I think, a large volume of small
files is as problematic to back up

125
00:06:38,673 --> 00:06:40,023
as huge files.

126
00:06:40,523 --> 00:06:44,373
Also the performance, lots of small files
isn't exactly performant on a lot of

127
00:06:44,373 --> 00:06:48,583
systems, but I mean, since I stopped
managing backups, we now have SSDs.

128
00:06:48,593 --> 00:06:51,573
So like I lived in a world where we had
spinning disks and things were super

129
00:06:51,573 --> 00:06:54,663
slow and, you know, if you had gigabit
networking, you were actually doing

130
00:06:54,663 --> 00:06:56,703
great. But times have changed.

131
00:06:56,703 --> 00:07:01,163
So when I look at this, like, the
elevator pitch, when I looked at the

132
00:07:01,163 --> 00:07:03,223
website, the thing I took away from it

133
00:07:03,603 --> 00:07:08,913
was, one, it's open source; I can run it on my own
hardware, on prem, wherever I want to run

134
00:07:08,913 --> 00:07:13,713
it. And it has this idea of integrations,
which is not new; most backup tools have that.

135
00:07:13,773 --> 00:07:18,043
Like, it has to be compatible
with this database

136
00:07:18,043 --> 00:07:21,983
file or this type of storage or this
NAS or this iSCSI thing or whatever,

137
00:07:22,353 --> 00:07:25,623
but in your case, it looks like the
integrations are more cloud focused.

138
00:07:25,633 --> 00:07:29,863
So they're dealing with HTTP,
but specifically different APIs.

139
00:07:29,893 --> 00:07:32,683
Like I saw Notion in the list,
and I'm a huge Notion fan.

140
00:07:33,183 --> 00:07:35,953
I never thought about backing
up my Notion like that.

141
00:07:36,453 --> 00:07:39,553
And now that I know it exists,
I'm kind of obsessed with the idea that maybe

142
00:07:39,553 --> 00:07:40,823
I should be backing up my Notion.

143
00:07:41,233 --> 00:07:42,603
Like, how did this happen?

144
00:07:42,613 --> 00:07:44,623
How did the integration
list come to be the way it is?

145
00:07:45,574 --> 00:07:51,884
Maybe I'll start and let you complete,
Gilles, but you realize that most

146
00:07:51,934 --> 00:07:56,284
of the SaaS providers right now
are on a shared responsibility model.

147
00:07:56,784 --> 00:08:00,365
So it means that you are
in charge of the backup.

148
00:08:00,365 --> 00:08:04,364
In the case of Notion, for example, they
are not providing any kind of backup,

149
00:08:04,824 --> 00:08:06,204
and you have to do it by yourself.

150
00:08:06,254 --> 00:08:11,524
And when you look at all the SaaS that
you are using, for personal use or

151
00:08:12,014 --> 00:08:17,004
even for, you know, enterprise usage, you
see that you have a lot of holes

152
00:08:17,004 --> 00:08:21,134
in your resilience, or in the protection
of your data. And I think it was important

153
00:08:21,144 --> 00:08:26,594
to have software that is able to manage,
I would say, the legacy tasks like, you

154
00:08:26,594 --> 00:08:32,114
know, backing up files, et cetera, but
also be able to back up any kind of data,

155
00:08:32,559 --> 00:08:34,849
including, of course,
everything coming from SaaS.

156
00:08:34,849 --> 00:08:39,649
So I think at some point in the product,
we decided, okay, it's not a backup

157
00:08:39,649 --> 00:08:43,899
solution that is supposed to back up only
files, but a backup solution that

158
00:08:43,899 --> 00:08:48,609
should be able to back up any kind of
data. And maybe you can tell a bit more

159
00:08:48,609 --> 00:08:50,639
about, you know, how you did that?

160
00:08:51,554 --> 00:08:55,214
Just to mention the open
source part, the main driver initially

161
00:08:55,214 --> 00:08:57,259
was to avoid vendor lock-in.

162
00:08:57,479 --> 00:08:59,979
Because you don't have that many
solutions that can back up many

163
00:08:59,979 --> 00:09:01,749
sources and that are not closed today.

164
00:09:01,859 --> 00:09:06,759
You have hacks, you have scripts that
bundle a bunch of solutions, but you don't

165
00:09:06,759 --> 00:09:08,259
have one solution that you can trust.

166
00:09:08,649 --> 00:09:11,729
And it so happened that a
friend of mine who has like a

167
00:09:11,729 --> 00:09:15,179
degree in computer science, so he's
like fairly educated on the topic.

168
00:09:15,229 --> 00:09:18,989
He managed to lose all his data
because he used a set of scripts

169
00:09:19,029 --> 00:09:20,489
that did not behave correctly.

170
00:09:20,924 --> 00:09:23,914
And he did not realize it because
everything seemed to be okay

171
00:09:23,914 --> 00:09:25,434
until the day his server crashed.

172
00:09:25,624 --> 00:09:28,874
And he had to rely on the restore
part that everyone overlooks now.

173
00:09:29,414 --> 00:09:33,364
And the thing is, if he had had a solution
that was not a glue of multiple scripts

174
00:09:33,364 --> 00:09:36,324
and rsync and blah, blah, blah,
this would not have happened.

175
00:09:36,754 --> 00:09:39,884
And now you end up having to
look at what solutions allow you

176
00:09:39,904 --> 00:09:41,634
to back up multiple sources.

177
00:09:41,634 --> 00:09:45,749
And you end up having to go generally
towards commercial solutions,

178
00:09:46,089 --> 00:09:48,869
that will provide support for
multiple sources without hacks.

179
00:09:49,309 --> 00:09:51,939
And they will usually have
some kind of closed format.

180
00:09:52,439 --> 00:09:55,659
So you have to trust that they
will not go away or they will not

181
00:09:55,899 --> 00:09:58,589
bump their prices and that you
can trust them in the long run.

182
00:09:59,339 --> 00:10:03,459
And what I wanted initially was to
have a well-documented format that

183
00:10:03,699 --> 00:10:07,369
would be fully open, with a
license that prevents closing the code.

184
00:10:07,369 --> 00:10:10,309
If we decided to go wrong,
someone would just fork the

185
00:10:10,309 --> 00:10:11,989
code and it would go that way.

186
00:10:12,389 --> 00:10:16,239
So that's a safeguard
against ourselves going wrong.

187
00:10:16,249 --> 00:10:16,459
Yeah.

188
00:10:16,959 --> 00:10:18,769
And then you have, what
do you do with that?

189
00:10:19,269 --> 00:10:20,969
How do you manage multiple sources?

190
00:10:20,999 --> 00:10:23,909
And you realize that most of the
open source solutions are either

191
00:10:23,909 --> 00:10:27,819
fairly targeted
at doing synchronization, like

192
00:10:27,829 --> 00:10:32,219
rsync, and they are twisted into doing
backups through hard-link

193
00:10:32,279 --> 00:10:32,629
tricks,

194
00:10:33,129 --> 00:10:38,679
or they're, like, highly built around

195
00:10:38,679 --> 00:10:39,849
the concept of a file system.

196
00:10:40,349 --> 00:10:44,129
So you can actually do a backup
of an S3 bucket, for example, but

197
00:10:44,129 --> 00:10:47,149
that's using a trick to map the
S3 bucket on your file system.

198
00:10:47,529 --> 00:10:54,399
So they have limitations, and they
do not work well when you break this

199
00:10:54,399 --> 00:10:58,749
limitation. If you create a bucket and
put 2 million objects in it and try

200
00:10:58,749 --> 00:11:01,669
to mount it to the file system, that's
not going to work very well for you.

201
00:11:02,409 --> 00:11:06,449
There was, like, a disruption
in how do you model this?

202
00:11:06,479 --> 00:11:09,869
How do you model this issue that you
want to import various sources, you don't

203
00:11:09,869 --> 00:11:13,489
know these sources yet, and you want to
be extensible and have a plugin system,

204
00:11:13,849 --> 00:11:18,269
so you don't even know what plugins will
be written in a year from now, and make

205
00:11:18,269 --> 00:11:23,719
it fit in a model that will scale if you
have flattened data at the root.

206
00:11:24,259 --> 00:11:28,549
Designing this model, we came up with
something abstract enough that you can

207
00:11:28,949 --> 00:11:31,249
kind of prove that anything can go in.

208
00:11:31,749 --> 00:11:35,579
And we ourselves work with that abstraction,
so that they all benefit from

209
00:11:35,579 --> 00:11:38,629
the same deduplication, the same
encryption, the same features.

210
00:11:38,689 --> 00:11:41,309
So when you write a
plugin yourself, you would not have

211
00:11:41,309 --> 00:11:42,489
to think about all the details.

212
00:11:42,799 --> 00:11:44,459
You would just have to think
about how do I get the data

213
00:11:44,639 --> 00:11:46,199
from this point to this point.

214
00:11:46,599 --> 00:11:50,219
It will do the work once it's
there, through a very simple API.

215
00:11:50,719 --> 00:11:52,719
So most of our work was done on that.

216
00:11:53,299 --> 00:11:56,849
Finding that abstraction that allows
us to work efficiently while supporting

217
00:11:56,899 --> 00:11:58,459
a wide variety of sources.

218
00:11:58,959 --> 00:12:02,289
And of the integrations that
we have, some are tagged stable

219
00:12:02,289 --> 00:12:06,229
and some are tagged beta, because we
are a bit hard on ourselves.

220
00:12:06,539 --> 00:12:07,839
Beta does not mean it does not work.

221
00:12:08,279 --> 00:12:10,009
It means that we want to show it works.

222
00:12:10,509 --> 00:12:13,519
And depending on how much people
are interested in that backend,

223
00:12:13,519 --> 00:12:16,729
we might drive that one further
in terms of production readiness.

224
00:12:17,309 --> 00:12:18,939
But they all work, to some extent.

225
00:12:19,439 --> 00:12:23,229
I can imagine like little edge cases
of a lot of this stuff, especially

226
00:12:23,229 --> 00:12:26,899
when you're pulling and pushing
from an API that isn't exactly.

227
00:12:31,179 --> 00:12:32,379
I have interesting questions.

228
00:12:32,379 --> 00:12:35,439
It's like, okay, with Notion,
how exactly does that recovery work?

229
00:12:35,439 --> 00:12:38,609
And what if there's duplicate data?
You know, conceptually,

230
00:12:38,609 --> 00:12:42,919
most of us have
dealt with file-based backups, right?

231
00:12:42,919 --> 00:12:45,359
Same single system, same host.

232
00:12:45,839 --> 00:12:47,809
Easy, easy day, right?

233
00:12:47,809 --> 00:12:49,189
You're not even dealing
with remote storage.

234
00:12:49,469 --> 00:12:54,269
And then, like, people tend to evolve into
a, okay, now I'm doing, like, SMB mounts

235
00:12:54,269 --> 00:12:59,219
or something to put some files elsewhere,
I'm doing low-tech rsync or something.

236
00:12:59,649 --> 00:13:04,319
And then there's this giant chasm,
I feel like, which is, there are all

237
00:13:04,319 --> 00:13:08,359
those little utilities that are very
niche and very composable, but,

238
00:13:08,409 --> 00:13:10,469
like you're saying, you're building
your own scripts, you're building your

239
00:13:10,469 --> 00:13:14,359
own, orchestration, essentially you're
designing the orchestration yourself.

240
00:13:14,739 --> 00:13:20,459
And then from there to a complete
cohesive strategy that uses one

241
00:13:20,459 --> 00:13:25,349
or two products maximum, you
suddenly jump into like enterprise.

242
00:13:25,849 --> 00:13:28,489
There's a lot of enterprise backup
garbage out there, I feel like,

243
00:13:28,489 --> 00:13:29,879
like there's a ton of stuff that.

244
00:13:30,169 --> 00:13:33,149
Especially when it comes to
cloud APIs, I do this every year.

245
00:13:33,229 --> 00:13:37,369
I'm a small business
of, you know, three to five people,

246
00:13:37,369 --> 00:13:38,779
depending on what year we're talking.

247
00:13:39,139 --> 00:13:41,159
So I have some business needs.

248
00:13:42,094 --> 00:13:45,744
But mostly the backup
needs I have are like what an

249
00:13:45,744 --> 00:13:47,399
individual would have.

250
00:13:47,399 --> 00:13:50,784
I have iCloud, I have, you know,
Google Drive, I have Dropbox, I

251
00:13:50,784 --> 00:13:53,054
probably have SFTP somewhere.

252
00:13:53,274 --> 00:13:57,354
I have Notion, I might have some
S3 buckets, and I have Macs, and I

253
00:13:57,384 --> 00:13:58,859
need to manage all these things.

254
00:13:58,859 --> 00:14:02,614
I have an Ubuntu server in the closet.
There are things and places, I would

255
00:14:02,614 --> 00:14:06,404
honestly love my GitHub Git repos to
be backed up automatically just in

256
00:14:06,404 --> 00:14:10,584
case GitHub goes down and I need to
move to, you know, GitLab or something.

257
00:14:10,934 --> 00:14:17,374
And when I look at, just for Google Drive
or iCloud or OneDrive, any of the sort of

258
00:14:17,854 --> 00:14:22,544
top three cloud file drives or whatever
you want to call them, I really

259
00:14:22,544 --> 00:14:27,804
couldn't find a single product on the
internet that I could buy for one person.

260
00:14:28,114 --> 00:14:31,214
It seemed like all the products out there
that were like, yeah, we'll back up your

261
00:14:31,224 --> 00:14:34,064
company's Google Drive, because that's,
you know, I have the company version of

262
00:14:34,064 --> 00:14:38,529
Google Drive and the company version of
OneDrive, and those don't always work with

263
00:14:38,529 --> 00:14:43,309
all the consumer stuff, or if you're using
CyberDuck or some other little utility.

264
00:14:43,679 --> 00:14:47,939
I was looking into backing up three
people's Google Drive, and I was

265
00:14:47,939 --> 00:14:51,639
looking at possibly having to spend
$500 a month on an enterprise piece

266
00:14:51,639 --> 00:14:56,189
of software because their minimum
license purchase was like five users

267
00:14:56,189 --> 00:14:57,859
or 10 users or something like that.

268
00:14:58,329 --> 00:14:59,279
And I gave up.

269
00:14:59,289 --> 00:15:00,659
I eventually just gave up.

270
00:15:00,919 --> 00:15:04,119
I couldn't figure out a
scenario that didn't require

271
00:15:04,619 --> 00:15:09,809
a bunch of weird scripts with, you
know, cron jobs running certain things

272
00:15:09,809 --> 00:15:13,269
that would probably never notify me
of a failure.

273
00:15:13,269 --> 00:15:15,369
And it just was a mess.

274
00:15:15,689 --> 00:15:19,949
So you guys show up and suddenly
I'm like, I could do this in an

275
00:15:19,949 --> 00:15:21,389
afternoon and it would cost me nothing.

276
00:15:21,559 --> 00:15:24,169
Like it would cost me
pennies with Plakar.

277
00:15:24,169 --> 00:15:24,189
Yeah.

278
00:15:24,319 --> 00:15:24,329
And

279
00:15:24,379 --> 00:15:28,609
the nice thing also about our
open source dimension, 'cause we

280
00:15:28,609 --> 00:15:30,269
are an open-source-first company.

281
00:15:30,269 --> 00:15:32,249
Clearly, whatever we do is open source,

282
00:15:32,554 --> 00:15:35,504
unless it's strategically
not a good idea to do so,

283
00:15:35,704 --> 00:15:37,024
but that's the default.

284
00:15:37,274 --> 00:15:41,134
It's to provide enough libraries
and examples to empower users to

285
00:15:41,134 --> 00:15:43,654
actually extend the integration.

286
00:15:44,064 --> 00:15:47,224
Our goal right now would not
be to say, we handle all

287
00:15:47,224 --> 00:15:48,314
the integrations ourselves.

288
00:15:48,704 --> 00:15:52,394
It would be more like: if an integration
is fairly critical to companies,

289
00:15:52,394 --> 00:15:52,934
we would do it

290
00:15:53,319 --> 00:15:57,719
to provide some level of
quality. Then if users want to implement

291
00:15:57,729 --> 00:16:01,719
a specific integration, we would
help them and get them moving forward, because

292
00:16:01,759 --> 00:16:04,689
if you want to use one tool to
like if you want to use one tool to
back up everything, you have to have the

293
00:16:04,689 --> 00:16:06,759
manpower to do everything, which is not

294
00:16:06,779 --> 00:16:07,379
going to happen.

295
00:16:08,139 --> 00:16:11,759
And, for example, one of the tasks
we worked on today with my team was how

296
00:16:11,759 --> 00:16:13,759
do we simplify the API even further,

297
00:16:13,759 --> 00:16:14,519
so that

298
00:16:14,844 --> 00:16:17,794
people are less likely to even shoot
themselves in the foot while trying

299
00:16:17,794 --> 00:16:21,614
to do something simple because that
lowers the bar to being able to

300
00:16:21,664 --> 00:16:25,074
actually, instead of spending your
time writing a script that's not going

301
00:16:25,074 --> 00:16:29,124
to be very good, write an integration
because it's as simple as that script.

302
00:16:29,174 --> 00:16:33,324
And it's going to be reviewed, and
you're going to get help from others.

303
00:16:33,534 --> 00:16:37,434
It's going to fit into one thing that
actually tackles the difficult part.

304
00:16:37,934 --> 00:16:42,554
And that's where I would like to
reach in terms of open source.

305
00:16:43,104 --> 00:16:44,114
How does this work?

306
00:16:44,214 --> 00:16:48,159
In terms of the development, I
mean, all the integrations are open

307
00:16:48,159 --> 00:16:51,519
source, but how many of those
integrations are created by the

308
00:16:51,539 --> 00:16:56,579
community versus the core team?
I'm assuming this is led by feedback.

309
00:16:56,579 --> 00:16:57,929
Like people are asking for things.

310
00:16:57,939 --> 00:17:00,579
So then you're motivated to
make an integration for them.

311
00:17:01,079 --> 00:17:03,389
In terms of how many were
done by the community,

312
00:17:03,809 --> 00:17:05,209
I'm just curious, like the ratio.

313
00:17:05,259 --> 00:17:09,609
Right now, all of
the integrations were done by ourselves.

314
00:17:09,689 --> 00:17:09,859
And

315
00:17:09,859 --> 00:17:13,519
a few months ago,
we pushed the SDK, and we are

316
00:17:13,519 --> 00:17:18,059
trying to provide examples and,
you know, simplify even further.

317
00:17:18,469 --> 00:17:21,749
But it's the community that's
driving the decision about which

318
00:17:21,749 --> 00:17:23,539
one we do currently, for example.

319
00:17:24,039 --> 00:17:27,779
Like people have been asking for
IMAP and gcloud and stuff like that.

320
00:17:27,939 --> 00:17:29,979
We're going to go spend
more time doing that.

321
00:17:30,449 --> 00:17:34,859
But yeah, the idea is to start growing
the developer community, not the user

322
00:17:35,059 --> 00:17:38,579
community, but the developer community
into extending their own integrations.

323
00:17:39,079 --> 00:17:39,519
Yeah.

324
00:17:39,614 --> 00:17:43,654
We've reached the right level right now,
the right level of ease or difficulty,

325
00:17:43,844 --> 00:17:48,274
depending on how you see it, of writing
an integration, because it boils down

326
00:17:48,284 --> 00:17:54,064
now to writing one function that scans
and allows you to enumerate your data.

327
00:17:54,334 --> 00:17:57,534
And provides you an accessor to
the data to actually read it.

328
00:17:57,614 --> 00:18:01,104
Once you have that, it can plug into
what we have, and you get all the

329
00:18:01,104 --> 00:18:02,164
benefits behind it.

330
00:18:02,804 --> 00:18:07,304
Which means that some of the integrations,
like the Google Cloud integration,

331
00:18:07,304 --> 00:18:10,054
were done in half an hour, unplanned.

332
00:18:10,554 --> 00:18:13,774
So one of the developers was like,
Oh, well, I have half an hour.

333
00:18:13,774 --> 00:18:14,284
I'll do that.

334
00:18:14,564 --> 00:18:15,614
And that's okay.

335
00:18:15,614 --> 00:18:17,854
He has the knowledge, but you can
assume that someone who does not have

336
00:18:17,854 --> 00:18:20,964
the knowledge will take more time,
but they're not going to go from 30

337
00:18:20,964 --> 00:18:23,944
minutes to a month doing that task.

338
00:18:24,324 --> 00:18:26,714
You're not reinventing the wheel
every time you want to back up

339
00:18:26,714 --> 00:18:30,054
a different product. 'Cause
I'm here for the Docker backups.

340
00:18:30,054 --> 00:18:36,104
I'm here for image
registries as both a source and a destination.

341
00:18:36,574 --> 00:18:39,094
And for me, I actually,

342
00:18:39,399 --> 00:18:42,549
years ago, I created a small
script called Docker Backup,

343
00:18:43,049 --> 00:18:45,329
Volume Backup, that's the name.

344
00:18:45,369 --> 00:18:49,699
And it kind of took off a little bit
and then Docker ended up adding it

345
00:18:49,699 --> 00:18:51,179
as an extension into Docker Desktop.

346
00:18:51,179 --> 00:18:54,079
And then eventually they just made it
a default feature in Docker Desktop.

347
00:18:54,419 --> 00:18:57,869
And so now like in the Docker
community, volumes were never really

348
00:18:57,869 --> 00:19:01,819
meant to be moved around as images,
but they're just files, right?

349
00:19:01,819 --> 00:19:06,759
So, I get more requests for fixing
that shell script, if that's all it

350
00:19:06,759 --> 00:19:10,759
is, and working on that than just about
every one of my other examples, and

351
00:19:10,839 --> 00:19:17,499
there's clearly a need for developers
to move or back up volumes on

352
00:19:17,499 --> 00:19:21,839
their local Docker system, or whether
it's Docker or containerd or CRI-O or

353
00:19:22,159 --> 00:19:23,329
whatever, it doesn't really matter.

354
00:19:23,589 --> 00:19:26,029
Podman. The developer sometimes wants

355
00:19:26,529 --> 00:19:30,269
to move, you know, the database files that
are on that Docker volume somewhere else.

356
00:19:30,289 --> 00:19:31,829
And there's not really a move option.

357
00:19:31,909 --> 00:19:32,259
Right.

358
00:19:32,299 --> 00:19:36,009
And there's no easy way; you kind of
have to learn all these different

359
00:19:36,029 --> 00:19:37,679
commands for extracting it out.

360
00:19:37,689 --> 00:19:38,789
Do you put it in a tarball?

361
00:19:38,799 --> 00:19:40,039
Do you put it in a container image?

362
00:19:40,039 --> 00:19:41,249
Like all that stuff.

363
00:19:41,289 --> 00:19:42,609
So I'm here for that integration.

364
00:19:42,609 --> 00:19:43,339
So sign me up.

365
00:19:44,304 --> 00:19:45,404
I'm going to make you laugh.

366
00:19:45,454 --> 00:19:48,164
Two days ago, I was
having one of my sleepless nights

367
00:19:48,994 --> 00:19:50,324
of wondering what I was going to do.

368
00:19:50,584 --> 00:19:55,214
I was looking into Docker because we had
a discussion a long time ago about how

369
00:19:55,234 --> 00:20:00,014
we could benefit from our deduplication
to lower the storage size for

370
00:20:00,014 --> 00:20:04,179
images: instead of stacking layers, each
of the layers could be deduplicated.

371
00:20:04,679 --> 00:20:08,409
And so I looked into it because
I had never looked into how the

372
00:20:08,419 --> 00:20:09,819
backup of this stuff worked.

373
00:20:10,319 --> 00:20:11,859
And they use tar as a format,

374
00:20:12,119 --> 00:20:12,539
yep.

375
00:20:13,959 --> 00:20:15,549
And we have a tar importer.

376
00:20:15,944 --> 00:20:19,304
which can actually extract a tar and back
up what's inside the tar, which means

377
00:20:19,314 --> 00:20:23,214
that you could back up all your images
and have deduplication across them.

378
00:20:23,524 --> 00:20:26,714
And I looked into how it works
with containers, and we can back up

379
00:20:26,714 --> 00:20:27,914
containers the same way, actually.

380
00:20:28,444 --> 00:20:30,724
It's tar and metadata. That's all it is.

381
00:20:30,784 --> 00:20:33,664
We have an integration that's not
ready yet because it's

382
00:20:33,664 --> 00:20:37,094
a small experiment, but something we
could push forward, which is, okay,

383
00:20:37,234 --> 00:20:39,084
we already have a tar integration, so

384
00:20:39,084 --> 00:20:42,674
we can create a Docker integration, which
is actually an integration that talks to

385
00:20:42,674 --> 00:20:47,094
the Docker API to get a stream of the tar
that gets passed into the tar integration.

386
00:20:47,484 --> 00:20:50,624
And then boom, you have a new thing
that's packaged and not using a

387
00:20:50,624 --> 00:20:52,554
script on the side, which is the goal.

388
00:20:52,554 --> 00:20:56,674
So really that's the point of
Plakar, initially:

389
00:20:56,674 --> 00:21:00,194
to allow doing this in cases where,
oh, that stuff is not backed up.

390
00:21:00,564 --> 00:21:04,904
How can I actually back it up, you know,
in a clear way without too much effort?

391
00:21:05,264 --> 00:21:07,994
Obviously, there's some dev
here, but once it's done, it's

392
00:21:08,004 --> 00:21:09,484
no longer dev for other people.

393
00:21:09,524 --> 00:21:14,364
So there's that, but the idea is
that if you trust the tool,

394
00:21:14,504 --> 00:21:17,724
you trust that your Docker backup
is working the same way as your SQL

395
00:21:17,934 --> 00:21:19,554
backup or your file system backup.

396
00:21:19,594 --> 00:21:23,974
Yeah, there's even a scenario
where you could use, 'cause you

397
00:21:23,974 --> 00:21:26,704
know, a container registry is
nothing but an object store really.

398
00:21:27,054 --> 00:21:31,734
And in the cloud
native community, there's sort of a

399
00:21:31,734 --> 00:21:35,214
consensus around the container registry
being the artifact store for all things,

400
00:21:35,224 --> 00:21:36,674
like the storage of all artifacts.

401
00:21:37,074 --> 00:21:39,784
And so now we have all these different
types. Container images are just

402
00:21:39,794 --> 00:21:43,794
one type for an OCI registry, but
there are now all these other file types.

403
00:21:43,794 --> 00:21:46,114
We can store Helm chart data in there.

404
00:21:46,114 --> 00:21:47,704
We can store Compose files in there.

405
00:21:48,044 --> 00:21:49,234
Each one is its own.

406
00:21:49,734 --> 00:21:53,704
It's not a container image, it's just
a registry artifact, and there's even

407
00:21:53,704 --> 00:21:59,094
utilities now that, if we have
a new type of object that we want to store

408
00:21:59,094 --> 00:22:03,024
in the registry, we can use
to create the metadata for all that.

409
00:22:03,424 --> 00:22:07,014
I'm not sure that necessarily a registry
is a great backup storage location.

410
00:22:07,314 --> 00:22:10,234
Like maybe an S3 storage
system would be better.

411
00:22:10,234 --> 00:22:12,294
cause all the clouds already
have that, but they all also

412
00:22:12,294 --> 00:22:14,164
already have image registries.

413
00:22:14,434 --> 00:22:19,274
And so a lot of times when I'm working
with teams, like if we're going to

414
00:22:19,274 --> 00:22:22,394
implement some sort of new backup or
some sort of new replication system,

415
00:22:22,394 --> 00:22:26,434
or if we need storage for
something, it's a lot easier for

416
00:22:26,444 --> 00:22:27,974
me to use what they already have.

417
00:22:28,189 --> 00:22:32,189
As for my ultimate back-end storage, I
mean, file storage and S3 storage probably

418
00:22:32,329 --> 00:22:35,859
makes the most sense for the two types
of storage for backups, but there's

419
00:22:35,869 --> 00:22:40,259
probably other scenarios like container
registries, and I love the idea that when

420
00:22:40,259 --> 00:22:44,789
I'm on the site, the integrations page
kind of clues me into which ones

421
00:22:44,789 --> 00:22:48,339
are inputs and outputs and which ones are
both, and just staring at

422
00:22:48,349 --> 00:22:54,169
the options that you had made me think
like, well, you know, can I store, you

423
00:22:54,169 --> 00:22:57,399
know, Google Drive backups in OneDrive?

424
00:22:57,479 --> 00:23:01,249
And then, also store
OneDrive backups in Notion.

425
00:23:01,259 --> 00:23:04,699
Like, I started to wonder about
my setup. I have a

426
00:23:04,699 --> 00:23:07,689
document that maps out the path of
all the things I need and where

427
00:23:07,689 --> 00:23:08,999
they all go for backups, right?

428
00:23:08,999 --> 00:23:12,659
If we're all talking about 3-2-1
storage for backups, we

429
00:23:12,659 --> 00:23:16,444
often as backup engineers, even of
your own software, your own stuff at

430
00:23:16,444 --> 00:23:21,454
home, you often forget a year later
what you did with it all and how often

431
00:23:21,454 --> 00:23:22,924
it backs up and where it's going.

432
00:23:23,194 --> 00:23:24,114
I tend to forget.

433
00:23:24,124 --> 00:23:26,894
And I know I'm using Backblaze
in some places and I'm using a

434
00:23:26,894 --> 00:23:28,284
different cloud in other places.

435
00:23:28,684 --> 00:23:32,464
And I have to document all of that
for my own sanity because every year

436
00:23:32,464 --> 00:23:35,174
I think I should check my backups
and see if they're still working.

437
00:23:35,494 --> 00:23:38,824
And then I forget, I don't know
where my backups are, how they work.

438
00:23:39,604 --> 00:23:41,294
And I have to go and
redo all that research.

439
00:23:41,294 --> 00:23:45,014
So the idea that I could maybe get
closer to having this in one product

440
00:23:45,024 --> 00:23:47,934
is something I might have to try
and make a video on.

441
00:23:47,934 --> 00:23:49,604
So, let's get some Docker stuff in there.

442
00:23:49,839 --> 00:23:50,089
I

443
00:23:50,104 --> 00:23:54,204
think the only managed service that we
are providing to the community right now

444
00:23:54,204 --> 00:23:58,394
is to, you know, send an email if you
have an issue with your backup, and a summary

445
00:23:58,394 --> 00:24:01,414
of, you know, what you are backing up.

446
00:24:01,414 --> 00:24:05,034
maybe we can cover this one because
it will be always in your mailbox,

447
00:24:05,034 --> 00:24:08,444
you know, what kind of backup
you have and where it's stored.

448
00:24:08,444 --> 00:24:09,874
So that could be nice.

449
00:24:10,374 --> 00:24:10,674
Yeah.

450
00:24:10,674 --> 00:24:16,244
And at this point, you know,
once we have the image registry stuff,

451
00:24:16,584 --> 00:24:20,244
then you start talking about Kubernetes
and can we run this on Kubernetes?

452
00:24:20,709 --> 00:24:23,519
let's talk about the storage for a
little bit and get into the weeds of it

453
00:24:23,519 --> 00:24:26,749
because I'm going to bring this up,
and hopefully this isn't,

454
00:24:27,099 --> 00:24:28,839
hopefully I'm not trolling your issues.

455
00:24:29,339 --> 00:24:33,669
But I was looking to deploy
this before the show so that I could

456
00:24:33,669 --> 00:24:38,389
come to the show and have feedback
or experience to talk about and say,

457
00:24:38,389 --> 00:24:39,909
yeah, I got it to work last night.

458
00:24:40,409 --> 00:24:44,599
The first thing I go for, being a Docker
guy, is deploying the Docker image.

459
00:24:45,099 --> 00:24:49,449
There's an issue open that's asking for
the Docker image, and I think someone

460
00:24:49,449 --> 00:24:51,589
replied and said, Oh, we're not ready yet.

461
00:24:51,829 --> 00:24:54,839
We've got caching, we've got other things
we've got to worry about and we need to

462
00:24:54,839 --> 00:24:56,259
come up with a more cohesive strategy.

463
00:24:56,599 --> 00:24:58,409
So I'm here for that cohesive strategy.

464
00:24:58,409 --> 00:24:59,319
Let's talk about that.

465
00:24:59,379 --> 00:25:01,789
what are the challenges
that you're seeing?

466
00:25:01,799 --> 00:25:02,989
And do you have a plan?

467
00:25:02,999 --> 00:25:05,549
Because we talked a
little bit before the show, but it

468
00:25:05,549 --> 00:25:06,899
sounds like there's stuff coming.

469
00:25:06,899 --> 00:25:09,489
So, not to spoil anything,
but let's talk about it.

470
00:25:10,639 --> 00:25:13,149
It's just that the feature
requests came very early.

471
00:25:13,289 --> 00:25:17,449
We had our first server release
a few months before this.

472
00:25:17,829 --> 00:25:20,819
We had the first user feedback
and we were trying to figure out the

473
00:25:20,829 --> 00:25:22,259
priority things to tackle.

474
00:25:22,809 --> 00:25:27,319
and this came in and required some
thinking from the team about what it

475
00:25:27,319 --> 00:25:31,399
means to have a Docker image for this,
because you have to actually mount

476
00:25:31,399 --> 00:25:33,009
your volumes inside the Docker container.

477
00:25:33,509 --> 00:25:38,429
You won't run this as an agent in
most cases, or is that what users want?

478
00:25:38,469 --> 00:25:39,379
That's an open question.

479
00:25:39,379 --> 00:25:40,229
That's not an answer.

480
00:25:40,299 --> 00:25:42,929
That's, you have to think about
how are they going to use that?

481
00:25:43,069 --> 00:25:45,819
Because the Docker image you're going
to ship as an official one, you're

482
00:25:45,819 --> 00:25:47,289
going to have to support it in some way.

483
00:25:47,289 --> 00:25:47,854
You can't just

484
00:25:47,964 --> 00:25:51,954
say, okay, we just released the Docker
image and it's not doing anything useful.

485
00:25:52,254 --> 00:25:55,594
And we had users saying, oh,
well, this

486
00:25:55,594 --> 00:25:57,094
is going to be run from a CI.

487
00:25:57,594 --> 00:25:58,074
So it.

488
00:25:58,074 --> 00:25:59,404
loses its state every time.

489
00:25:59,554 --> 00:26:00,744
So you need to rebuild state.

490
00:26:00,744 --> 00:26:01,064
Okay.

491
00:26:01,064 --> 00:26:04,464
Well, that's going to be an issue
because we need to have some persistence

492
00:26:04,474 --> 00:26:05,684
out of the Docker image for this.

493
00:26:06,134 --> 00:26:08,534
Some people saying, oh, I'm
going to use this from a machine.

494
00:26:08,534 --> 00:26:08,794
Yeah.

495
00:26:08,794 --> 00:26:11,664
but that means you could have installed
Plakar on your machine rather than

496
00:26:11,674 --> 00:26:15,294
Docker, because it's going to be
a lot of work for us to support

497
00:26:15,494 --> 00:26:17,254
a use case that's not that useful.

498
00:26:17,454 --> 00:26:21,554
Whereas you could launch Docker as an
agent with Plakar in it to go query the

499
00:26:21,554 --> 00:26:26,434
other things because Plakar is flexible
enough that you can run it as a, I'm

500
00:26:26,434 --> 00:26:27,944
doing my job from the Plakar instance.

501
00:26:28,389 --> 00:26:30,859
on my machine, but also
as this is controlling the

502
00:26:30,869 --> 00:26:33,329
backups from other machines and
transferring data here and there.

503
00:26:33,829 --> 00:26:35,939
I mean, depending on what
direction you take, you would not

504
00:26:35,979 --> 00:26:37,069
build that image the same way.

505
00:26:37,069 --> 00:26:40,279
And I don't know which
one you would advertise the most.

506
00:26:40,289 --> 00:26:40,519
Yeah.

507
00:26:41,099 --> 00:26:42,639
Without having user feedback on this.

508
00:26:43,299 --> 00:26:46,179
It needs to have discussion and we need
to have users from the community telling

509
00:26:46,179 --> 00:26:47,919
us that's what we need in the Docker image.

510
00:26:47,919 --> 00:26:49,849
And that's what is going
to drive the development.

511
00:26:51,019 --> 00:26:51,979
Um, that's

512
00:26:51,979 --> 00:26:52,549
most of

513
00:26:52,549 --> 00:26:53,149
the issues there,

514
00:26:53,649 --> 00:27:00,949
is there a concept of backing up the
system itself, like the configuration

515
00:27:00,969 --> 00:27:05,309
and the plugin list? Has it got an
internal backup command that allows

516
00:27:05,309 --> 00:27:09,559
me to essentially save the state
of the whole system outside of the

517
00:27:09,559 --> 00:27:10,929
individual integration backups?

518
00:27:11,429 --> 00:27:12,459
Well, there are two things.

519
00:27:12,534 --> 00:27:15,864
The first thing is that none of
the state is mandatory. When you

520
00:27:15,864 --> 00:27:19,744
run an instance, if you wipe your cache
folder, it's going to rebuild the state.

521
00:27:19,804 --> 00:27:20,014
Yeah.

522
00:27:20,224 --> 00:27:22,234
So you are going to lose the
plugins that were installed.

523
00:27:22,789 --> 00:27:24,279
But you can just click and reinstall.

524
00:27:24,579 --> 00:27:27,729
So the idea is that if you did
not have a backup of this, you're not

525
00:27:27,739 --> 00:27:31,729
lost, because you could start from a
blank machine, point

526
00:27:31,729 --> 00:27:34,739
to the repository, it will synchronize
again, and you will get a state that's

527
00:27:34,779 --> 00:27:35,879
a working state for your backup.

528
00:27:36,429 --> 00:27:39,279
If you need a backup because
you want to avoid having to

529
00:27:39,459 --> 00:27:41,989
resynchronize, or you want to avoid
having to reinstall, well, you can

530
00:27:41,989 --> 00:27:45,342
just back up the cache directory and
you get a Plakar snapshot

531
00:27:45,342 --> 00:27:47,009
with the configuration of your Plakar.

532
00:27:47,459 --> 00:27:50,969
There's nothing in particular
that you would have

533
00:27:50,969 --> 00:27:52,089
to do to make this possible.

534
00:27:52,179 --> 00:27:55,519
It's just a standard
way of using Plakar, basically.

535
00:27:55,989 --> 00:27:56,459
Yeah.

536
00:27:56,959 --> 00:27:59,249
on the infrastructure side of the storage.

537
00:27:59,669 --> 00:28:01,219
You mentioned encryption, so
do you

538
00:28:01,219 --> 00:28:03,079
support encryption of backups?

539
00:28:03,079 --> 00:28:03,199
Is

540
00:28:03,199 --> 00:28:03,539
that right?

541
00:28:04,164 --> 00:28:07,404
We have it out of the box for
the snapshots, the backups.

542
00:28:07,434 --> 00:28:09,164
We talk in terms of snapshots.

543
00:28:09,914 --> 00:28:12,594
A snapshot is a view of
whatever you imported as data.

544
00:28:13,264 --> 00:28:16,084
all these snapshots, they are
compressed and encrypted by default.

545
00:28:16,404 --> 00:28:19,084
You have to actually say,
I don't want the encryption.

546
00:28:19,164 --> 00:28:19,894
I want to work with plaintext

547
00:28:20,159 --> 00:28:20,689
you have to turn it off.

548
00:28:20,729 --> 00:28:21,269
Oh, okay.

549
00:28:21,349 --> 00:28:21,929
That's nice.

550
00:28:22,269 --> 00:28:22,979
Secure by default.

551
00:28:22,979 --> 00:28:23,429
I love it.

552
00:28:23,929 --> 00:28:27,809
It's end-to-end encrypted,
so you don't

553
00:28:27,809 --> 00:28:28,919
have a server, for example.

554
00:28:29,189 --> 00:28:33,569
you're going to run it from your machine,
and you're going to say, my import bucket

555
00:28:33,729 --> 00:28:36,969
is on S3, and my storage is on gcloud.

556
00:28:37,119 --> 00:28:37,269
yeah.

557
00:28:37,944 --> 00:28:38,324
Yeah.

558
00:28:38,679 --> 00:28:43,719
you don't have a server running at AWS and
you don't have a server running at gcloud.

559
00:28:44,219 --> 00:28:47,899
So all of this setup is
stored in the configuration

560
00:28:47,939 --> 00:28:48,999
of the store that you create.

561
00:28:48,999 --> 00:28:50,219
It is standalone.

562
00:28:50,719 --> 00:28:54,739
And we don't want to trust
AWS or gcloud with keys.

563
00:28:55,079 --> 00:28:57,889
And we don't have a third party
that would hold the keys to

564
00:28:57,889 --> 00:29:00,109
encrypt and decrypt the data.

565
00:29:00,559 --> 00:29:04,319
So we treat them as if they
were what we call dumpsters.

566
00:29:04,359 --> 00:29:08,029
They don't do anything besides receiving
the packets they see and storing

567
00:29:08,069 --> 00:29:09,389
them, that kind of storage layer.

568
00:29:09,889 --> 00:29:10,129
Yeah.

569
00:29:10,129 --> 00:29:12,619
So at the lowest level, what if I do it?

570
00:29:12,619 --> 00:29:15,739
If I do the simplest thing, if I do
the simplest deployment because I'm

571
00:29:15,739 --> 00:29:19,219
primarily an
implementer, you know, an operator.

572
00:29:19,459 --> 00:29:20,449
So I often think about, okay.

573
00:29:21,034 --> 00:29:23,014
What is it going to look like when
I have this thing set up?

574
00:29:23,024 --> 00:29:26,774
where are the pieces of the puzzle
sitting and what do I have to run

575
00:29:26,774 --> 00:29:31,044
long term and what are the ports I
need and how do these things connect?

576
00:29:31,044 --> 00:29:34,194
So I'm guessing that there's a daemon
or I don't know what you're calling

577
00:29:34,194 --> 00:29:38,534
it, the server part, but like a daemon
that runs somewhere all the time and

578
00:29:38,534 --> 00:29:42,674
it has an API, and
I can use a local CLI to control it.

579
00:29:42,724 --> 00:29:43,894
There are two parts to it.

580
00:29:43,914 --> 00:29:48,224
There's the, let's say the client part,
which is you running Plakar to import

581
00:29:48,234 --> 00:29:50,694
data from a source and push it somewhere.

582
00:29:51,194 --> 00:29:55,394
And there's whatever storage you
have, which may be a local disk or

583
00:29:55,394 --> 00:29:58,754
which may be an S3 bucket or which
may be actually anything that,

584
00:29:58,874 --> 00:30:02,228
that can act as a key-value
object store, yeah.

585
00:30:02,728 --> 00:30:05,858
So that would be AWS and you
don't have a server in between.

586
00:30:05,958 --> 00:30:10,418
You have Plakar operating as a client
to your AWS bucket, for example.

587
00:30:11,758 --> 00:30:15,178
So everything, deduplication,
compression, encryption, is done on the

588
00:30:15,178 --> 00:30:16,788
client side on the machine running Plakar.

589
00:30:16,788 --> 00:30:21,268
So when the traffic leaves the
machine, you know that it can't be

590
00:30:21,308 --> 00:30:23,338
tampered with without being detected.

591
00:30:23,338 --> 00:30:27,073
and it can't be decrypted without the
keys, which have not left your machine.

592
00:30:27,573 --> 00:30:28,083
yeah.

593
00:30:28,283 --> 00:30:29,883
That's the simplest case.

594
00:30:29,883 --> 00:30:31,673
That's what you would
do, on your home setup.

595
00:30:31,743 --> 00:30:34,483
You would install Plakar on your
desktop and you would run it from the

596
00:30:34,673 --> 00:30:36,563
desktop saying the storage is there.

597
00:30:36,573 --> 00:30:38,073
It's on my S3.

598
00:30:38,593 --> 00:30:41,973
Then you have a different mode, which
is, through the integration, you could

599
00:30:41,983 --> 00:30:45,583
have a server, you could have the
Ubuntu machine you said you had

600
00:30:45,593 --> 00:30:48,823
in your closet that could be running
Plakar and taking care of connecting

601
00:30:48,873 --> 00:30:52,173
through SFTP to all the machines on
your network and doing the backup.

602
00:30:52,423 --> 00:30:55,783
Then we have the non-open-source
version, which we're working on,

603
00:30:55,783 --> 00:31:00,213
Plakar Enterprise, which provides
a server that extends Plakar.

604
00:31:00,213 --> 00:31:03,353
So Plakar becomes the open
source client to an enterprise.

605
00:31:03,648 --> 00:31:09,648
product, but the same tool you would
use at home and the enterprise version

606
00:31:09,658 --> 00:31:13,488
would provide a server that has
additional features, like maintaining,

607
00:31:13,528 --> 00:31:16,838
privacy of the credentials for
all of your storages, for example.

608
00:31:17,338 --> 00:31:21,278
So your clients at home,
or your client on the workstation

609
00:31:21,278 --> 00:31:24,948
at work, yeah, they would connect to
the Plakar server of your enterprise.

610
00:31:25,448 --> 00:31:28,453
And then that server would hold
the credentials to the actual

611
00:31:28,453 --> 00:31:30,833
storages, so as not to leak them
throughout the company, for example.

612
00:31:31,333 --> 00:31:31,543
So you

613
00:31:31,543 --> 00:31:35,363
have all these different ways of
working that allows you to have very

614
00:31:35,363 --> 00:31:39,183
flexible setups that go from, yeah,
I have a mono machine and it's going

615
00:31:39,183 --> 00:31:41,223
to connect directly to my store, to

616
00:31:41,223 --> 00:31:41,583
I have

617
00:31:42,238 --> 00:31:46,218
segregated traffic and isolated
machines that have different privileges

618
00:31:46,218 --> 00:31:49,438
and cannot access that S3 bucket,
but they could access this one.

619
00:31:49,758 --> 00:31:51,358
And I can't trust them to do that.

620
00:31:51,408 --> 00:31:56,558
So I have to have some layer of
validation at an enterprise level.

621
00:31:56,698 --> 00:31:56,848
Yeah.

622
00:31:57,138 --> 00:31:57,658
Yeah.

623
00:31:58,028 --> 00:31:59,168
Is that allowing for, like,

624
00:32:00,088 --> 00:32:04,108
multi-hop backups, or, I'm trying to
think of some of my more enterprise

625
00:32:04,138 --> 00:32:08,158
challenges, like we had so many
backups happening that, you know,

626
00:32:08,158 --> 00:32:10,018
one server couldn't do them all.

627
00:32:10,518 --> 00:32:15,138
So for bandwidth purposes, we
ended up, in one scenario, with

628
00:32:15,138 --> 00:32:16,878
a main orchestrator

629
00:32:17,378 --> 00:32:24,458
server, and it had multiple backup, we
just call them agent boxes, but they,

630
00:32:24,548 --> 00:32:29,678
their purpose was to back up the data and
create the snapshots, but the, but they

631
00:32:29,678 --> 00:32:30,868
weren't necessarily backing up themselves.

632
00:32:30,868 --> 00:32:35,788
They were backing up other machines that
also had agents, but the middle tier

633
00:32:35,908 --> 00:32:40,997
of fan out.

634
00:32:41,038 --> 00:32:43,968
We needed to back up a certain amount
of terabytes every 24 hours.

635
00:32:43,988 --> 00:32:47,598
At the time, and this was 20 years
ago, we were limited

636
00:32:47,598 --> 00:32:49,078
to one gigabit network connections.

637
00:32:49,078 --> 00:32:53,778
So we were literally creating new
servers in the middle tier because we

638
00:32:53,778 --> 00:32:57,528
were saturating pipes and we couldn't
get enough backups from all the

639
00:32:57,528 --> 00:32:59,398
different systems in a 24 hour period.

640
00:32:59,398 --> 00:33:02,518
So we had to add more middle tier,
but there needed to be a central

641
00:33:02,518 --> 00:33:06,298
orchestrator that managed the
jobs that it was distributing to

642
00:33:06,298 --> 00:33:08,228
the individual middle tier stuff.

643
00:33:08,358 --> 00:33:12,268
But there, you know, there's a lot
of small shops that I deal with where

644
00:33:12,608 --> 00:33:18,948
one person is saddled with DevOps,
and ops, and backups, and recovery,

645
00:33:18,948 --> 00:33:22,758
and like monitoring, and logging, and
storage, and cloud infrastructure.

646
00:33:22,758 --> 00:33:24,408
Like they're just having to do it all.

647
00:33:24,438 --> 00:33:26,328
I actually call them solo DevOps.

648
00:33:26,418 --> 00:33:30,128
that's the label I give to these
unfortunate individuals that are

649
00:33:30,128 --> 00:33:31,968
given way too much work to do.

650
00:33:32,318 --> 00:33:33,328
Maybe AI will help them.

651
00:33:33,328 --> 00:33:36,218
Maybe we can rely a little bit more on
AI to help us with the advice on that.

652
00:33:36,218 --> 00:33:37,748
But it's just, it ends
up being a whole lot.

653
00:33:37,798 --> 00:33:38,188
Right.

654
00:33:38,218 --> 00:33:41,458
And I saw that you had a demo on the
website, but yeah, I'm just curious

655
00:33:41,458 --> 00:33:43,718
about how big this gets today.

656
00:33:43,718 --> 00:33:46,448
And like, where's your vision
for where the enterprise product

657
00:33:46,448 --> 00:33:47,528
that you're building is going?

658
00:33:48,028 --> 00:33:53,988
Now, ransomware attacks are
first targeting the backup systems

659
00:33:53,988 --> 00:33:55,388
of, you know, any company.

660
00:33:55,888 --> 00:34:00,878
So, for different kinds of
reasons, encryption is required.

661
00:34:01,128 --> 00:34:06,048
And you need to be sure that your storage
and the backup server don't have

662
00:34:06,088 --> 00:34:09,458
your encryption key.

663
00:34:09,928 --> 00:34:16,908
Otherwise, if your backup system is
compromised at some point, the attackers,

664
00:34:17,038 --> 00:34:19,198
they have all your data in one place.

665
00:34:19,378 --> 00:34:21,848
So, end to end encryption is key.

666
00:34:22,348 --> 00:34:25,318
It's clearly becoming a kind of
prerequisite for, you know,

667
00:34:25,338 --> 00:34:26,508
securing your backup right now.

668
00:34:27,008 --> 00:34:32,008
If we step back a bit, you know, the
issues that you mentioned about the

669
00:34:32,018 --> 00:34:36,338
size of the backup, we solved that
in the past with deduplication,

670
00:34:36,498 --> 00:34:39,468
mainly on the filers.

671
00:34:39,498 --> 00:34:43,228
So basically it was the storage
that was optimizing the space.

672
00:34:43,868 --> 00:34:49,308
Deduplicating the data, but it
only works with unencrypted data

673
00:34:49,308 --> 00:34:53,128
because with encrypted data, you are
of course losing the deduplication.

674
00:34:53,528 --> 00:34:58,438
So today we are in a situation
where we have companies that

675
00:34:58,873 --> 00:35:03,923
want to have end-to-end encryption
on their data, but they cannot,

676
00:35:04,113 --> 00:35:07,778
you know, in that case, use
the deduplication of the filers.

677
00:35:08,278 --> 00:35:13,558
And so storing the backup will have
a crazy cost to make it happen.

678
00:35:13,988 --> 00:35:20,132
A lot of vendors basically created kinds
of alternatives with proprietary formats,

679
00:35:20,413 --> 00:35:25,073
where they are still optimizing
the space, but at the end they

680
00:35:25,073 --> 00:35:26,803
still have the encryption key.

681
00:35:27,193 --> 00:35:30,553
And what we are trying to do with
Plakar is to solve this issue.

682
00:35:30,553 --> 00:35:35,723
So basically, because we are doing the
encryption, the compression, and the

683
00:35:35,723 --> 00:35:42,723
deduplication at source, it means that
all along the path where the backups are

684
00:35:42,743 --> 00:35:47,293
going, they are already super optimized
in terms of storage and in terms of space.
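Why storage-side deduplication stops working once data is encrypted, and why doing it at the source avoids the problem, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Plakar's actual format: it uses tiny fixed-size chunks (real tools usually use content-defined chunking), a keyed HMAC of the plaintext as the chunk identifier, and a toy HMAC-based stream cipher standing in for a real authenticated cipher. `SourceSideRepo` and `toy_encrypt` are hypothetical names.

```python
import hashlib
import hmac
import os

CHUNK_SIZE = 4  # tiny fixed-size chunks so the effect is easy to see


def toy_encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy stream cipher: XOR with an HMAC-SHA256 keystream under a random nonce.

    Illustration only; a real tool should use a vetted authenticated cipher.
    """
    nonce = os.urandom(16)
    keystream = b""
    counter = 0
    while len(keystream) < len(plaintext):
        block = hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        keystream += block
        counter += 1
    return nonce + bytes(p ^ k for p, k in zip(plaintext, keystream))


class SourceSideRepo:
    """Deduplicate and encrypt on the client; the storage only sees opaque blobs."""

    def __init__(self, mac_key: bytes, enc_key: bytes):
        self.mac_key = mac_key
        self.enc_key = enc_key
        self.blobs: dict[str, bytes] = {}  # what the (untrusted) storage holds

    def backup(self, data: bytes) -> int:
        """Store a snapshot and return how many new chunks had to be uploaded."""
        uploaded = 0
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            # A keyed MAC of the plaintext identifies the chunk without
            # exposing a plain content hash to the storage provider.
            ref = hmac.new(self.mac_key, chunk, hashlib.sha256).hexdigest()
            if ref not in self.blobs:
                self.blobs[ref] = toy_encrypt(self.enc_key, chunk)
                uploaded += 1
        return uploaded
```

Encrypting the same chunk twice yields different ciphertexts because of the random nonce, which is what defeats deduplication on a filer that only sees ciphertext. The client, which still sees plaintext, recognizes repeated chunks and uploads only new ones: a second backup of `b"AAAABBBBDDDD"` after `b"AAAABBBBCCCC"` uploads a single chunk.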

685
00:35:47,843 --> 00:35:52,843
We have almost 15,000 cycles
of backups that we did, snapshots

686
00:35:52,853 --> 00:35:54,893
that we did on this machine.

687
00:35:55,393 --> 00:35:57,973
The logical size is 24 terabytes.

688
00:35:57,983 --> 00:36:02,973
So we have here a huge amount
of data, but the space that we

689
00:36:02,973 --> 00:36:06,233
are using is only 159 gigabytes.

690
00:36:06,263 --> 00:36:10,793
Even though everything is fully encrypted
end to end, the storage has no

691
00:36:10,793 --> 00:36:13,303
knowledge about the encryption key.

692
00:36:13,803 --> 00:36:18,483
And I think it's the game-changing aspect
of this technology, because it allows you

693
00:36:18,483 --> 00:36:22,513
to move your backups anywhere you want.

694
00:36:23,013 --> 00:36:27,773
You know, to any cloud provider,
on premises, wherever you want, even if

695
00:36:27,813 --> 00:36:30,213
you don't fully trust this provider.

696
00:36:30,713 --> 00:36:35,413
And you can do that with an
optimized network cost because, you

697
00:36:35,413 --> 00:36:39,403
know, sometimes if you want to synchronize
data between cloud providers, you have

698
00:36:39,403 --> 00:36:43,553
to pay egress costs, which are super
expensive with huge amounts of data.

699
00:36:44,053 --> 00:36:49,033
And with Plakar, you will
just pay that for the

700
00:36:49,033 --> 00:36:50,883
first backup that you do.

701
00:36:51,363 --> 00:36:55,433
And all the snapshots that will follow
will only transfer to your server

702
00:36:55,898 --> 00:36:58,628
the few blocks that were
not backed up before.

703
00:36:58,898 --> 00:37:04,108
So the storage is optimized, and
it's fully end-to-end encrypted.

704
00:37:04,638 --> 00:37:08,038
Today, I don't know of many options
to make that happen.

705
00:37:08,538 --> 00:37:08,938
Yeah.

706
00:37:09,438 --> 00:37:12,288
So it's doing incremental
backups after the first, or

707
00:37:12,318 --> 00:37:13,648
incremental snapshots, I guess.

708
00:37:13,878 --> 00:37:14,198
there's,

709
00:37:14,198 --> 00:37:15,178
I will let Gilles, but

710
00:37:15,228 --> 00:37:16,338
Is it differential or incremental?

711
00:37:16,838 --> 00:37:16,978
Yeah.

712
00:37:17,038 --> 00:37:21,418
When you have an incremental
backup, you actually create a chain of

713
00:37:21,858 --> 00:37:23,808
dependency between all your snapshots.

714
00:37:24,688 --> 00:37:29,868
The thing is, the
longer your chain goes without going

715
00:37:30,098 --> 00:37:32,238
through another day-zero full sync,

716
00:37:32,583 --> 00:37:34,783
the more you increase the
likelihood that you will have a

717
00:37:34,783 --> 00:37:36,083
corruption at some point that will

718
00:37:36,093 --> 00:37:36,753
break your chain.

719
00:37:36,763 --> 00:37:37,723
And you, Yeah.

720
00:37:38,233 --> 00:37:41,393
So you have a higher risk, and you
have to test everything very often

721
00:37:41,403 --> 00:37:43,193
because you want to limit that risk.

722
00:37:43,233 --> 00:37:46,293
You don't want to
go through the hassle of doing the

723
00:37:46,403 --> 00:37:50,533
incremental backup, then not test
it, test it a week later and realize

724
00:37:50,543 --> 00:37:52,313
that, oh, you have one week's worth of

725
00:37:52,478 --> 00:37:54,598
deltas that are trashed, basically.

726
00:37:55,398 --> 00:37:55,778
Yeah.

727
00:37:55,848 --> 00:38:01,765
And the idea is that you can also take
an approach that is index-reference

728
00:38:01,775 --> 00:38:06,045
based, where basically what you're
doing is not saying I'm building a delta

729
00:38:06,085 --> 00:38:10,565
against what happened right before, it's
building a delta against what's in the

730
00:38:10,575 --> 00:38:14,525
store as a global storage repository.

731
00:38:15,025 --> 00:38:19,475
So your backup actually benefits
from any of the previous ones doing

732
00:38:19,475 --> 00:38:22,345
anything, and you don't have a chain
of dependency in the sense that you can

733
00:38:22,345 --> 00:38:23,855
delete the one that happened yesterday.

734
00:38:24,035 --> 00:38:26,605
It's not going to break any dependency
with the one you have today.

735
00:38:27,105 --> 00:38:31,685
As long as your store is reliable,
you can do any kind of removal that you

736
00:38:31,685 --> 00:38:33,315
want with the granularity that you want.

737
00:38:33,315 --> 00:38:36,575
And we can consider them as being
autonomous in the sense that each

738
00:38:36,575 --> 00:38:38,300
snapshot is autonomous by itself.

739
00:38:38,440 --> 00:38:39,920
It does not require any other one.
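The difference between a delta chain and an index/reference model can be shown with a small sketch. Here every snapshot is just an ordered list of chunk references into one shared store, so restoring needs no parent snapshot, and deleting an older snapshot only garbage-collects chunks that nothing else points to. This is an illustrative model with hypothetical names (`ChunkRepository`, fixed 4-byte chunks, plain SHA-256 references), not Plakar's on-disk layout.

```python
import hashlib

CHUNK_SIZE = 4  # tiny chunks to keep the example readable


class ChunkRepository:
    """Snapshots are lists of chunk references into one shared store."""

    def __init__(self):
        self.chunks: dict[str, bytes] = {}         # ref -> chunk data
        self.snapshots: dict[str, list[str]] = {}  # snapshot name -> ordered refs

    def snapshot(self, name: str, data: bytes) -> None:
        refs = []
        for i in range(0, len(data), CHUNK_SIZE):
            chunk = data[i:i + CHUNK_SIZE]
            ref = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(ref, chunk)  # reuse anything already stored
            refs.append(ref)
        self.snapshots[name] = refs

    def restore(self, name: str) -> bytes:
        # Only this snapshot's own references are needed; there is no parent chain.
        return b"".join(self.chunks[ref] for ref in self.snapshots[name])

    def delete(self, name: str) -> None:
        del self.snapshots[name]
        live = {ref for refs in self.snapshots.values() for ref in refs}
        # Garbage-collect chunks that no remaining snapshot points to.
        for ref in list(self.chunks):
            if ref not in live:
                del self.chunks[ref]
```

Deleting Monday's snapshot leaves Tuesday's fully restorable, because the chunk they share stays referenced; only Monday's unique chunk is reclaimed.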

740
00:38:40,530 --> 00:38:42,850
The thing is, you have to
trust the storage anyway.

741
00:38:43,670 --> 00:38:44,990
You're going to store your data there.

742
00:38:45,050 --> 00:38:45,240
If you

743
00:38:45,240 --> 00:38:48,790
don't trust it, well, you have to do
something that's called 3-2-1

744
00:38:48,830 --> 00:38:52,720
backup to actually ensure you don't
have only one copy of your backup and you can

745
00:38:52,790 --> 00:38:54,920
restore your broken backup
from another backup.

746
00:38:55,500 --> 00:38:56,090
That's the idea.

747
00:38:56,675 --> 00:38:57,865
We have a cool way to manage it.

748
00:38:58,365 --> 00:39:02,345
And so this allows you to have all
the benefits of incremental backups

749
00:39:02,385 --> 00:39:04,735
without the risks of incremental backups.

750
00:39:05,285 --> 00:39:05,495
Yeah.

751
00:39:05,495 --> 00:39:06,085
That's nice.

752
00:39:07,160 --> 00:39:13,270
With the sync command right now, you
can actually very easily synchronize

753
00:39:13,570 --> 00:39:16,000
a Kloset store across several locations.

754
00:39:16,010 --> 00:39:19,450
So basically you are pushing
one backup into a Kloset store.

755
00:39:19,950 --> 00:39:24,040
And you can have two or three Kloset
stores that are replicating

756
00:39:24,080 --> 00:39:25,820
those data in different locations.

757
00:39:26,280 --> 00:39:30,100
So, yeah, having all
your backups in one Kloset store

758
00:39:30,860 --> 00:39:31,790
would be too risky.

759
00:39:31,800 --> 00:39:34,520
Of course, you need to have a
backup strategy on top of it.

760
00:39:35,030 --> 00:39:38,790
And we are providing a cool way to
do it by, you know, having this

761
00:39:38,790 --> 00:39:44,200
way to synchronize, again with low
cost on storage and bandwidth, this

762
00:39:44,200 --> 00:39:45,760
Kloset store across several locations.

763
00:39:46,080 --> 00:39:49,900
So you are pushing in one place, and
you are able to have two or three copies,

764
00:39:50,290 --> 00:39:54,680
even in cold storage, to be sure
that your data remains safe

765
00:39:54,870 --> 00:39:57,050
if the first storage
has some issue. But yes,

766
00:39:57,050 --> 00:40:00,580
with different granularities, because
you might say, oh, since it's encrypted,

767
00:40:00,650 --> 00:40:03,360
you need to have the exact
same copy, but that's not the case

768
00:40:03,360 --> 00:40:05,430
because the snapshots are individual.

769
00:40:05,820 --> 00:40:09,820
So you can actually say, oh, I have one
store, like on my NAS near my machine.

770
00:40:10,230 --> 00:40:14,800
I will back up on my machine in the
local disk, just to have user error

771
00:40:14,810 --> 00:40:18,420
repair: if I did something wrong,
I have it immediately available.

772
00:40:18,420 --> 00:40:22,790
But I might synchronize one snapshot
per hour to the NAS and have that

773
00:40:22,820 --> 00:40:27,752
one synced onward again, like one copy into
AWS and one copy into Google Cloud,

774
00:40:27,752 --> 00:40:31,800
for example, and the synchronization,
it's not doing another backup.

775
00:40:31,920 --> 00:40:35,420
It's really pushing a copy of the
snapshots to different stores,

776
00:40:35,540 --> 00:40:37,770
possibly with different
encryption keys as well.

777
00:40:38,270 --> 00:40:42,360
It does a transformation
from one to the other so

778
00:40:42,360 --> 00:40:46,590
that in the end, each one has its
own encrypted copy of the same data.
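A sync that copies snapshots between stores holding different encryption keys, writing only the chunks the destination lacks, might look roughly like this. Assumptions: each store derives chunk references from its own MAC key, the cipher is a toy HMAC keystream used only for illustration, and `Store`, `put_snapshot`, and `sync` are hypothetical names rather than Plakar's API. Decrypting with the source key and re-encrypting with the destination key is what gives each store its own encrypted copy of the same data.

```python
import hashlib
import hmac
import os

CHUNK_SIZE = 4  # tiny chunks to keep the example readable


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode; a toy keystream for illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, data: bytes) -> bytes:
    nonce = os.urandom(16)
    return nonce + bytes(a ^ b for a, b in zip(data, _keystream(key, nonce, len(data))))


def toy_decrypt(key: bytes, blob: bytes) -> bytes:
    nonce, body = blob[:16], blob[16:]
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))


class Store:
    def __init__(self, key: bytes):
        self.key = key                             # each store has its own key
        self.blobs: dict[str, bytes] = {}          # chunk ref -> ciphertext
        self.snapshots: dict[str, list[str]] = {}  # snapshot name -> chunk refs

    def ref(self, chunk: bytes) -> str:
        return hmac.new(self.key, chunk, hashlib.sha256).hexdigest()


def put_snapshot(store: Store, name: str, data: bytes) -> None:
    """Back up `data` directly into `store` (deduplicated, encrypted)."""
    refs = []
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        ref = store.ref(chunk)
        if ref not in store.blobs:
            store.blobs[ref] = toy_encrypt(store.key, chunk)
        refs.append(ref)
    store.snapshots[name] = refs


def sync(src: Store, dst: Store, name: str) -> int:
    """Copy snapshot `name` from src to dst; return how many chunks were written."""
    written = 0
    dst_refs = []
    for ref in src.snapshots[name]:
        chunk = toy_decrypt(src.key, src.blobs[ref])  # plaintext only on the client
        new_ref = dst.ref(chunk)                      # dst identifies it with its own key
        if new_ref not in dst.blobs:
            dst.blobs[new_ref] = toy_encrypt(dst.key, chunk)
            written += 1
        dst_refs.append(new_ref)
    dst.snapshots[name] = dst_refs
    return written
```

The second sync writes a single chunk because the shared one is already present in the destination. For simplicity this sketch decrypts every chunk to compute the destination reference; a real implementation could keep a mapping to avoid that work.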

779
00:40:46,645 --> 00:40:50,285
And this saves you from cases where, for
example, you would have your machine,

780
00:40:50,285 --> 00:40:52,085
you want to back it up to two places.

781
00:40:52,085 --> 00:40:52,325
Yeah.

782
00:40:52,505 --> 00:40:53,765
Natively, you would

783
00:40:53,765 --> 00:40:57,095
say, oh, I'm going to do
a backup to AWS and I'm going to

784
00:40:57,095 --> 00:40:58,835
do a backup to Google Cloud.

785
00:40:58,835 --> 00:40:58,955
Yeah.

786
00:40:59,375 --> 00:41:00,425
But in between,

787
00:41:00,700 --> 00:41:01,740
something may have changed.

788
00:41:01,740 --> 00:41:03,100
You're not backing up the same thing.

789
00:41:03,600 --> 00:41:08,210
When you're doing the sync, what you're
doing is getting the info from one of the

790
00:41:08,220 --> 00:41:09,850
stores and transferring it to the other.

791
00:41:09,850 --> 00:41:13,420
So at the end, they have the same
data, which has its benefits.

792
00:41:14,280 --> 00:41:16,130
Like, if you lose
something, it has benefits.

793
00:41:16,495 --> 00:41:18,405
And we can repair the store
too, if we have a

794
00:41:18,405 --> 00:41:18,995
corruption,

795
00:41:19,130 --> 00:41:20,340
repairing the stores as well.

796
00:41:20,340 --> 00:41:20,550
If

797
00:41:20,550 --> 00:41:23,220
you break something, you can actually
repair it from the other one.

798
00:41:23,720 --> 00:41:23,960
Yeah.

799
00:41:24,170 --> 00:41:28,020
And you can also, of course, run some
checks on the store to verify, you know,

800
00:41:28,020 --> 00:41:33,420
if the data is still what you expect
in the store, in two different ways, right?

801
00:41:33,920 --> 00:41:38,250
And we have R&D projects on
error-correcting codes for auto-repair,

802
00:41:38,360 --> 00:41:39,440
maintenance and stuff like that.

803
00:41:39,950 --> 00:41:43,250
Just to be clear, the crypto:
we did not just do it on our own.

804
00:41:43,310 --> 00:41:46,630
We are a team of people who
have worked in security a lot.

805
00:41:46,960 --> 00:41:50,710
We have dealt with a lot of specs
about crypto in banking and stuff.

806
00:41:50,710 --> 00:41:55,100
So we kind of had a hunch about
what should be done and where.

807
00:41:55,410 --> 00:41:59,605
And we had an external, independent
audit by a famous cryptographer,

808
00:41:59,615 --> 00:42:03,945
whose book I have behind me,
who was okay to actually audit this with

809
00:42:03,965 --> 00:42:07,145
no bias, because they have no interest in
validating something that will be broken.

810
00:42:07,545 --> 00:42:09,305
So that was just to have a third party.

811
00:42:09,355 --> 00:42:12,875
We managed to put cryptography in
every layer as a validation mechanism.

812
00:42:13,265 --> 00:42:17,075
You have HMAC everywhere, so if you
flip one bit somewhere, it's going to

813
00:42:17,885 --> 00:42:21,165
completely break, in a good sense.

814
00:42:21,175 --> 00:42:23,175
It's going to tell you
there's a corruption there.

815
00:42:23,365 --> 00:42:26,915
It's in that specific file, and this
is collapsing because there's that

816
00:42:26,915 --> 00:42:29,975
file and that file that also share
that data, so they are all corrupted.
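The HMAC-everywhere idea can be illustrated with a short sketch: each chunk is stored under the MAC of its content, so recomputing the MAC detects even a single flipped bit, and because files reference shared chunks, one bad chunk can be traced to every file that uses it. The fixed key, the `find_corruption` helper, and the flat dictionaries are assumptions for illustration only.

```python
import hashlib
import hmac

KEY = b"k" * 32  # hypothetical MAC key, for illustration only


def mac(data: bytes) -> str:
    return hmac.new(KEY, data, hashlib.sha256).hexdigest()


def find_corruption(chunks: dict[str, bytes], files: dict[str, list[str]]) -> tuple[set[str], list[str]]:
    """Return the chunk refs that fail verification and the files that use them."""
    # A chunk is stored under the MAC of its content, so any bit flip shows up here.
    bad = {ref for ref, data in chunks.items() if not hmac.compare_digest(mac(data), ref)}
    # Every file that shares a corrupted chunk is affected, so the report can
    # name each impacted file instead of just "the repository is broken".
    affected = sorted(path for path, refs in files.items() if bad.intersection(refs))
    return bad, affected
```

For example, flipping one bit in a chunk shared by `a.txt` and `b.txt` reports both files, while a file that only uses healthy chunks stays clean.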

817
00:42:30,245 --> 00:42:34,173
So we already have the detection
part in a very granular way, in

818
00:42:34,173 --> 00:42:36,895
the sense that it can pinpoint
very specific chunks and objects.

819
00:42:37,345 --> 00:42:41,165
And having that plus the ability to
synchronize, we can build tools

820
00:42:41,165 --> 00:42:46,175
that allow us to do
a very pinpointed repair.

821
00:42:46,490 --> 00:42:49,370
Without having to repair everything
because that's costly too, we're going

822
00:42:49,370 --> 00:42:52,400
to be able to say, oh, I have one chunk
that's broken and I can fetch it from there.

823
00:42:52,400 --> 00:42:54,560
I'm going to fetch just
that, that amount of data.
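A pinpointed repair of this kind fetches from another copy only the chunks that fail verification, and verifies what it fetched before writing it back. This sketch assumes both copies share the same chunk references and MAC key (a simplification; stores with different keys would need the re-encryption step of a sync), and `repair` and `verified` are hypothetical helpers.

```python
import hashlib
import hmac
from typing import Optional

KEY = b"k" * 32  # hypothetical MAC key shared by both copies


def mac(data: bytes) -> str:
    return hmac.new(KEY, data, hashlib.sha256).hexdigest()


def verified(ref: str, data: Optional[bytes]) -> bool:
    return data is not None and hmac.compare_digest(mac(data), ref)


def repair(local: dict[str, bytes], replica: dict[str, bytes]) -> list[str]:
    """Fix only the local chunks that fail verification, fetching each from the replica.

    A fetched chunk is verified before it is written back, so a damaged replica
    cannot make things worse. Returns the refs that were repaired.
    """
    repaired = []
    for ref, data in list(local.items()):
        if verified(ref, data):
            continue  # healthy chunk: nothing is transferred
        candidate = replica.get(ref)
        if verified(ref, candidate):
            local[ref] = candidate
            repaired.append(ref)
    return repaired
```

Only the broken chunk crosses the network; running `repair` again on the fixed copy transfers nothing.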

824
00:42:55,020 --> 00:42:58,470
We have, well, as I said, the error-correcting
codes, because since we

825
00:42:58,470 --> 00:43:02,110
can detect everything that is broken, we
can add error-correcting

826
00:43:02,110 --> 00:43:04,294
codes on top that could auto-repair.

827
00:43:04,295 --> 00:43:08,435
You know, in the same way, like repair
in a buffer, verify that it's correct.

828
00:43:08,435 --> 00:43:09,885
You can check against another repository.

829
00:43:09,915 --> 00:43:10,835
Oh, it's verified correct.

830
00:43:10,885 --> 00:43:11,315
So, yeah.

831
00:43:11,345 --> 00:43:13,205
Do the repair for real, apply it.

832
00:43:13,705 --> 00:43:16,965
So we have all these paths,
possibilities that we can implement

833
00:43:17,265 --> 00:43:20,835
that are not like that far away because
we have branches that are working.

834
00:43:20,835 --> 00:43:21,734
They're not prod branches

835
00:43:21,805 --> 00:43:22,385
working right

836
00:43:22,385 --> 00:43:22,575
now,

837
00:43:22,885 --> 00:43:26,655
but they're working enough that you can
actually say it's not just an idea.

838
00:43:26,655 --> 00:43:29,745
It's something where, if that were
the focus, next month

839
00:43:29,895 --> 00:43:32,425
it would be there because there's
enough building blocks to prove that.

840
00:43:33,255 --> 00:43:33,775
And we

841
00:43:33,785 --> 00:43:37,005
have a ton of these ideas of, like
what would be the tools to make it

842
00:43:37,085 --> 00:43:40,485
more reliable, in the sense that it's
reliable that you would detect something

843
00:43:40,485 --> 00:43:42,045
is corrupted, but how can you make it

844
00:43:42,545 --> 00:43:46,165
so reliable that people will
not be stressed if that happens?

845
00:43:46,540 --> 00:43:48,100
But that's the goal, that's the idea.

846
00:43:48,450 --> 00:43:52,510
As an architect in the past,
I've been in so many incident rooms over

847
00:43:52,510 --> 00:43:56,600
Slack with people who lose
their minds when they have

848
00:43:56,600 --> 00:43:58,620
an incident and the backups are hard to

849
00:43:58,620 --> 00:43:59,070
manage.

850
00:43:59,490 --> 00:44:01,390
Because, sure, we know we have backups.

851
00:44:01,650 --> 00:44:04,070
Now we have to go into the
backups and we never do that.

852
00:44:04,370 --> 00:44:06,610
So now we have to figure
out how we go into that.

853
00:44:06,850 --> 00:44:11,975
And if one of them is corrupted,
then, yeah, it's stress plus plus.

854
00:44:12,615 --> 00:44:14,375
You're going to a high level of stress.

855
00:44:14,975 --> 00:44:15,355
And we want

856
00:44:15,405 --> 00:44:17,805
to be in the situation where
they do not face that.

857
00:44:17,835 --> 00:44:18,485
They say, okay.

858
00:44:18,485 --> 00:44:20,055
There's a corruption, even in your backup.

859
00:44:20,105 --> 00:44:21,695
Well, there are ways to get out of this.

860
00:44:21,695 --> 00:44:23,005
And most of them are automated.

861
00:44:23,055 --> 00:44:28,885
When you manage backups, there are these
three phases for the DevOps or operations

862
00:44:28,885 --> 00:44:32,195
engineer that's managing backups, there's
the implementation, which is obviously

863
00:44:32,195 --> 00:44:35,385
very time consuming, and you're learning
the product, and you're testing backup and

864
00:44:35,435 --> 00:44:37,355
restore, so you can believe they'll work.

865
00:44:37,765 --> 00:44:41,235
And then once you kind of get there,
the project's implemented,

866
00:44:41,255 --> 00:44:43,535
and you feel like everything's
going to work in a recovery.

867
00:44:43,950 --> 00:44:45,130
You tend to leave it alone, right?

868
00:44:45,130 --> 00:44:47,960
Like you're checking to make sure
things are going like, as new

869
00:44:47,960 --> 00:44:51,040
infrastructure shows up, you're
adding or removing jobs or whatever.

870
00:44:51,450 --> 00:44:55,110
And so you're kind of in maintenance
mode, but then there is that incident day.

871
00:44:55,435 --> 00:44:59,695
Where they call the backup person
and they're like, okay, we need to

872
00:44:59,695 --> 00:45:02,645
bring you into the incident room
or into the Slack team or whatever,

873
00:45:02,945 --> 00:45:05,065
because we now need a recovery.

874
00:45:05,105 --> 00:45:08,135
And typically, in most of the teams
I've worked in, not everyone

875
00:45:08,135 --> 00:45:09,105
can restore, right?

876
00:45:09,105 --> 00:45:12,245
There's only one or two
people who can restore.

877
00:45:12,275 --> 00:45:13,925
And so in that moment,

878
00:45:14,215 --> 00:45:18,315
as I can viscerally remember
being the manager of the people

879
00:45:18,325 --> 00:45:22,865
managing the backups and worried,
starting to doubt everything, right?

880
00:45:22,865 --> 00:45:26,485
Like they're about to test the restore
and I'm doubting like, did we, when

881
00:45:26,485 --> 00:45:29,605
was the last time we verified
this particular integration?

882
00:45:29,615 --> 00:45:32,685
Like we've had three major version
upgrades and we've never tested

883
00:45:32,685 --> 00:45:34,315
since we did the initial deployment.

884
00:45:34,315 --> 00:45:36,135
So we don't even know if
this restore will work.

885
00:45:36,385 --> 00:45:39,045
We recently had to replace
three of the drives in that system.

886
00:45:39,970 --> 00:45:44,260
So, is there a potential for some
sort of disk corruption that we

887
00:45:44,260 --> 00:45:47,500
didn't know about because the files
just sit there and they never get

888
00:45:47,510 --> 00:45:49,210
touched and they die slowly over time?

889
00:45:49,380 --> 00:45:52,910
There are so many moments in that where
I'm worried that someone's going to

890
00:45:52,910 --> 00:45:54,500
get in a lot of trouble or fired.

891
00:45:54,740 --> 00:45:57,160
And then the recovery
happens and it works.

892
00:45:57,490 --> 00:45:58,310
Maybe it doesn't.

893
00:45:58,510 --> 00:46:00,970
There was one time where we
ended up having corrupted files.

894
00:46:01,395 --> 00:46:06,115
And we had to go to offsite tape from
like a month ago, because we had this

895
00:46:06,115 --> 00:46:09,855
process where we would go to tape once
a month that would go to an offsite,

896
00:46:09,865 --> 00:46:13,455
it was going to a different data center
in a different part of the state.

897
00:46:13,775 --> 00:46:17,915
So it was like 300 miles away; the goal was
that if a storm took out the

898
00:46:17,915 --> 00:46:22,325
data center, of the 3-2-1,
that's the third copy, right?

899
00:46:22,335 --> 00:46:23,665
Like it's a state away.

900
00:46:23,920 --> 00:46:26,080
It's been driven there
by one of our staff.

901
00:46:26,080 --> 00:46:27,580
We know that it's physically there.

902
00:46:27,860 --> 00:46:30,120
And we had to go pick those tapes
up and they actually worked.

903
00:46:30,120 --> 00:46:31,390
But it took like a week.

904
00:46:31,660 --> 00:46:35,850
It was after a hurricane and we had a
flood and we had servers underwater.

905
00:46:36,150 --> 00:46:40,790
And so we had to go to the offsite
storage. And that whole week, I was

906
00:46:40,790 --> 00:46:43,410
just so nervous that these things
weren't going to get restored.

907
00:46:43,680 --> 00:46:48,240
We were like, basically going to
start from six-month-old data at best.

908
00:46:48,560 --> 00:46:50,050
And luckily, it worked.

909
00:46:51,240 --> 00:46:55,040
But those kind of things, we don't talk
about those kind of horror stories enough.

910
00:46:55,120 --> 00:46:55,550
You know,

911
00:46:56,905 --> 00:47:01,344
One of the reasons I think people are
stressed is partly because most of the

912
00:47:01,425 --> 00:47:02,635
companies don't have a backup team.

913
00:47:02,905 --> 00:47:04,885
Like, the big ones have a backup team.

914
00:47:04,945 --> 00:47:05,245
The other

915
00:47:05,245 --> 00:47:06,044
ones don't have a backup team.

916
00:47:06,525 --> 00:47:10,665
People who are given the task to
do backups, it falls on them, as

917
00:47:10,665 --> 00:47:13,635
part of a long list of other things
to do, and they have to get rid

918
00:47:13,765 --> 00:47:17,435
of it fairly fast, and it's not a
topic that they are interested in.

919
00:47:17,565 --> 00:47:18,585
They're just like, yeah,

920
00:47:18,585 --> 00:47:20,395
you have to do backups
before Friday.

921
00:47:20,395 --> 00:47:20,685
Okay.

922
00:47:20,715 --> 00:47:21,215
What do I have?

923
00:47:21,715 --> 00:47:23,975
There's a list of 10 tools.

924
00:47:24,015 --> 00:47:25,585
None of them seems appealing.

925
00:47:25,635 --> 00:47:26,335
I'm going to take

926
00:47:26,520 --> 00:47:29,830
one that's popular because no one's
going to get fired over a popular tool.

927
00:47:29,880 --> 00:47:31,910
That's going to be what drives the decision.

928
00:47:32,230 --> 00:47:36,860
But then, if they don't have to use
these backups, and it was just a task

929
00:47:36,900 --> 00:47:40,560
to follow up on, they're not going to
have a look at the backup once it's done.

930
00:47:40,640 --> 00:47:43,590
They will check that it happens
on a regular basis, because

931
00:47:43,590 --> 00:47:44,730
it's supposed to run every day.

932
00:47:44,730 --> 00:47:46,870
Well, they will check that it
happens every day, but they will

933
00:47:46,870 --> 00:47:48,390
not inspect the data every day.

934
00:47:48,440 --> 00:47:49,980
Because they have
other things to do.

935
00:47:50,690 --> 00:47:55,010
The other thing is that in most tools, the
backups are kind of dead data in the sense

936
00:47:55,020 --> 00:47:59,410
that they are meant to be backups and
have no other use, you know,

937
00:47:59,880 --> 00:48:03,370
When we design stuff, we're more
interested in how do you actually

938
00:48:03,370 --> 00:48:06,540
use the data, because what's going
to happen is this: if you

939
00:48:06,540 --> 00:48:09,680
have no use for data, for that data,
you're not going to look into it.

940
00:48:10,180 --> 00:48:14,420
If that data that you backed up is
actually usable in a very practical

941
00:48:14,420 --> 00:48:18,160
way, and you actually use it every
day, then you have a fair confidence

942
00:48:18,280 --> 00:48:21,120
it's not corrupted because you've
been using it the last few days.

943
00:48:21,330 --> 00:48:24,550
The demo website, that's just
the open source version, okay?

944
00:48:24,550 --> 00:48:29,260
So that's not a company use that you would
have of it, but we have previews of files.

945
00:48:29,760 --> 00:48:32,330
Within these files, you can preview
the photos, but you can also preview

946
00:48:32,330 --> 00:48:33,720
the videos, you can preview audio.

947
00:48:34,020 --> 00:48:37,170
If you actually use that snapshot,
which is a backup, it's a backup

948
00:48:37,170 --> 00:48:40,340
stored on the screen, for example,
you actually use it in a way that

949
00:48:40,340 --> 00:48:41,720
you would use your Google Drive.

950
00:48:42,220 --> 00:48:45,120
Every day, looking into things that
you actually manage and, Oh, I want

951
00:48:45,120 --> 00:48:48,420
to look at the content of a file,
but I'm going to use a snapshot, not

952
00:48:48,460 --> 00:48:51,180
a copy that I have on my machine.

953
00:48:51,680 --> 00:48:54,260
Well, you know that it works because
you actually viewed it recently.

954
00:48:54,260 --> 00:48:54,460
Yeah.

955
00:48:54,910 --> 00:48:58,390
And it becomes immutable data in
the sense that you can't alter it.

956
00:48:58,480 --> 00:49:02,240
It's read-only data, but it's
read-only data that you

957
00:49:02,240 --> 00:49:07,590
actually use, and that makes the data
a bit less dead and a bit more lively.

958
00:49:08,090 --> 00:49:11,680
I think if you have a use case like that
and you enter into an incident, you

959
00:49:11,680 --> 00:49:15,090
have to restore something that you've
been looking, like you've been actually

960
00:49:15,090 --> 00:49:18,760
using the snapshot every day through
a web interface or through mounting

961
00:49:18,760 --> 00:49:20,980
on your system as a local directory.

962
00:49:21,480 --> 00:49:26,620
Well, you're not as stressed because you
know that it actually works, which is,

963
00:49:26,730 --> 00:49:31,110
you've removed the painful part of
checking your restores.

964
00:49:31,610 --> 00:49:32,050
Yeah.

965
00:49:32,280 --> 00:49:35,740
Is the check similar to
a mock restore?

966
00:49:35,750 --> 00:49:36,060
Yeah.

967
00:49:36,060 --> 00:49:40,110
It's an in-memory restore
that discards the data after doing

968
00:49:40,110 --> 00:49:41,930
the cryptographic checks.

969
00:49:42,430 --> 00:49:47,570
So it's actually as if you restored in RAM
and validated all the checksums, but

970
00:49:47,570 --> 00:49:51,370
we do it in a streaming way so you don't
have to actually hold the memory for the

971
00:49:51,750 --> 00:49:52,100
snapshot.
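The check described here, an in-memory restore that verifies everything and then throws the data away, can be approximated with a streaming loop: each chunk is verified and folded into a running file digest, then dropped, so memory stays at one chunk rather than the whole snapshot. The `stream_chunks` and `check_file` names, the fixed MAC key, and the whole-file SHA-256 are illustrative assumptions, not Plakar's actual check command.

```python
import hashlib
import hmac
from typing import Iterable, Iterator, Optional, Tuple

KEY = b"k" * 32  # hypothetical MAC key


def mac(data: bytes) -> str:
    return hmac.new(KEY, data, hashlib.sha256).hexdigest()


def stream_chunks(store: dict, refs: Iterable[str]) -> Iterator[Tuple[str, Optional[bytes]]]:
    """Yield chunks one at a time; a real tool would read (and decrypt) them lazily."""
    for ref in refs:
        yield ref, store.get(ref)


def check_file(store: dict, refs: Iterable[str], expected_sha256: str) -> bool:
    """Restore a file into nothing: verify every chunk, fold it into a digest, drop it.

    Memory use is bounded by one chunk, not by the size of the file or snapshot.
    """
    digest = hashlib.sha256()
    for ref, data in stream_chunks(store, refs):
        if data is None or not hmac.compare_digest(mac(data), ref):
            return False  # missing or tampered chunk
        digest.update(data)  # the chunk itself is not kept after this
    return digest.hexdigest() == expected_sha256
```

A missing chunk and a tampered chunk both make the check fail, just as they would make a real restore fail, without ever writing the file to disk.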

972
00:49:52,345 --> 00:49:53,115
For the whole, yeah.

973
00:49:53,600 --> 00:49:53,840
Yeah.

974
00:49:54,340 --> 00:49:54,730
Okay.

975
00:49:55,230 --> 00:49:55,440
Yeah.

976
00:49:55,440 --> 00:49:56,910
Because it has to dedupe.

977
00:49:56,910 --> 00:49:59,420
Yeah.

978
00:49:59,490 --> 00:50:00,140
It has to read the data.

979
00:50:00,200 --> 00:50:02,230
We have a couple of questions. I don't
even know if this is a thing.

980
00:50:02,230 --> 00:50:04,900
Is there a plugin for
cyberattack detection?

981
00:50:04,900 --> 00:50:08,060
And I asked, is that, are you talking
about like ransom, like detecting

982
00:50:08,060 --> 00:50:09,980
ransomware from encrypting everything?

983
00:50:10,340 --> 00:50:11,820
Yes, is there something like that?

984
00:50:12,240 --> 00:50:13,050
Is that a thing?

985
00:50:13,110 --> 00:50:15,550
And what is, what do you
think about ransomware?

986
00:50:15,550 --> 00:50:17,730
Like, how do you do anything
for ransomware or do you just

987
00:50:17,820 --> 00:50:23,020
We do something, but
the posture that

988
00:50:23,020 --> 00:50:25,260
people should have is that the data is toast.

989
00:50:25,320 --> 00:50:27,860
You have to have a copy elsewhere
and you have to have a copy that's,

990
00:50:27,910 --> 00:50:29,810
not reachable by ransomware.

991
00:50:29,860 --> 00:50:33,425
Well, you have to have data that's
offsite and not on the network.

992
00:50:33,425 --> 00:50:36,785
And that's the only way that you're
sure that, well, sure, relatively

993
00:50:36,785 --> 00:50:38,195
sure that ransomware is not going to

994
00:50:38,195 --> 00:50:38,565
affect you.

995
00:50:39,355 --> 00:50:39,815
other provider?

996
00:50:40,515 --> 00:50:44,345
And then we have, okay, once we have
tackled this and we have said to

997
00:50:44,435 --> 00:50:48,475
people, don't trust anything other than
this solution, then there's all the

998
00:50:48,475 --> 00:50:50,195
solutions that are like, best effort.

999
00:50:50,910 --> 00:50:54,960
For example, we
compute the entropy of

1000
00:50:55,010 --> 00:51:00,030
files and directories and we store this
as part of the metadata of each snapshot.

1001
00:51:00,440 --> 00:51:05,602
So you could actually use a diff,
like a proper diff, to compare

1002
00:51:05,612 --> 00:51:09,212
if the entropy drastically changed
between two snapshots, for example,

1003
00:51:09,432 --> 00:51:15,242
this directory that had low entropy
before has a very high entropy now.
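Computing per-path entropy and comparing it between two snapshots is straightforward. The sketch below uses Shannon entropy in bits per byte (0 for constant data, 8 for uniformly distributed bytes) and flags paths whose entropy jumped by more than a threshold; the 2.0 bits-per-byte threshold and the function names are hypothetical choices, not values used by Plakar.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits per byte: 0.0 for constant data, up to 8.0 for uniformly distributed bytes."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


def entropy_jumps(before: dict[str, float], after: dict[str, float], threshold: float = 2.0) -> list[str]:
    """Paths present in both snapshots whose entropy rose by more than `threshold`."""
    return sorted(p for p in after if p in before and after[p] - before[p] > threshold)
```

A directory of plain text that suddenly looks like uniformly random bytes is flagged, while a directory that was already high-entropy (for example compressed media) is not, since its entropy barely moves.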

1004
00:51:15,722 --> 00:51:18,592
The thing is, the stores that
we are pushing to are write-once

1005
00:51:18,592 --> 00:51:18,882
by design.

1006
00:51:18,892 --> 00:51:21,902
So you're never editing
something in the store.

1007
00:51:21,912 --> 00:51:26,942
So you can actually have WORM
enforced at your provider level.

1008
00:51:27,122 --> 00:51:31,252
If you have entropy checking,
plus, the offsite copies,

1009
00:51:31,252 --> 00:51:33,392
offline copies, you kind of have

1010
00:51:33,882 --> 00:51:38,502
a fairly good situation, because
things that would not completely trash your

1011
00:51:38,502 --> 00:51:42,202
store, you can still manage. You say, oh, I
had the machine that has the ransomware.

1012
00:51:42,212 --> 00:51:45,492
It pushed backups with ransomware,
but the others, the snapshots

1013
00:51:45,492 --> 00:51:48,462
are not affected, and I can
actually remove the broken one.

1014
00:51:48,552 --> 00:51:49,642
Because they're immutable.

1015
00:51:50,142 --> 00:51:53,082
Then if that did not work, then
you can go back to, oh, I have an

1016
00:51:53,142 --> 00:51:54,912
offline copy, I have an offsite copy.

1017
00:51:55,262 --> 00:51:57,072
So you have to manage this yourself.

1018
00:51:57,242 --> 00:52:00,832
You just can't trust a software
solution to take care of everything.

1019
00:52:01,332 --> 00:52:03,692
Yeah, I like the entropy idea though.

1020
00:52:03,952 --> 00:52:07,382
Like, you're basically talking about if
the change rate on this particular backup

1021
00:52:07,382 --> 00:52:11,322
is normally 10 percent a day, having
something that notifies you when it's,

1022
00:52:11,512 --> 00:52:14,792
you know, double that, a 20 percent change
today or whatever, and some sort of

1023
00:52:15,017 --> 00:52:15,297
Yeah.

1024
00:52:15,482 --> 00:52:15,932
that your

1025
00:52:16,432 --> 00:52:19,712
And you will probably have an alert
on the size also, because of course

1026
00:52:19,712 --> 00:52:23,932
if you know everything is encrypted,
usually, you know, you can do with Plakar

1027
00:52:24,982 --> 00:52:28,172
something like 10,000 cycles without
increasing the size of the storage.

1028
00:52:29,497 --> 00:52:33,857
You know, you can, yeah, increase the
frequency because we are just storing,

1029
00:52:33,907 --> 00:52:39,847
you know, a bit of metadata, only the
changes between two snapshots, so,

1030
00:52:39,887 --> 00:52:45,197
and so you can virtually make your
backup from, you know, every day to

1031
00:52:45,247 --> 00:52:49,681
every hour, every minute, depending
on the size of what you are backing up.

1032
00:52:49,681 --> 00:52:54,551
If you have ransomware, you will
have an alert on the size, because it

1033
00:52:54,971 --> 00:52:58,421
will at some point double the size of
your storage, and that's something that

1034
00:52:58,421 --> 00:52:59,561
should never happen, so.

1035
00:53:00,061 --> 00:53:00,521
Yeah.

1036
00:53:01,011 --> 00:53:05,671
You have that, and you also have
the fact that, as I said very

1037
00:53:05,881 --> 00:53:09,561
early in the interview, we have built
some kind of database in some sense.

1038
00:53:09,571 --> 00:53:11,241
So we have multiple indexes.

1039
00:53:11,291 --> 00:53:16,051
we can look up images or videos because we
also index MIME types and stuff like that.

1040
00:53:16,421 --> 00:53:19,191
And the MIME types, they should be
aligned in some way with the entropy

1041
00:53:19,191 --> 00:53:21,041
of the data. If you have a text/plain

1042
00:53:21,051 --> 00:53:24,091
file and it has a high entropy,
you're going to raise alerts.

1043
00:53:24,141 --> 00:53:25,351
That's not, that's not great.

1044
00:53:25,381 --> 00:53:29,651
So there are many; these are the
few that come to my head right now,

1045
00:53:29,651 --> 00:53:33,201
but there are many other ways, other
heuristics that you can use to actually

1046
00:53:33,201 --> 00:53:35,731
detect some kind of fishy scenario

1047
00:53:35,801 --> 00:53:39,711
that would gradually take place, because
if the ransomware

1048
00:53:39,711 --> 00:53:43,051
is already there, you'll know,
because you're being asked for money.

1049
00:53:43,291 --> 00:53:46,131
But if you're in the middle
of the attack and you have

1050
00:53:46,616 --> 00:53:50,956
a backup that's happening at that point
with half the data corrupted,

1051
00:53:50,976 --> 00:53:52,026
half the assets are corrupted.

1052
00:53:52,026 --> 00:53:54,376
You're going to detect it through
entropy metrics like this.

1053
00:53:55,391 --> 00:53:58,161
I feel like if I had to make something
myself, it would end up being

1054
00:53:58,161 --> 00:54:00,661
something that was so stupidly simple.

1055
00:54:00,671 --> 00:54:05,191
Like I'd create a monitoring solution
that watches a plain text file.

1056
00:54:05,201 --> 00:54:07,021
That's like dont-encrypt-me

1057
00:54:07,251 --> 00:54:10,731
.txt or something that I put on every
single file share, every single server.

1058
00:54:11,041 --> 00:54:13,241
And if any single one
of them ever changes.

1059
00:54:13,741 --> 00:54:16,901
I get an alert, like I have some sort
of agent that somehow detects all

1060
00:54:16,901 --> 00:54:18,731
of them and it's the first level.

1061
00:54:18,781 --> 00:54:20,381
it doesn't even wait
for backups to happen.

1062
00:54:20,381 --> 00:54:21,981
It's just like, Oh,
this file just changed.

1063
00:54:22,031 --> 00:54:25,061
'Cause the way I've seen these things
roll out, these ransomware attacks, is it

1064
00:54:25,061 --> 00:54:26,531
starts small and then just spirals.

1065
00:54:26,541 --> 00:54:29,921
So there are early indicators in
the early hours, because if you've

1066
00:54:29,921 --> 00:54:32,591
got terabytes of file storage, like
that doesn't all happen at once.

1067
00:54:32,621 --> 00:54:34,831
And it doesn't, and not everybody
has permissions to everything.

1068
00:54:34,831 --> 00:54:36,791
So it typically starts in little places.

1069
00:54:36,791 --> 00:54:38,911
So I'd probably like seed all
these little files everywhere.

1070
00:54:39,251 --> 00:54:43,451
Besides the smartness of the
discussion, there's only the offline

1071
00:54:43,461 --> 00:54:44,441
backup that's going to save you.

1072
00:54:44,541 --> 00:54:44,771
There's

1073
00:54:44,771 --> 00:54:48,961
no other way. It only
takes one miss.

1074
00:54:50,991 --> 00:54:55,501
With ransomware, if you miss the detection and
let it happen, it's already done.

1075
00:54:55,581 --> 00:54:59,121
You don't have the luxury
to try and see if it works.

1076
00:54:59,121 --> 00:54:59,281
Yeah.

1077
00:54:59,701 --> 00:55:01,351
so you have to have
your own offline backup.

1078
00:55:01,361 --> 00:55:02,091
That's the only solution.

1079
00:55:02,271 --> 00:55:05,491
And then everything else is
nice to have.

1080
00:55:05,736 --> 00:55:10,096
But it should not stop you from
having the most annoying part, which is

1081
00:55:10,106 --> 00:55:14,346
the offline backend, which is the one
that takes the most effort to produce,

1082
00:55:14,726 --> 00:55:15,106
Yeah.

1083
00:55:15,606 --> 00:55:17,376
Well, I like those read-only S3 buckets.

1084
00:55:17,426 --> 00:55:20,556
Those are something that I like to use
for ensuring that files can't be deleted.

1085
00:55:20,556 --> 00:55:21,636
My backups can't be changed.

1086
00:55:21,896 --> 00:55:28,956
You know, did you hear about the UniSuper
incident where Google just deleted

1087
00:55:29,456 --> 00:55:35,076
the complete AZ and region of
UniSuper, and they lost everything

1088
00:55:35,076 --> 00:55:38,826
because they, you know, basically they
dropped the billing accounts of the

1089
00:55:38,826 --> 00:55:41,106
client and it cascaded everywhere.

1090
00:55:41,106 --> 00:55:44,256
So, yeah, I would not rely on S3.

1091
00:55:44,826 --> 00:55:45,066
It's,

1092
00:55:45,116 --> 00:55:45,746
Oh, for sure.

1093
00:55:46,146 --> 00:55:50,066
I just mean, unlike a normal file
server or any drive storage or

1094
00:55:50,066 --> 00:55:53,136
anything, I can more easily ensure

1095
00:55:53,636 --> 00:55:57,476
that things written to
buckets don't get changed later.

1096
00:55:57,486 --> 00:56:02,306
Whereas like everything on a file
server is, you know, up for debate

1097
00:56:02,326 --> 00:56:04,016
on what can access it.

1098
00:56:04,196 --> 00:56:05,376
But yeah, good advice.

1099
00:56:05,426 --> 00:56:06,206
One other question.

1100
00:56:06,206 --> 00:56:10,356
Gartner recently introduced the cloud-native
infrastructure recovery category

1101
00:56:10,646 --> 00:56:14,236
in their latest hype cycle for backup
and data protection technologies.

1102
00:56:14,616 --> 00:56:17,386
Where would you position Plakar in

1103
00:56:17,886 --> 00:56:20,646
Cares, I guess that's the
acronym for Cloud Native

1104
00:56:20,666 --> 00:56:23,816
Infrastructure Recovery something.

1105
00:56:24,316 --> 00:56:28,876
Let's say that we announced last week, or
this week, that we are joining, as a

1106
00:56:28,876 --> 00:56:33,726
sponsor, the Linux Foundation and the Cloud Native
Computing Foundation, I don't know, yeah,

1107
00:56:33,736 --> 00:56:35,406
so basically we are joining those two

1108
00:56:35,501 --> 00:56:36,401
Sandbox maybe?

1109
00:56:36,431 --> 00:56:37,481
Are you going to go for Sandbox?

1110
00:56:37,531 --> 00:56:40,541
When you join, are you donating
or are you just becoming a member?

1111
00:56:40,686 --> 00:56:45,296
We are donating, basically, to, yeah,
be part of the foundation, because

1112
00:56:45,306 --> 00:56:49,261
it's the first step, you know,
to get in there and to understand

1113
00:56:49,271 --> 00:56:53,461
how this ecosystem is working
right now because we have to admit that

1114
00:56:53,471 --> 00:56:57,621
Gilles is coming from the BSD world and
I don't have so much experience in this

1115
00:56:57,621 --> 00:57:01,751
one, so we have to, you know, figure
out how we can integrate, and it was

1116
00:57:01,751 --> 00:57:06,421
the first step. But yeah, we are currently
working on Kubernetes support.

1117
00:57:06,866 --> 00:57:10,646
We hope that by Cloud Native Paris,
on the 3rd of February, we'll have

1118
00:57:10,696 --> 00:57:12,556
something usable.

1119
00:57:12,606 --> 00:57:16,576
And yeah, we really think that
at some point a layer is missing

1120
00:57:16,626 --> 00:57:20,796
in cloud native around,
you know, resiliency and backup.

1121
00:57:21,296 --> 00:57:25,986
And with Plakar, we will
try to contribute, you know, try to bring

1122
00:57:26,036 --> 00:57:31,976
up this layer and make sure that, whatever
data you have to back up, you

1123
00:57:32,066 --> 00:57:34,046
will be able to rely on this layer.

1124
00:57:34,206 --> 00:57:37,386
In my previous job, you know, I was
managing quite a large team at a

1125
00:57:37,396 --> 00:57:39,266
big e-commerce company in Europe.

1126
00:57:40,106 --> 00:57:44,006
And I was fighting every quarter
to be sure that all the team

1127
00:57:44,016 --> 00:57:45,266
made their backups at some point.

1128
00:57:45,766 --> 00:57:51,066
But the thing I never achieved was making
sure that every team had 3-2-1, like,

1129
00:57:51,066 --> 00:57:53,746
3-2-1, you know, with everything encrypted.

1130
00:57:53,976 --> 00:57:58,116
I think what is game-changing with
Plakar right now is that we decoupled

1131
00:57:58,556 --> 00:58:03,256
the storage from the technology
used to store your backup.

1132
00:58:03,396 --> 00:58:05,306
And you can store your backup anywhere

1133
00:58:05,991 --> 00:58:08,921
without trusting your provider.

1134
00:58:09,301 --> 00:58:10,471
So what does it change?

1135
00:58:10,521 --> 00:58:15,801
you could imagine, and it's what we
are releasing right now, a protocol

1136
00:58:15,961 --> 00:58:20,691
where you can push your backup to a
provider, and the provider is managing

1137
00:58:20,701 --> 00:58:26,161
the resilience of your data without any
kind of knowledge of your encryption

1138
00:58:26,161 --> 00:58:31,721
key, while we are maintaining low
network cost and low storage cost.

1139
00:58:32,221 --> 00:58:35,821
And I think it's, you know, the kind
of layer that is missing right now.

1140
00:58:36,181 --> 00:58:41,531
Being able to back up all your
objects, basically, whatever they are.

1141
00:58:41,971 --> 00:58:45,981
just pushing it to a third party that
could be, you know, your own company.

1142
00:58:45,981 --> 00:58:49,071
It could be a team in your company
that is managing the backup.

1143
00:58:49,371 --> 00:58:52,901
But today the issue is that pushing
all your data to one team in your

1144
00:58:52,931 --> 00:58:55,151
company with the encryption key, etc.,

1145
00:58:55,151 --> 00:59:01,341
to optimize the storage, it's a big
bottleneck in terms of security.

1146
00:59:01,691 --> 00:59:06,271
So, yeah, what we are enabling with
this new protocol is being able to,

1147
00:59:06,531 --> 00:59:10,201
you know, asking every team: make
your backup, push this backup to a

1148
00:59:10,201 --> 00:59:14,146
third party, internal third party
or external third party. And this

1149
00:59:14,216 --> 00:59:18,146
third party will manage resilience
without any knowledge about your data.

1150
00:59:18,636 --> 00:59:22,336
So being able to make two copies
in two different cloud providers,

1151
00:59:22,376 --> 00:59:26,316
for example, one offline, etc.
And doing it in a clean way.

1152
00:59:26,556 --> 00:59:28,756
And I think it's the kind
of contributions that we

1153
00:59:28,756 --> 00:59:30,406
can bring to the ecosystem.

1154
00:59:30,406 --> 00:59:33,776
But, yeah, of course, we want
to do something to solve

1155
00:59:33,806 --> 00:59:35,056
all these resilience issues.

1156
00:59:35,556 --> 00:59:39,536
the way you're describing that, I can't
help but think that an OCI registry

1157
00:59:39,536 --> 00:59:44,126
would be a good option for that because
it's content addressable, it's SHA

1158
00:59:44,126 --> 00:59:49,536
hash-guaranteed unique identifiers,
it's read-only, so you can be assured

1159
00:59:49,536 --> 00:59:53,706
that there is integrity, it's got all
the metadata to it, so, I'm going to

1160
00:59:53,706 --> 00:59:56,256
put my vote in for that, but it sounds
like you're building something custom.

1161
00:59:56,276 --> 00:59:59,506
So I was going to ask, as we wrap this
up, 'cause we're running a little long:

1162
00:59:59,556 --> 01:00:00,456
what's next?

1163
01:00:00,466 --> 01:00:04,546
It sounds like what's next is
initial Kubernetes support.

1164
01:00:04,546 --> 01:00:08,136
When you say Kubernetes support, are you
talking about running it on Kubernetes?

1165
01:00:08,146 --> 01:00:12,126
Or are you talking about backing up
like Kubernetes volumes or is it both?

1166
01:00:12,626 --> 01:00:15,646
We had a discussion about what was
the proper way to integrate into

1167
01:00:15,786 --> 01:00:18,806
Kubernetes, because it's one of our
developers that's working on it.

1168
01:00:18,806 --> 01:00:21,156
It was like, should I do
one integration that works,

1169
01:00:21,656 --> 01:00:22,376
that covers everything?

1170
01:00:22,636 --> 01:00:26,456
And I said, no, we have to decouple
the control plane and the data plane.

1171
01:00:26,466 --> 01:00:29,566
You have to be sure that, I want
to be able to back up all the

1172
01:00:29,566 --> 01:00:30,736
YAMLs from my configuration.

1173
01:00:30,736 --> 01:00:34,306
And I want to be able to selectively
back up some of my volumes.

1174
01:00:34,306 --> 01:00:37,606
I don't want to have no option but
to back up everything or nothing.

1175
01:00:37,606 --> 01:00:42,506
So these are either two
separate integrations or one integration

1176
01:00:42,506 --> 01:00:43,746
operating in two different modes.

1177
01:00:44,186 --> 01:00:47,676
But the idea is to tackle all of
them, and we were looking into,

1178
01:00:47,696 --> 01:00:49,346
looking into Velero integration.

1179
01:00:49,516 --> 01:00:50,546
and

1180
01:00:53,106 --> 01:00:58,476
It's always tempting to go your own way:
let's do our own integration, but there's

1181
01:00:58,496 --> 01:01:03,546
also a pragmatic way, which is: if
we can adapt to be run by Velero,

1182
01:01:03,686 --> 01:01:04,546
through Velero,

1183
01:01:04,986 --> 01:01:07,226
then you get all the possibilities.

1184
01:01:07,226 --> 01:01:11,896
You get, first, our simple integration
to back up the kube configuration, which

1185
01:01:11,896 --> 01:01:16,896
can also be used through Velero to
fit into people's existing setup.

1186
01:01:16,896 --> 01:01:18,966
They can just swap between
different solutions.

1187
01:01:18,966 --> 01:01:22,346
They can test us while retaining their
old solution for whatever they're using.

1188
01:01:22,656 --> 01:01:26,906
Velero. Then we can have a third
way of doing it, which is our own,

1189
01:01:26,906 --> 01:01:28,566
but that would come last basically.

1190
01:01:29,066 --> 01:01:29,996
But yeah.

1191
01:01:29,996 --> 01:01:34,016
Just to say, we're planning on not
being just a solution that runs

1192
01:01:34,026 --> 01:01:37,706
within Kube, but more as a solution
that also manages to back up your Kube.

1193
01:01:38,206 --> 01:01:38,486
Yeah.

1194
01:01:38,486 --> 01:01:41,646
That's one of the challenges I've been
seeing in the industry is, like, you've got

1195
01:01:41,646 --> 01:01:46,556
like the traditional backup vendors
that have the plugins or integrations

1196
01:01:46,556 --> 01:01:47,546
or whatever they want to call it.

1197
01:01:47,566 --> 01:01:51,586
and, you know, you're paying lots
of money and they make you pay

1198
01:01:51,586 --> 01:01:54,746
for certain things like maybe the
Oracle integration costs extra

1199
01:01:54,746 --> 01:01:56,986
and you know, that kind of thing.

1200
01:01:57,286 --> 01:01:58,466
And they're all closed source.

1201
01:01:58,846 --> 01:02:01,826
And then you have these
open-source things like Velero.

1202
01:02:02,326 --> 01:02:04,476
But the challenge with it
is it's just Kubernetes.

1203
01:02:04,516 --> 01:02:06,876
it's great at Kubernetes,
but it's just Kubernetes.

1204
01:02:06,876 --> 01:02:10,176
And typically, I don't really work with
any teams that are only Kubernetes.

1205
01:02:10,176 --> 01:02:12,996
I mean, even if they're Kubernetes
first and they're container first,

1206
01:02:13,496 --> 01:02:14,746
they're going to have other things.

1207
01:02:15,016 --> 01:02:18,976
And so then they have to have a completely
different set of tools for that stuff.

1208
01:02:19,396 --> 01:02:22,036
And these two
things don't meet, right?

1209
01:02:22,366 --> 01:02:25,726
So Velero's backing up to whatever
storage you want to put in the plugin

1210
01:02:25,726 --> 01:02:28,496
on the back end, and then this other
system is completely separate, and

1211
01:02:28,496 --> 01:02:32,646
you've got, and, not that we can ever,
I mean, you know, at this point it

1212
01:02:32,646 --> 01:02:36,536
feels like all my clients have multiple
CIs, multiple backups, multiple clouds,

1213
01:02:36,536 --> 01:02:37,846
like there is no just one thing.

1214
01:02:38,066 --> 01:02:41,546
They're doing everything
multiple times, multiple types

1215
01:02:41,546 --> 01:02:45,216
of databases, multiple different
database providers, and so the

1216
01:02:45,216 --> 01:02:48,366
challenge I always feel like isn't to
get to one universal backup system.

1217
01:02:48,366 --> 01:02:50,816
It's to get to as few as
possible so that you can maintain

1218
01:02:51,021 --> 01:02:52,661
Yeah, that's the goal.

1219
01:02:53,081 --> 01:02:56,971
But if you were like, okay, imagine you
have 10 separate tools, because that's

1220
01:02:56,971 --> 01:02:59,191
what I saw at some previous companies.

1221
01:02:59,521 --> 01:03:03,221
They had many teams, and none of
them came to a consensus

1222
01:03:03,221 --> 01:03:04,601
about what the proper solution was.

1223
01:03:04,601 --> 01:03:06,631
Each one came up with its
own, you end up with 10.

1224
01:03:07,131 --> 01:03:11,441
Even if we just reduced to three,
that's a net win over having to manage

1225
01:03:11,471 --> 01:03:13,182
10 different solutions.

1226
01:03:13,182 --> 01:03:17,775
And in our case, you can do something
very stupid, like stupidly easy.

1227
01:03:17,805 --> 01:03:21,225
I mean, you can say, oh, I want an
integration that actually backs up to

1228
01:03:21,225 --> 01:03:22,845
another system, another backup system.

1229
01:03:23,235 --> 01:03:26,215
So you end up having everything
falling into Plakar through the

1230
01:03:26,235 --> 01:03:29,825
system of integrations, being able to
ingest data from whatever solution.

1231
01:03:30,245 --> 01:03:33,805
So there's also this, I'm saying it's
a possibility through the integration

1232
01:03:33,805 --> 01:03:37,565
system, but that means that you have
also a way to progressively

1233
01:03:37,850 --> 01:03:43,740
unplug older solutions as you manage
to have integrations written, but

1234
01:03:43,860 --> 01:03:48,500
still be able to have everything
from day one into Plakar.

1235
01:03:49,360 --> 01:03:50,840
you know, you want to
back up some solution.

1236
01:03:50,840 --> 01:03:52,920
We don't have the integration
for that, but we have the

1237
01:03:52,920 --> 01:03:54,420
integration for your backup system.

1238
01:03:54,690 --> 01:03:55,010
Well, you

1239
01:03:55,100 --> 01:03:56,470
can back up through the other tool.

1240
01:03:56,720 --> 01:03:58,380
We back up the result of your backup.

1241
01:03:58,690 --> 01:04:01,930
Progressively, as we have the integrations
to manage your tools natively,

1242
01:04:02,210 --> 01:04:03,780
you get some tools out of the way.

1243
01:04:04,280 --> 01:04:04,400
So

1244
01:04:04,400 --> 01:04:06,560
the idea is to allow people to do that.

1245
01:04:06,670 --> 01:04:10,410
Obviously we're not enough
people to write the hundreds of

1246
01:04:10,450 --> 01:04:11,590
integrations that we would need,

1247
01:04:11,930 --> 01:04:16,550
but having simple SDKs, providing
good examples and starting to do the

1248
01:04:16,550 --> 01:04:21,280
most popular ones,
will lead us there ultimately, that's

1249
01:04:21,660 --> 01:04:22,260
the idea.

1250
01:04:22,735 --> 01:04:22,965
yeah.

1251
01:04:22,965 --> 01:04:27,065
And that's how tools like Cyberduck,
CyberDrive, like that whole, you know,

1252
01:04:27,185 --> 01:04:31,675
project ecosystem, you know, like dozens
of different storage integrations.

1253
01:04:31,685 --> 01:04:34,585
I like using those tools because
they've got GUIs, they're user-friendly.

1254
01:04:34,585 --> 01:04:37,455
They're really great for like personal
backups, personal file management.

1255
01:04:37,915 --> 01:04:42,155
And the magic of that tool
is that it works with, like, everything.

1256
01:04:42,215 --> 01:04:45,385
Like, every cloud storage
scenario you could think of,

1257
01:04:45,385 --> 01:04:46,875
it's got a plugin for that.

1258
01:04:47,125 --> 01:04:50,855
And so I feel like the integration or the
plugin ecosystem is, in a lot

1259
01:04:50,855 --> 01:04:55,035
of ways, the magic of what makes a backup
product or a backup project really

1260
01:04:55,035 --> 01:05:00,165
interesting is all the different
things that I can back up. Just in case,

1261
01:05:00,165 --> 01:05:02,445
I didn't see a Git repo option for GitHub.

1262
01:05:02,525 --> 01:05:04,715
That's maybe more of a... maybe,
instead of doing it with,

1263
01:05:05,185 --> 01:05:07,945
I guess if you do it with Git, then
you could do any of the Git providers.

1264
01:05:08,445 --> 01:05:12,485
But it might be better to actually do
it through the GitHub API and do it.

1265
01:05:12,485 --> 01:05:15,135
It depends on what you want, because
on GitHub, what you would want

1266
01:05:15,135 --> 01:05:17,085
is probably not just the code.

1267
01:05:17,455 --> 01:05:19,265
It would be all the issues and all the,

1268
01:05:19,380 --> 01:05:19,800
right.

1269
01:05:20,225 --> 01:05:20,775
that's where

1270
01:05:20,945 --> 01:05:21,895
like two levels, right?

1271
01:05:21,895 --> 01:05:24,005
Yeah, it's like I need the
code, but I also could really

1272
01:05:24,005 --> 01:05:25,175
use all this other stuff.

1273
01:05:25,175 --> 01:05:25,495
Yeah.

1274
01:05:25,995 --> 01:05:26,655
well, this is great.

1275
01:05:26,655 --> 01:05:29,135
We could talk forever and I
really appreciate your time.

1276
01:05:29,155 --> 01:05:30,985
You've both been very
generous with your time.

1277
01:05:31,285 --> 01:05:34,855
We covered lots of topics in this
hour, and I am excited to start

1278
01:05:34,855 --> 01:05:36,085
playing with it and get started.

1279
01:05:36,305 --> 01:05:38,905
I am excited to hear about what
you're going to do, what you're going

1280
01:05:38,905 --> 01:05:40,585
to announce on the Kubernetes side.

1281
01:05:40,885 --> 01:05:41,715
That's where I live.

1282
01:05:41,715 --> 01:05:44,795
So the Docker and Kubernetes stuff, I'm
going to subscribe to any issues that

1283
01:05:44,795 --> 01:05:48,455
have those words in them, so that I
can keep track of what the status is.

1284
01:05:48,765 --> 01:05:49,905
How do people find you?

1285
01:05:49,905 --> 01:05:51,205
So we've got the website,

1286
01:05:51,575 --> 01:05:51,665
We are

1287
01:05:51,665 --> 01:05:52,465
on Discord.

1288
01:05:55,260 --> 01:05:57,060
And then you've got the GitHub repo.

1289
01:05:57,340 --> 01:05:59,570
You've got socials; looks like
you have a Discord server.

1290
01:05:59,980 --> 01:06:01,580
So everybody that wants to get involved.

1291
01:06:02,720 --> 01:06:03,770
We work on Discord.

1292
01:06:03,800 --> 01:06:05,450
We are, as I said, all remote.

1293
01:06:06,140 --> 01:06:09,290
We're all working remotely and
we work transparently on Discord.

1294
01:06:09,290 --> 01:06:11,210
So you can just come to our Discord.

1295
01:06:11,490 --> 01:06:13,440
You can actually attend
all of our meetings.

1296
01:06:13,650 --> 01:06:17,610
You will be muted, but you can actually
look into any discussion, technical

1297
01:06:17,610 --> 01:06:19,490
discussion that happens in the open.

1298
01:06:19,540 --> 01:06:20,200
Except the daily,

1299
01:06:20,310 --> 01:06:20,920
you can come.

1300
01:06:20,920 --> 01:06:22,700
and talk with us during the daily

1301
01:06:22,700 --> 01:06:22,970
yeah.

1302
01:06:23,181 --> 01:06:23,501
all right.

1303
01:06:23,501 --> 01:06:24,281
So we know what's next.

1304
01:06:24,281 --> 01:06:25,551
We know you're going to be at KubeCon.

1305
01:06:25,661 --> 01:06:27,671
People can follow you individually.

1306
01:06:27,771 --> 01:06:30,571
I guess you guys are on
socials, on LinkedIn.

1307
01:06:30,901 --> 01:06:34,591
I think in the YouTube description,
all the links are below for how to

1308
01:06:34,591 --> 01:06:35,881
follow these two fine gentlemen.

1309
01:06:36,181 --> 01:06:37,431
Well, thank you for having us.

1310
01:06:37,481 --> 01:06:38,401
This was pretty great.

1311
01:06:39,473 --> 01:06:40,023
very much.

1312
01:06:40,068 --> 01:06:40,208
you.

1313
01:06:40,708 --> 01:06:40,968
All right.

1314
01:06:40,968 --> 01:06:42,198
Well, thank you both for being here.

1315
01:06:42,288 --> 01:06:44,658
And we will see you next time
here on DevOps and Docker Talk.

1316
01:06:45,158 --> 01:06:45,578
Ciao everybody!

1317
01:06:46,078 --> 01:06:46,388
Cheers.