1
00:00:04,540 --> 00:00:06,830
Bret: You're listening to DevOps and Docker Talk.

2
00:00:07,040 --> 00:00:08,540
I'm your host, Bret Fisher.

3
00:00:09,080 --> 00:00:12,621
These are edited audio-only versions of my YouTube live show.

4
00:00:12,651 --> 00:00:14,741
Every Thursday at bret.live.

5
00:00:15,242 --> 00:00:18,302
This podcast is sponsored by my Patreon supporters.

6
00:00:18,572 --> 00:00:24,392
I'd like to thank the now over 100 paid supporters that make this show such a pleasure to do.

7
00:00:24,632 --> 00:00:29,762
You can get more info and follow me for free at patreon.com/bretfisher.

8
00:00:30,559 --> 00:00:31,019
Okay.

9
00:00:31,179 --> 00:00:39,009
I'm pulling another episode out of the archives from 2020, when I was
taking a not-so-short break from launching new podcast episodes.

10
00:00:39,429 --> 00:00:45,549
My guest this time is Nolan Brubaker from VMware, and we talk about the Velero open source project.

11
00:00:45,874 --> 00:00:51,911
for backing up, migrating, and restoring all of your Kubernetes resources and volumes.

12
00:00:52,054 --> 00:01:00,424
Bret: Now, usually in my audio-only podcast, I'll edit out the demos on the YouTube
live show simply because they don't make a lot of sense in an audio-only format.

13
00:01:00,844 --> 00:01:05,824
But I listened to a lot of this one again, and it made a lot of sense and I'm leaving it in.

14
00:01:05,824 --> 00:01:11,854
So somewhere between the 20 and 30 minute mark, we get into a demo, but it's largely CLI-based.

15
00:01:11,854 --> 00:01:14,764
And it's largely a discussion around how the tool is used.

16
00:01:14,974 --> 00:01:16,264
Some of the settings you might use.

17
00:01:16,534 --> 00:01:22,204
So I didn't feel like we were losing much of the knowledge transfer in just having audio only.

18
00:01:22,204 --> 00:01:30,304
So I'm keeping that in this podcast. If you're interested in some of the features of the
product, what's in the future of it and all that, we go through that pretty well in the demo.

19
00:01:31,114 --> 00:01:31,654
Check it out.

20
00:01:31,684 --> 00:01:33,214
I think it's a valuable project.

21
00:01:33,484 --> 00:01:41,499
That we all need to deal with when we're running Kubernetes
in production and we have to worry about... Now on with the show.

22
00:01:41,652 --> 00:01:48,202
Bret: My guest today, which we've been talking about for
months, planning this with the team at VMware: Nolan from VMware.

23
00:01:48,252 --> 00:01:49,002
Welcome to the show.

24
00:01:49,572 --> 00:01:50,802
Nolan Brubaker: Hi, thanks for having me.

25
00:01:51,342 --> 00:01:52,976
Bret Fisher: And what is your Twitter handle?

26
00:01:52,976 --> 00:01:54,986
I just realized what does that,

27
00:01:55,076 --> 00:01:55,696
Nolan Brubaker: what does that mean?

28
00:01:56,206 --> 00:02:00,686
That's palendae, something that I came up with in, gosh, middle school.

29
00:02:01,676 --> 00:02:08,426
It's made up. I think I got it from a Dragonlance novel, took a nice character name and changed it.

30
00:02:09,566 --> 00:02:13,676
I don't really talk about work stuff there, but if you want to follow me, you can.

31
00:02:13,743 --> 00:02:19,460
Be prepared not to see a whole lot of Docker or Kubernetes
but yeah, you're certainly welcome to follow me.

32
00:02:20,060 --> 00:02:20,390
Bret Fisher: Yeah.

33
00:02:20,390 --> 00:02:23,090
So I always appreciate having fellow gamers on the show.

34
00:02:23,090 --> 00:02:25,910
I just finished Doom Eternal and

35
00:02:26,310 --> 00:02:28,950
Stayed up way too late, the last month, killing that game.

36
00:02:28,980 --> 00:02:32,550
And it, I almost threw my controller a few times cause

37
00:02:34,130 --> 00:02:36,350
Nolan Brubaker: Yeah, I haven't gotten into Doom Eternal.

38
00:02:36,380 --> 00:02:41,540
Um, what I've lately been playing has been Dragon Quest 11 on the Switch.

39
00:02:42,180 --> 00:02:47,550
But that's gonna take me a while, the way
I've been playing it, since it's like over a hundred hours.

40
00:02:47,600 --> 00:02:50,780
So yeah, it's going to take me a long time to get through this

41
00:02:51,140 --> 00:02:51,830
Bret Fisher: that's commitment.

42
00:02:51,830 --> 00:02:52,190
Yeah.

43
00:02:52,400 --> 00:02:52,650
Yeah.

44
00:02:52,700 --> 00:02:55,220
That's more than a few charges on your, uh, on your name.

45
00:02:55,800 --> 00:02:56,570
Yeah.

46
00:02:57,270 --> 00:03:00,820
Nolan Brubaker: I'm prepared for that to be like a year long thing, if not more so.

47
00:03:00,880 --> 00:03:01,830
Bret Fisher: Right, right.

48
00:03:02,435 --> 00:03:02,885
Well, cool.

49
00:03:02,915 --> 00:03:05,795
Oh, by the way, for those asking or wondering, I didn't know.

50
00:03:05,895 --> 00:03:10,955
I had to ask if that was Metroid related in the background,
and indeed that poster is from Super Metroid, right?

51
00:03:11,015 --> 00:03:11,785
That's a, yup.

52
00:03:11,995 --> 00:03:12,145
Nolan Brubaker: Yup.

53
00:03:12,205 --> 00:03:17,665
That's a Super Metroid thing. That was a fan art poster that I got for Christmas one year.

54
00:03:18,235 --> 00:03:18,745
Bret Fisher: Yeah.

55
00:03:18,925 --> 00:03:19,375
Nice.

56
00:03:19,615 --> 00:03:22,615
So we don't normally talk about games in this show, but I just thought I had to mention it.

57
00:03:22,615 --> 00:03:24,085
Cause it's, we're staying, we're going to be staring at it.

58
00:03:24,085 --> 00:03:24,745
The whole show.

59
00:03:24,795 --> 00:03:28,485
My gamer handle is Sonic Bum, and it didn't make much sense on the internet.

60
00:03:28,485 --> 00:03:38,415
So I ended up changing it to my real name because I realized
nobody would find me. It was mostly because I was into Sonic the
Hedgehog in the nineties, and Sonic Boom was taken and Sonic was taken.

61
00:03:38,415 --> 00:03:41,985
So I made up Sonic Bum, but that is old news.

62
00:03:42,049 --> 00:03:45,941
You can still find some remnants of that stuff around if someone was trying to hunt me down.

63
00:03:46,451 --> 00:03:46,751
All right.

64
00:03:46,751 --> 00:03:52,751
So let's talk about Kubernetes and backups because that's why
you're here, but also it's a topic that doesn't get a lot of talk.

65
00:03:53,721 --> 00:03:59,321
But first, let's talk about, like, how did you get
started at VMware, and what's your background?

66
00:04:00,061 --> 00:04:00,571
Nolan Brubaker: Yeah.

67
00:04:00,560 --> 00:04:04,234
So I came to VMware via Heptio.

68
00:04:04,232 --> 00:04:08,718
I started at Heptio working on Velero, which was Heptio Ark.

69
00:04:08,781 --> 00:04:10,881
Got renamed as part of the acquisition.

70
00:04:11,931 --> 00:04:29,706
The project started one day at Heptio when Joe Beda was trying to work with some clusters
and realized he had EBS volumes, and he's like, hey, wouldn't
it be great if something snapshotted these volumes automatically?

71
00:04:29,692 --> 00:04:41,993
Before that I was at Rackspace working on a project called OpenStack-Ansible
that deployed the OpenStack control plane with Ansible in non-Docker containers.

72
00:04:41,993 --> 00:04:43,013
But LXC containers.

73
00:04:43,848 --> 00:04:53,978
So it was taking the control plane and instead of deploying
a ton of hardware, it was condensing that into containers.

74
00:04:54,758 --> 00:05:04,148
So yeah, I went from that to working at Heptio and trying
to make this whole Kubernetes thing work out and, uh,

75
00:05:05,118 --> 00:05:05,838
Bret Fisher: that

76
00:05:06,258 --> 00:05:06,738
Nolan Brubaker: bet.

77
00:05:09,218 --> 00:05:10,148
Bret Fisher: Yeah, really cool.

78
00:05:10,418 --> 00:05:12,248
So Velero is now the new product name.

79
00:05:12,248 --> 00:05:14,982
In fact, let me just bring it up so people can see the site.

80
00:05:15,642 --> 00:05:15,912
Yeah.

81
00:05:15,979 --> 00:05:16,202
Yeah.

82
00:05:17,192 --> 00:05:22,334
And it's velero.io, and they can find out all the good stuff about this here.

83
00:05:23,144 --> 00:05:31,599
And, you know, if we're going to talk about
backups, typically backups are a boring conversation, right?

84
00:05:31,939 --> 00:05:32,289
Yeah.

85
00:05:32,349 --> 00:05:32,739
Yeah.

86
00:05:32,889 --> 00:05:34,759
And a lot of us think we have it handled.

87
00:05:34,819 --> 00:05:46,142
I mean, it's funny: when I work on
container projects and stuff like that, backups are almost
pushed to the point where people just want to assume that they're happening.

88
00:05:46,292 --> 00:05:49,142
Like everyone just assumes that somebody did it and that they're automatic.

89
00:05:49,142 --> 00:05:50,220
And they're always right.

90
00:05:50,220 --> 00:05:51,720
And then we've got everything we need to restore.

91
00:05:51,750 --> 00:05:51,870
Oh.

92
00:05:51,880 --> 00:05:54,900
And that we've tested restores, and that we know how to do DR.

93
00:05:55,110 --> 00:05:57,600
And that we've validated our DR on a regular basis.

94
00:05:58,530 --> 00:06:00,660
And almost never is any of that.

95
00:06:00,660 --> 00:06:01,020
True.

96
00:06:01,000 --> 00:06:02,424
And how do we fix this?

97
00:06:02,485 --> 00:06:02,845
Nolan Brubaker: Yeah.

98
00:06:02,875 --> 00:06:11,215
Well, I think in a cloud native world, a
lot of people assume that your vendors handle it, right?

99
00:06:11,245 --> 00:06:13,225
Like, so you're running on a public cloud.

100
00:06:13,765 --> 00:06:14,815
It doesn't matter who it is.

101
00:06:14,884 --> 00:06:17,944
And you assume, well, they've handled all that hard stuff.

102
00:06:17,974 --> 00:06:18,214
Right.

103
00:06:18,244 --> 00:06:20,074
You assume that your

104
00:06:20,744 --> 00:06:27,473
volumes are durable, they're resistant to
availability zones falling down and things like that.

105
00:06:27,863 --> 00:06:28,103
Right.

106
00:06:28,133 --> 00:06:34,523
And maybe they are, maybe they aren't, maybe east goes away and you want to get back more quickly

107
00:06:34,602 --> 00:06:42,607
than those engineers can. Or maybe there are
some applications that are not made to be cloud native.

108
00:06:42,667 --> 00:06:47,497
They're not designed yet to be living in that world.

109
00:06:47,947 --> 00:06:56,869
So it's all well and good if you have a stateless application, but
if you've got data, you really should be owning that backup strategy.

110
00:06:57,529 --> 00:07:04,339
And rather than just assuming your cloud provider has taken
care of all of it, you need to make sure you're protected.

111
00:07:04,364 --> 00:07:12,458
So Velero does two things. It backs up the Kubernetes metadata.

112
00:07:12,578 --> 00:07:17,720
So like the YAML or JSON manifests, and it also grabs your persistent volumes.

113
00:07:17,750 --> 00:07:24,563
And that's probably the bigger thing, because you could get the
Kubernetes manifests through GitOps, if you're following that strategy.

114
00:07:24,543 --> 00:07:31,235
So if you want to grab your Kubernetes application data, Velero provides a way to get that.

115
00:07:31,835 --> 00:07:32,045
Yeah.

116
00:07:32,104 --> 00:07:36,454
So there's a couple of different ways we can do that.

117
00:07:36,454 --> 00:07:45,574
We've got support for doing snapshots through your persistent volume
provider, or using an application called restic to get file-system-level backups.

118
00:07:45,632 --> 00:07:49,020
And that makes it platform neutral, right?

119
00:07:49,020 --> 00:08:04,456
So you can move from one provider to another. But yeah, like you said,
we assume all this stuff's being done, but really someone in
your organization should probably be owning that for your application.

120
00:08:05,106 --> 00:08:05,466
Bret Fisher: Right.

121
00:08:05,886 --> 00:08:28,155
You know, I think a lot of people who don't really know cloud and Kubernetes
well would probably think that backups are either built in, or it's a checkbox
type of thing, because at this point, with Kubernetes and
Docker and the container world, we've felt like we've got this method for easily deploying.

122
00:08:28,155 --> 00:08:40,351
So now we're rapidly deploying, and we're even able to allow people to
create new apps and deploy them without much effort, without the
operations team making this big manual effort to say, oh, you got a new app.

123
00:08:40,351 --> 00:08:42,451
Well, it's going to take us a month to get that into production.

124
00:08:42,547 --> 00:08:50,017
But having that experience where maybe the Kubernetes YAML has something
in it that basically says, yep, this is the part you need to back up.

125
00:08:50,047 --> 00:08:51,007
And that's all I need to know.

126
00:08:51,007 --> 00:09:07,740
And are we to the point with this project where, as a
developer, I can specify what needs to be backed up, or somehow label it or
something, so that it can be an automatic effort where the ops team doesn't then
have to figure out how to back up that thing after it's already on my server?

127
00:09:08,010 --> 00:09:08,580
Is it that kind of way?

128
00:09:08,647 --> 00:09:14,219
Nolan Brubaker: We've talked about that so that you
could package in your application to say, yeah, yeah.

129
00:09:14,219 --> 00:09:17,549
That the application could tell Velero, this is what you need.

130
00:09:17,696 --> 00:09:19,518
Velero's not to that level yet.

131
00:09:19,518 --> 00:09:25,848
The way Velero works is that Velero has a command line and it runs as a deployment in your server.

132
00:09:26,203 --> 00:09:27,493
Or in your, I'm sorry, in your cluster.

133
00:09:27,853 --> 00:09:28,063
Yeah.

134
00:09:28,173 --> 00:09:38,633
And Velero operates by creating its own Backup CRs, and then it
goes and runs a controller, an operator, to make the backup happen.

135
00:09:38,933 --> 00:09:44,003
And that backup CR says, okay, what do you want to back up?

136
00:09:44,033 --> 00:09:46,733
Is it everything in a whole namespace?

137
00:09:47,153 --> 00:09:49,163
Is it everything matching this label?

138
00:09:49,673 --> 00:09:52,403
Is it everything except for these things?

139
00:09:52,953 --> 00:09:59,058
So you can do excludes: everything not matching this
label, or everything but this type of resource.

140
00:09:59,418 --> 00:10:04,878
So maybe you don't want to include cluster roles, as an example.

141
00:10:04,995 --> 00:10:08,265
We're engaging with groups upstream.

142
00:10:08,295 --> 00:10:10,863
There's a relatively new, I think it was formed.

143
00:10:11,978 --> 00:10:22,608
Trying to remember exact timelines, but there's a new working group upstream, the Data
Protection Working Group, which is a collaboration between SIG Apps and SIG Storage.

144
00:10:22,788 --> 00:10:31,170
That's trying to figure out ways to standardize how
applications can communicate what would need to be backed up.

145
00:10:31,320 --> 00:10:36,210
So, like, what's in this application? Is it a deployment?

146
00:10:36,322 --> 00:10:37,552
Is it a stateful set?

147
00:10:37,582 --> 00:10:44,382
What, what are the components of this application and
also what are the building blocks for protecting it?

148
00:10:44,441 --> 00:10:47,858
Does it have volumes that need to be snapshotted?

149
00:10:47,858 --> 00:10:50,678
Does it have other external resources that we need to grab?

150
00:10:51,158 --> 00:10:59,768
So we're not only working on it within a Velero context, we're trying
to work with the upstream community so that there's standards here.

151
00:10:59,798 --> 00:11:01,088
So it's not just us.

152
00:11:01,149 --> 00:11:04,172
But yeah, in terms of having something that

153
00:11:04,517 --> 00:11:09,017
the application developers define, we're not there yet, but we want to get there.

154
00:11:10,447 --> 00:11:10,837
Bret Fisher: Right.

155
00:11:10,904 --> 00:11:15,009
Cause I, you know, not having any experience with this product, so I'm coming into it as a new user.

156
00:11:15,028 --> 00:11:25,976
I'm thinking, okay, if I'm trying to move my tooling to a more automated fashion, I mean,
you know, this is my year of GitOps-ing all the things, at least in my personal projects.

157
00:11:26,336 --> 00:11:30,085
And so I imagine okay, I've got a new app, I'm going to host it in my cluster.

158
00:11:30,085 --> 00:11:34,885
I'm going to be putting those manifests or those Helm charts or something in my repo.

159
00:11:35,305 --> 00:11:37,405
And how can I just add that?

160
00:11:37,405 --> 00:11:48,895
Like you said, a custom resource that specifies, hey, these things in
my deployment also need to be backed up, or maybe an annotation or a
label or something that you're talking about in the existing resources.

161
00:11:49,224 --> 00:11:51,774
If you're a dedicated backup person, you want that command line, right.

162
00:11:51,774 --> 00:11:52,524
That makes total sense.

163
00:11:52,554 --> 00:11:57,574
You want to be able to control the cluster as a whole and want
to be able to see all the backups and how all those resources.

164
00:11:57,574 --> 00:11:59,086
And so, that, that's awesome.

165
00:11:59,133 --> 00:12:02,043
I'm always trying to shift that responsibility to the team

166
00:12:02,223 --> 00:12:17,001
that's owning the app, and of course with varying levels of success. But
that did seem like a thing that would be super helpful, to say, hey,
if you all just want to take care of it and don't even involve us, just
put these things in your YAML, your Helm chart, or whatever you're creating.

167
00:12:17,001 --> 00:12:21,441
And then just know that we're going to be backing that up and, you know, it'll work for whatever.

168
00:12:21,581 --> 00:12:22,811
So the way I would

169
00:12:22,811 --> 00:12:36,222
Nolan Brubaker: probably pitch that to an application team is, we have what are called
schedules. Basically, a schedule takes the same set of include and exclude fields that a

170
00:12:37,027 --> 00:12:41,437
backup does, and it runs it automatically on some schedule.

171
00:12:42,307 --> 00:12:44,947
So say you've got a helm chart for your application.

172
00:12:45,157 --> 00:12:49,567
You could include the schedule as part of that.

173
00:12:49,657 --> 00:12:59,077
So it's not quite to that level. We've got open issues for, I
think we've called them, backup templates that applications could define.

174
00:12:59,089 --> 00:13:13,076
But if you include a Velero schedule in your YAML, you could apply that and
it could say to include whatever label the application developers define.

175
00:13:13,346 --> 00:13:19,766
So they could say, okay, this is the label we use.

176
00:13:19,766 --> 00:13:24,767
This is the schedule we use. These are the hooks we define to back up the database.

177
00:13:24,827 --> 00:13:29,537
So we don't have, we don't have anything in Velero to directly dump databases.

178
00:13:29,567 --> 00:13:39,061
We instead define hooks that say, run this command on my container
and get the stuff dumped out of the database into a persistent volume.

179
00:13:39,123 --> 00:13:51,745
So that's probably how I would approach it in the current state: have the
application developers use a schedule, or at least define a Backup CR, in their Helm chart.

180
00:13:52,495 --> 00:14:01,125
And from there that can be invoked on a schedule and then the cluster
administrator would be responsible for getting Velero installed.

181
00:14:01,785 --> 00:14:02,145
Bret Fisher: Right.

182
00:14:03,165 --> 00:14:05,461
Can you send things to a cloud storage?

183
00:14:05,461 --> 00:14:11,776
Like, I want to put it in this S3 bucket or in this, whatever.
Are those, would you call those drivers or plugins or,

184
00:14:12,216 --> 00:14:12,396
Nolan Brubaker: yep.

185
00:14:12,426 --> 00:14:13,236
Those are plug-ins.

186
00:14:13,283 --> 00:14:20,403
Velero has a plugin model. There's four main types of plugins.

187
00:14:20,433 --> 00:14:23,065
Two of them are kind of grouped together.

188
00:14:23,145 --> 00:14:32,655
There are item action plugins, which happen on backup and restore, so they
can modify Kubernetes manifests as they come in and out of the cluster.

189
00:14:32,655 --> 00:14:37,493
So on backup, you can modify, excuse me, Kubernetes manifests.

190
00:14:37,913 --> 00:14:38,153
Okay.

191
00:14:38,423 --> 00:14:43,160
So we use that to like walk from a pod to a PVC, to a PV.

192
00:14:43,790 --> 00:14:48,590
So like when you're backing up a pod, we assume if it's got any PVCs, you want the PVs.

193
00:14:48,655 --> 00:14:53,065
And then we've got restore item actions that manipulate stuff on restore.

194
00:14:53,106 --> 00:14:58,086
Kind of doing the reverse: as an example, you go from a PV and rebuild back to the pod.

195
00:14:59,226 --> 00:15:06,471
Then we have object storage plugins, which hook up to S3, GCP object storage, Azure Blob Storage.

196
00:15:06,460 --> 00:15:08,769
And there's third party plugins for that too.

197
00:15:08,773 --> 00:15:16,037
Those three cover the main object storage
cases. And then we have volume snapshotter plugins.

198
00:15:16,067 --> 00:15:20,477
So those do EBS snapshotting, GCP volume

199
00:15:20,487 --> 00:15:24,000
snapshotting. We have a vSphere snapshotter plugin now.

200
00:15:23,983 --> 00:15:24,332
There's been

201
00:15:24,332 --> 00:15:28,044
Bret Fisher: community at a, at a disk level, a volume mount level.

202
00:15:28,044 --> 00:15:28,314
Yeah.

203
00:15:28,324 --> 00:15:30,864
Yeah, no, it's not so much an application level.

204
00:15:30,924 --> 00:15:34,704
Nolan Brubaker: Yeah, no, those are not at a, at an application level currently.

205
00:15:35,064 --> 00:15:35,394
Bret Fisher: Yeah.

206
00:15:35,944 --> 00:15:36,244
Yeah.

207
00:15:36,694 --> 00:15:38,494
Is it using CSI in the background?

208
00:15:38,484 --> 00:15:45,410
Is that how you're able to do these plugins,
or do the storage plugins have to be separate from a CSI plugin?

209
00:15:45,410 --> 00:15:46,760
Is this somehow related or

210
00:15:47,240 --> 00:15:48,220
Nolan Brubaker: right now they're separate.

211
00:15:48,250 --> 00:15:50,534
Velero actually predates CSI.

212
00:15:50,774 --> 00:15:51,014
Okay.

213
00:15:51,104 --> 00:15:55,587
Yeah, so our 1.4 release, which we're hoping to get out in a couple of weeks.

214
00:15:55,650 --> 00:16:01,500
Actually, our main feature there is getting a CSI integration at a beta level.

215
00:16:01,553 --> 00:16:14,628
So that's something myself and Ashish are working on. Ashish is another
engineer on our team, and we're working with the CSI community to make sure
we're working with the Kubernetes CSI integration.

216
00:16:15,228 --> 00:16:24,384
So yeah, our hope is eventually we deprecate the Velero plugins for snapshotting.

217
00:16:25,219 --> 00:16:27,649
But that's going to be a long-term goal.

218
00:16:27,679 --> 00:16:28,699
I'm sure.

219
00:16:28,729 --> 00:16:29,959
For the foreseeable future, we're going to

220
00:16:29,959 --> 00:16:31,539
Bret Fisher: have both, right?

221
00:16:32,169 --> 00:16:32,409
Yeah.

222
00:16:32,438 --> 00:16:34,878
That's the same thing that the Kubernetes community is doing, right?

223
00:16:34,878 --> 00:16:42,228
Like obviously the built-in plugins are going to be there for awhile
and like the CSI's not necessarily feature complete in comparison.

224
00:16:42,228 --> 00:16:42,468
Yeah.

225
00:16:42,468 --> 00:16:44,774
So I can see how that's a multi-year process.

226
00:16:44,800 --> 00:16:45,010
Right.

227
00:16:45,240 --> 00:16:46,420
You don't want to leave anyone behind.

228
00:16:46,412 --> 00:16:49,862
You don't want to, you don't want to leave someone behind
just because they're still using the old backup that works.

229
00:16:49,862 --> 00:16:50,072
Yeah.

230
00:16:50,122 --> 00:16:50,462
Nolan Brubaker: Right.

231
00:16:50,642 --> 00:16:54,170
And not every CSI driver has the snapshotting capability yet, right?

232
00:16:54,200 --> 00:17:01,334
And even though CSI itself is GA, the snapshotting API is still beta.

233
00:17:01,394 --> 00:17:08,684
So Velero itself is probably going to trail behind on the GA status.

234
00:17:08,742 --> 00:17:21,013
Just so we can see as more drivers become available. We're testing it
out with the drivers that are there, but it's requiring some tweaks to
Velero and we want to make sure we don't break existing Velero users.

235
00:17:21,073 --> 00:17:28,953
And we also want to make sure we're being good
community participants and helping inform the design.

236
00:17:30,443 --> 00:17:31,343
Bret Fisher: Yeah, that makes sense.

237
00:17:31,698 --> 00:17:40,425
Certainly, if people aren't used to defaulting to CSI for their regular apps,
then not using that... By the way, CSI, sorry, people, is Container Storage Interface.

238
00:17:40,425 --> 00:17:40,635
Yeah.

239
00:17:40,663 --> 00:17:45,942
It's the standard that Kubernetes is now using for new
plugins to use different storage other than just local disk.

240
00:17:45,972 --> 00:17:52,932
So if you're using a cloud storage, you can either choose the
built-in ones, but now the new way to do it lately is the CSI plugins.

241
00:17:52,932 --> 00:17:53,202
Right?

242
00:17:53,196 --> 00:17:58,566
And the idea for all of this, right, is that
every storage vendor can make their own CSI plugin.

243
00:17:58,636 --> 00:18:05,076
This is even on the roadmap now for Docker Swarm, for it to
adopt CSI as sort of a standard mechanism for storage.

244
00:18:05,076 --> 00:18:20,466
So the dream someday is that the entire container industry can
rely on a single volume plugin for each type of volume, whether that's a cloud storage or
your VMware storage, or your NetApp iSCSI storage or whatever it is, right.

245
00:18:20,486 --> 00:18:22,436
You can, you can just rely on that one driver.

246
00:18:22,436 --> 00:18:25,316
And it provides all the mechanisms for all these different types of tools.

247
00:18:25,616 --> 00:18:34,836
And you don't have to learn all of that, so each one of your products doesn't
have a different way of connecting to storage, which I kinda thought we
solved like 15 years ago with iSCSI. I felt like that was the dream.

248
00:18:35,196 --> 00:18:38,106
And then the cloud happened and then containers happened.

249
00:18:38,106 --> 00:18:41,466
And, you know, now we're back to trying to figure it all out again, so, yup.

250
00:18:41,526 --> 00:18:41,706
Nolan Brubaker: Yup.

251
00:18:41,767 --> 00:18:43,096
And on the Velero team,

252
00:18:43,486 --> 00:18:43,756
Yeah.

253
00:18:43,816 --> 00:18:44,026
Yeah.

254
00:18:44,026 --> 00:19:03,234
we saw the CSI stuff happening and we're like,
well, it doesn't make any sense to ask storage vendors to make Velero-specific
plugins. Especially if there are competing backup solutions, it doesn't
make any sense to ask storage vendors to make a bunch of different ones.

255
00:19:03,954 --> 00:19:04,194
Right.

256
00:19:04,194 --> 00:19:04,794
So

257
00:19:05,204 --> 00:19:06,084
Bret Fisher: there always is right.

258
00:19:06,089 --> 00:19:06,989
There's a ton of ideas,

259
00:19:07,709 --> 00:19:08,159
Nolan Brubaker: right?

260
00:19:08,159 --> 00:19:16,064
So if there's this community standard, let's get involved there,
let's make sure it works for everybody and hook into that.

261
00:19:16,094 --> 00:19:16,874
Yeah, for sure.

262
00:19:16,884 --> 00:19:19,964
It makes absolute sense to make Velero compatible with that.

263
00:19:19,993 --> 00:19:25,123
But yeah, when Velero started, the discussion for CSI hadn't really started.

264
00:19:25,453 --> 00:19:25,573
Yeah.

265
00:19:25,983 --> 00:19:31,529
Bret Fisher: And even Docker struggled with this because they, they
provided Docker plugins for storage, but they, this was pre CSI.

266
00:19:31,919 --> 00:19:37,349
And so now Docker is wanting to also consider the CSI as a mechanism so that we.

267
00:19:37,340 --> 00:19:39,590
Cause storage vendors don't want to do this.

268
00:19:39,590 --> 00:19:43,790
They don't want to make a Kubernetes plugin and a Docker plugin and a plugin for every backup vendor.

269
00:19:43,858 --> 00:19:48,268
And of course, trying to make them all one is way harder than we all probably think it is.

270
00:19:48,728 --> 00:19:51,208
So yeah, it'll be, it'll get there someday.

271
00:19:51,208 --> 00:19:51,808
Maybe.

272
00:19:51,813 --> 00:19:54,763
I have some friends that are very skeptical that this is ever going to come to fruition.

273
00:19:56,413 --> 00:19:57,303
The dream of storage.

274
00:19:57,873 --> 00:19:58,233
It's

275
00:19:58,263 --> 00:19:59,613
Nolan Brubaker: not easy.

276
00:19:59,613 --> 00:20:04,893
And I'm glad that from my perspective, I'm just calling storage.

277
00:20:04,929 --> 00:20:06,129
I'm not an implementer.

278
00:20:06,215 --> 00:20:07,565
The stateful stuff's hard.

279
00:20:07,805 --> 00:20:09,755
Stateful stuff is definitely hard.

280
00:20:10,115 --> 00:20:10,355
Yeah.

281
00:20:11,495 --> 00:20:18,395
Bret Fisher: And snapshotting, and like snapshotting is crazy voodoo
that sometimes I don't even understand really how it's happening.

282
00:20:18,444 --> 00:20:18,624
Yeah.

283
00:20:18,624 --> 00:20:24,174
Especially when the apps are aware of this, of the snapshot and they
actually write to disk before snapshotting like that stuff gets super nerdy.

284
00:20:24,414 --> 00:20:32,664
We could talk about that all day, but I do want to get to demos, because you
prepared demos, and we all love a good demo here on YouTube where we can watch.

285
00:20:32,725 --> 00:20:36,765
Let me know when you got your screen share and I will, yeah,

286
00:20:38,875 --> 00:20:40,155
Nolan Brubaker: I've got it shared.

287
00:20:40,155 --> 00:20:41,355
We'll switch over to Firefox.

288
00:20:41,355 --> 00:20:42,315
So you don't see yourself.

289
00:20:42,315 --> 00:20:43,875
So I am in my browser.

290
00:20:44,895 --> 00:20:45,165
Yep.

291
00:20:46,035 --> 00:20:46,785
All right.

292
00:20:46,815 --> 00:20:53,687
So I've just got a very simple WordPress and MySQL application running.

293
00:20:54,287 --> 00:20:56,627
So I just want to show that this is up and running.

294
00:20:56,657 --> 00:20:57,857
I've got some data in it.

295
00:20:57,837 --> 00:20:59,923
Hopefully this is big enough for people to read.

296
00:21:00,253 --> 00:21:02,983
Got a post. Sorry, bumped my microphone there.

297
00:21:03,036 --> 00:21:04,436
I can go into the post.

298
00:21:05,786 --> 00:21:08,636
I've got a comment.

299
00:21:11,016 --> 00:21:19,896
And just show that like I'm not restoring any other weird data, five comment.

300
00:21:19,946 --> 00:21:24,546
I'm not doing any, I'm going to take an actual backup.

301
00:21:24,537 --> 00:21:30,686
So got some small data and then I'm going to jump over here to my terminal.

302
00:21:31,496 --> 00:21:33,716
So I just want to show I've got,

303
00:21:33,792 --> 00:21:36,522
Got Velero running in.

304
00:21:37,212 --> 00:21:38,562
my namespace,

305
00:21:49,632 --> 00:21:52,302
and right now I just got one replica.

306
00:21:53,382 --> 00:22:03,612
And because I'm not great at talking and typing, I'm just going to run this
demo script, which is gonna run commands and proceed when I hit buttons.

307
00:22:03,642 --> 00:22:20,709
I've got this WordPress namespace. It's just got one pod each for WordPress and MySQL, and
they've got services to expose them, just one to go to the outside world for WordPress.

308
00:22:21,639 --> 00:22:28,179
And I've also got PVCs.

309
00:22:30,459 --> 00:22:32,169
So we've got one for MySQL.

310
00:22:32,799 --> 00:22:35,319
We've got one for WordPress.

311
00:22:35,349 --> 00:22:38,757
That's just for static assets, pictures, things like that.

312
00:22:38,997 --> 00:22:39,447
Uploads.

313
00:22:39,747 --> 00:22:39,987
Yeah.

314
00:22:40,167 --> 00:22:41,907
Uploads, CSS.

315
00:22:43,017 --> 00:22:45,117
And just to prove:

316
00:22:45,387 --> 00:22:48,687
There are indeed persistent volumes.

317
00:22:48,747 --> 00:22:54,537
This is the MySQL claim that matches to this one, to

318
00:22:54,537 --> 00:22:54,807
Bret Fisher: that.

319
00:22:55,437 --> 00:22:55,707
All right.

320
00:22:55,707 --> 00:22:55,977
Okay.

321
00:22:57,657 --> 00:23:00,977
Nolan Brubaker: Now I'm going to show, I have, this is the latest version of Velero.

322
00:23:01,204 --> 00:23:07,024
Got 1.3.2 on the server and 1.3.2 running on my laptop.

323
00:23:07,914 --> 00:23:09,654
Bret Fisher: So before, so sorry, let me back up for a second.

324
00:23:09,654 --> 00:23:16,644
So before this, you deployed the custom resource
definitions, and then you deploy a controller, right?

325
00:23:16,641 --> 00:23:22,313
So there's a controller running in your cluster and this
works on any standard Kubernetes-conformant cluster, right?

326
00:23:22,793 --> 00:23:23,063
Yup.

327
00:23:23,453 --> 00:23:23,783
Nolan Brubaker: Yup.

328
00:23:23,813 --> 00:23:27,143
So OpenShift, TKG, Rancher.

329
00:23:27,473 --> 00:23:27,833
Yeah.

330
00:23:28,913 --> 00:23:30,533
Any Kubernetes conformant cluster.

331
00:23:30,683 --> 00:23:31,103
Yep.

332
00:23:32,073 --> 00:23:37,954
And something I should also mention: Velero will also work on managed Kubernetes clusters.

333
00:23:37,984 --> 00:23:40,714
So it backs up things through the Kubernetes API.

334
00:23:40,744 --> 00:23:42,989
It doesn't grab etcd directly.

335
00:23:43,062 --> 00:23:49,801
So if you're on GKE or EKS or anything like that, you don't get access to etcd directly.

336
00:23:50,371 --> 00:23:52,911
So that's why we work through the API server.

337
00:23:53,301 --> 00:23:56,121
So it'll work even on, excuse me, managed.

338
00:23:56,701 --> 00:23:57,051
Okay.

339
00:23:58,011 --> 00:24:04,930
Yeah, there are some issues with that because things
could be changing in the API server while the backup's running.

340
00:24:05,260 --> 00:24:10,960
And we've talked in the data protection working group
about maybe introducing some sort of freeze API.

341
00:24:11,064 --> 00:24:18,234
But that's probably down the road and requires upstream
Kubernetes changes, but yeah, for now that's how Velero works.

342
00:24:19,434 --> 00:24:19,704
All right.

343
00:24:20,244 --> 00:24:23,598
And yeah, so I'm going to.

344
00:24:24,073 --> 00:24:29,263
do this Velero command, very much like what we talked about.

345
00:24:30,463 --> 00:24:39,283
I'm naming it wp-demo, including this namespace,
wordpress, and I'm just going to wait for it to complete.

346
00:24:44,113 --> 00:24:44,833
Intro Music: All right.

347
00:24:46,903 --> 00:25:00,732
Nolan Brubaker: And the way that works is we just fire off a custom resource to the Kubernetes
API server and let our controller slash operator run against it and clear the screen.

348
00:25:00,732 --> 00:25:02,472
So it's not all at the bottom.

349
00:25:03,402 --> 00:25:07,944
And we're going to do a describe against it to see what all is there.

350
00:25:07,944 --> 00:25:10,194
So I'm gonna scroll up here.

351
00:25:10,174 --> 00:25:12,844
So the name is what we named it.

352
00:25:12,904 --> 00:25:13,744
wp-demo.

353
00:25:13,826 --> 00:25:19,342
Velero puts all its backspaces or I'm sorry, backups in the Velero namespace.

354
00:25:19,322 --> 00:25:27,687
It does not store it in the same namespace as the application, because
what if you accidentally delete that namespace and you want to get it back?

355
00:25:27,761 --> 00:25:31,691
It's, we've had requests to change it and we're open to that.

356
00:25:31,691 --> 00:25:33,341
We just need to figure out that problem.

357
00:25:33,341 --> 00:25:36,461
Like if you accidentally delete that namespace, you need a way to get it back.

358
00:25:36,501 --> 00:25:40,387
So we're not completely married to that design.

359
00:25:40,387 --> 00:25:42,547
We just need to figure out a solution to that problem.

360
00:25:42,577 --> 00:25:42,967
Yeah.

361
00:25:43,447 --> 00:25:44,437
Bret Fisher: Parallel namespace.

362
00:25:44,437 --> 00:25:45,817
That's right.

363
00:25:46,337 --> 00:25:46,627
Name.

364
00:25:46,687 --> 00:25:46,957
Yeah.

365
00:25:47,027 --> 00:25:47,127
Yeah.

366
00:25:47,587 --> 00:25:47,887
Nolan Brubaker: Right.

367
00:25:47,887 --> 00:25:52,311
We duplicate it or something like that, but for now it either goes into the default

368
00:25:52,611 --> 00:25:53,761
namespace, which is velero, or

369
00:25:53,811 --> 00:25:56,421
you can deploy it to whatever namespace you want.

370
00:25:56,487 --> 00:26:03,417
We label everything, or rather we label every backup,
with the storage location, which is the object storage.

371
00:26:03,747 --> 00:26:07,827
There's a representation of the object storage bucket called the backup storage location.

372
00:26:08,367 --> 00:26:11,566
And this is just so we can easily fetch them from the API server.

373
00:26:11,866 --> 00:26:12,136
Right.

374
00:26:12,256 --> 00:26:17,926
In this case it went to the default. There were no annotations on the backup, and it was completed.

375
00:26:17,990 --> 00:26:23,780
We've got some information on what stuff was included or excluded.

376
00:26:24,650 --> 00:26:29,930
So the namespace was WordPress and we didn't, we did not exclude anything.

377
00:26:30,170 --> 00:26:42,350
This is useful if you want to back up the whole cluster, but say exclude
kube-system, because usually there's a lot of stuff that's managed by the
cluster that you might want to exclude because it's specific to the cluster.

378
00:26:43,595 --> 00:26:53,072
Or that running cluster, I should say. See here, we didn't apply a label
selector to the backup, so we didn't grab stuff based on a label.

379
00:26:54,152 --> 00:26:56,492
Again, we stored it in the default location.

380
00:26:57,692 --> 00:27:01,082
We automatically snapshotted any persistent volumes

381
00:27:04,082 --> 00:27:04,802
by default.

382
00:27:05,192 --> 00:27:10,802
The time to live, or duration, of a backup is a month, or 720 hours.

383
00:27:13,562 --> 00:27:16,052
There were no hooks defined on the backup itself.

384
00:27:16,082 --> 00:27:27,889
You can define hooks on the backup, or you can actually define a hook on a
pod or a deployment, or, you know, anywhere you can put a pod template.

385
00:27:28,459 --> 00:27:37,249
So you can define a hook on your application to say, dump my database,
whether that be Mongo or MySQL or whatever your application

386
00:27:37,344 --> 00:27:38,688
uses. The backup format:

387
00:27:38,744 --> 00:27:41,794
This is the format that we store in object storage.

388
00:27:41,914 --> 00:27:45,634
We actually have some changes coming to this in 1.4 that are backwards compatible.

389
00:27:45,634 --> 00:27:56,427
So that it's going to be, I can talk about this more after the demo, if you'd
like but we're going to take some longer term visions for the backup format.

390
00:27:56,425 --> 00:28:04,025
We've got some information on when it started and completed, and then, just so
you don't have to download the whole backup to see what's in it,

391
00:28:04,325 --> 00:28:04,535
we've got a

392
00:28:04,675 --> 00:28:05,515
resource list.

393
00:28:07,105 --> 00:28:07,525
Bret Fisher: Cool.

394
00:28:08,335 --> 00:28:08,515
Yeah.

395
00:28:08,515 --> 00:28:13,886
That resource list is the money. That's where you're confirming all the objects that you expected.

396
00:28:14,606 --> 00:28:14,876
Yup,

397
00:28:15,086 --> 00:28:15,356
Nolan Brubaker: yup.

398
00:28:15,416 --> 00:28:15,716
Yeah.

399
00:28:15,956 --> 00:28:18,266
And then finally, this is what we snapshotted.

400
00:28:18,296 --> 00:28:20,739
So we've got, these are what were created.

401
00:28:20,848 --> 00:28:23,518
And these IDs will vary based on your provider.

402
00:28:23,518 --> 00:28:23,848
Cause.

403
00:28:24,533 --> 00:28:33,160
Different providers use different IDs and I'm currently working on updating
the values that will go here for the CSI plug-ins that we've written.

404
00:28:33,140 --> 00:28:39,706
So I've got a PR in progress to add this information for CSI snapshots.

405
00:28:42,226 --> 00:28:48,575
So, now we're going to simulate a disaster or an accident,
and we're just going to nuke the WordPress namespace.

406
00:28:49,595 --> 00:28:49,925
Yeah.

407
00:28:50,225 --> 00:28:52,595
Or, or before I do that, do you have any other questions?

408
00:28:52,626 --> 00:28:53,826
Bret Fisher: No, I don't think so.

409
00:28:53,856 --> 00:28:56,046
Yeah, we, so we're going to nuke it and then we're going to restore it.

410
00:28:56,046 --> 00:28:56,556
Is that what you're doing?

411
00:28:57,096 --> 00:28:57,456
Nolan Brubaker: Yep.

412
00:28:57,606 --> 00:28:57,856
Yep.

413
00:28:57,876 --> 00:28:57,956
Okay.

414
00:28:57,956 --> 00:29:02,166
I'm going to nuke it, and I'll also run to the website and show that, no really, it's gone.

415
00:29:04,836 --> 00:29:08,046
So should be busted.

416
00:29:08,076 --> 00:29:08,376
Yep.

417
00:29:10,326 --> 00:29:10,676
Sorry.

418
00:29:10,746 --> 00:29:12,726
Should start getting 404s soon.

419
00:29:19,116 --> 00:29:19,236
Intro Music: It's

420
00:29:19,236 --> 00:29:20,256
Nolan Brubaker: deleted

421
00:29:26,556 --> 00:29:27,246
nothing there.

422
00:29:29,726 --> 00:29:35,936
So just to show I'm not pulling anything: the site's really
gone, and I'm refreshing and not getting anything back.

423
00:29:37,226 --> 00:29:42,897
And to confirm even further, nothing found there's nothing on the Kubernetes.

424
00:29:45,867 --> 00:29:46,197
Bret Fisher: All right.

425
00:29:47,097 --> 00:29:47,337
Nothing.

426
00:29:47,337 --> 00:29:48,387
It just leaves nothing.

427
00:29:49,497 --> 00:29:49,827
Nope,

428
00:29:50,157 --> 00:29:56,817
Nolan Brubaker: Nope, no persistent volumes on the
cluster anymore either. The only stuff in this cluster

429
00:29:56,877 --> 00:29:58,917
was the WordPress stuff.

430
00:29:59,037 --> 00:30:00,567
Con it's WordPress.

431
00:30:00,596 --> 00:30:06,276
Contour was my ingress controller, which I think you talked to Steve Sloka about, and Velero.

432
00:30:09,271 --> 00:30:17,851
And it probably should have cleared everything here, but
we're going to create a restore named wp-restore.

433
00:30:19,411 --> 00:30:21,451
And we're going to use --from-backup.

434
00:30:21,481 --> 00:30:25,831
We're going to use wp-demo as the source for this restore.

435
00:30:26,151 --> 00:30:26,361
Yeah.

436
00:30:31,101 --> 00:30:41,991
So again, what we're doing here is submitting a
request, a custom resource, to the Kubernetes API server.

437
00:30:42,681 --> 00:30:46,281
The Kubernetes API server will get it.

438
00:30:46,281 --> 00:31:00,951
And the Velero restore controller or operator will grab that, start the restore, and we're gonna
see that it will create all the namespaces and everything, and then things will start to run.

439
00:31:03,021 --> 00:31:07,091
So take a look at the restore details we come up here.

440
00:31:08,436 --> 00:31:17,616
So of course, very similarly, the restore's name and namespace, same as the backup.

441
00:31:17,646 --> 00:31:22,856
We've got the same name and the same namespace, no labels or annotations.

442
00:31:25,016 --> 00:31:29,546
And the namespaces here are a little different than the backup.

443
00:31:29,641 --> 00:31:38,011
We included all the namespaces that were in the backup because
we don't know ahead of time what namespaces are in there.

444
00:31:38,011 --> 00:31:43,795
So we didn't do --include-namespaces, although we could on a restore.

445
00:31:43,795 --> 00:31:49,375
So you could selectively grab an individual namespace
out of a restore or out of a backup, excuse me.

446
00:31:50,245 --> 00:31:52,975
So you could say back up my whole cluster.

447
00:31:53,750 --> 00:31:56,270
And then only grab one namespace out of it.

448
00:31:56,450 --> 00:32:00,170
So say I accidentally deleted this one

449
00:32:00,170 --> 00:32:10,173
namespace, you could go grab one namespace out of it if you wanted
to, or you could do a label selector out of the backup.

450
00:32:11,553 --> 00:32:24,166
And similarly, we included all resources except for nodes, because it doesn't make a
whole lot of sense to recreate nodes because that's all managed by the cluster itself.

451
00:32:24,256 --> 00:32:29,416
It doesn't make a whole lot of sense to create nodes without
actual hardware or virtual hardware standing behind it.

452
00:32:29,402 --> 00:32:32,517
Events are all short-lived, so we don't recreate those.

453
00:32:32,697 --> 00:32:34,497
They don't make a whole lot of sense to restore.

454
00:32:34,483 --> 00:32:43,094
And also we don't restore our own backups and restores because we found restoring these actually

455
00:32:44,129 --> 00:32:45,179
causes some recursion.

456
00:32:45,479 --> 00:32:48,662
So if we restore backups we kick off new backups.

457
00:32:48,645 --> 00:32:52,344
If we, if we restore restores, we kick off new restores.

458
00:32:52,824 --> 00:32:53,094
Yeah.

459
00:32:53,180 --> 00:32:54,350
So Velero doesn't do that.

460
00:32:54,332 --> 00:33:03,679
And then restic repositories are for if we're doing file-level
backups, and Velero manages those outside of this backup/restore cycle.

461
00:33:03,709 --> 00:33:04,858
So we don't restore those.

462
00:33:04,976 --> 00:33:17,186
We can also, if you'd like, do namespace remapping on restore, so you can
say, take my WordPress namespace and rename it to something, foobar, you know?

463
00:33:17,876 --> 00:33:18,086
W

464
00:33:18,143 --> 00:33:18,383
Bret Fisher: why would

465
00:33:18,383 --> 00:33:18,773
Nolan Brubaker: I do that?

466
00:33:20,303 --> 00:33:21,023
Yeah, you can do that.

467
00:33:21,023 --> 00:33:32,633
If you wanted to clone a namespace, say you wanted to take a production
namespace, or you're playing around, and you wanted to have a
pristine copy of this namespace, but you wanted to change something.

468
00:33:32,645 --> 00:33:41,015
And you weren't sure if it was going to work, you could
copy that namespace with PVs and everything and mess around.

469
00:33:41,001 --> 00:33:52,077
So we've had users request this and it's, I don't, I honestly don't have statistics on
how many people use this, but it's been a feature that people will have definitely used.

470
00:33:52,287 --> 00:33:53,367
So, yeah.

471
00:33:53,440 --> 00:33:58,390
Yeah, it's something you can do, and take your PV data along with you.

472
00:33:58,389 --> 00:34:00,909
If you do a namespace remapping, you can copy the PVs.

473
00:34:01,359 --> 00:34:01,689
Bret Fisher: Yeah.

474
00:34:01,869 --> 00:34:03,039
I mean, that's something that is a.

475
00:34:03,729 --> 00:34:06,609
I think, especially in the storage realm, that's actually pretty normal.

476
00:34:06,609 --> 00:34:16,269
Like you can, like I mentioned earlier, NetApp storage, like NetApp has an
ability for you to take a snapshot and then essentially put it somewhere
else so that someone can work on a read only copy or something like that.

477
00:34:16,269 --> 00:34:17,199
So, yeah, totally.

478
00:34:17,209 --> 00:34:28,959
That totally makes sense to me, especially if you think about the work that
would be involved for something like a DevOps team in order to spin up a
new namespace that's identical to the current one so that someone can see it.

479
00:34:28,965 --> 00:34:33,825
That's a lot of copying YAML and stuff versus just
saying, hey, let's just restore to a new namespace.

480
00:34:33,855 --> 00:34:34,045
Yeah.

481
00:34:34,325 --> 00:34:34,475
Yup.

482
00:34:35,285 --> 00:34:35,675
Nolan Brubaker: Yup.

483
00:34:35,825 --> 00:34:37,522
And there's similar work upstream.

484
00:34:37,584 --> 00:34:42,769
I'm not sure where the KEP is, but there's like a PV clone functionality that's coming.

485
00:34:42,799 --> 00:34:48,954
That's not a full namespace copy, but there's similar ideas in other realms.

486
00:34:49,035 --> 00:34:59,395
We're not making any commitments on changing, but we're definitely going to
look at maybe using that functionality further on down the road versus our own.

487
00:34:59,425 --> 00:35:08,143
But yeah, it's something that maybe, in a CI/CD
setup, you might want to use to validate something.

488
00:35:10,353 --> 00:35:10,623
Cool.

489
00:35:11,583 --> 00:35:11,913
All right.

490
00:35:11,913 --> 00:35:15,303
And then you can also use a flag too.

491
00:35:16,323 --> 00:35:18,873
Maybe you don't want to automatically restore PVs.

492
00:35:18,943 --> 00:35:21,253
You could set that to false and not restore PVs.

493
00:35:21,253 --> 00:35:24,227
You just want the Kubernetes metadata for some reason.

494
00:35:26,207 --> 00:35:26,547
Okay.

495
00:35:26,707 --> 00:35:33,317
So after we've looked at that, take a look at the persistent volumes.

496
00:35:35,207 --> 00:35:36,557
So they're all back.

497
00:35:41,087 --> 00:35:44,687
You can look at the namespace and all of that stuff should be back.

498
00:35:46,047 --> 00:35:49,587
Got our service back, got our pods back.

499
00:35:49,587 --> 00:35:51,897
We've got our replica sets, our deployments.

500
00:35:52,957 --> 00:36:00,687
I go back to my website, got my post.

501
00:36:00,747 --> 00:36:07,617
It's got both comments, including the one I made and prove I can, it's still working.

502
00:36:16,997 --> 00:36:18,887
So everything is still running.

503
00:36:21,977 --> 00:36:22,337
All right.

504
00:36:22,487 --> 00:36:24,647
And, uh, that was the end of the demo.

505
00:36:24,691 --> 00:36:27,855
So yeah, the CRDs let's limit that.

506
00:36:31,765 --> 00:36:34,165
Just to show the Velero CRDs.

507
00:36:34,165 --> 00:36:37,525
This is what we've got in there that was installed.

508
00:36:37,525 --> 00:36:39,295
And there's a Velero install command.

509
00:36:39,325 --> 00:36:46,435
We've got two ways to install it with this Velero
install command, which is shipped with the Velero client.

510
00:36:46,435 --> 00:36:49,645
So it's all built in or with a Velero helm chart.

511
00:36:51,255 --> 00:36:51,705
Bret Fisher: Okay.

512
00:36:52,485 --> 00:36:54,585
Is there, sorry, go ahead.

513
00:36:57,195 --> 00:36:59,903
Nolan Brubaker: Was going to say either way we support both ways.

514
00:37:00,233 --> 00:37:00,473
Yeah.

515
00:37:00,532 --> 00:37:04,426
And the Velero install command we're currently revisiting it.

516
00:37:04,493 --> 00:37:07,064
It was meant to be like a quick start kind of tool.

517
00:37:07,064 --> 00:37:16,222
And then we, it's kind of grown into a little bit of a beast to
show you what I mean, if I do dash dash help, it's a very long list.

518
00:37:16,952 --> 00:37:18,812
Of options, right?

519
00:37:19,622 --> 00:37:24,831
So we're looking at ways to fix this and make it much more friendly.

520
00:37:24,884 --> 00:37:35,431
So the Helm chart's one way of doing it, and we're also looking at replacing the velero install
command with a velero config command, 'cause velero install is like a one-and-done thing.

521
00:37:35,417 --> 00:37:38,477
Bret Fisher: Right, not infrastructure-as-code friendly.

522
00:37:39,057 --> 00:37:42,324
Nolan Brubaker: Yeah, you can do, it has a --dry-run.

523
00:37:42,324 --> 00:37:47,598
You can dump out YAML and get your GitOps kind of thing from there.

524
00:37:47,628 --> 00:37:49,368
But yeah, it's not,

525
00:37:52,458 --> 00:37:53,118
it's not great.

526
00:37:53,104 --> 00:38:04,406
So we're looking at revisiting that and making it more useful for infrastructure
as code and GitOps, and maybe doing layering with, I'm already blanking on it, Kustomize, yep.

527
00:38:04,796 --> 00:38:06,776
Doing Kustomize and Helm.

528
00:38:06,759 --> 00:38:11,117
But right now, our options are velero install and, I'm blanking, Helm.

529
00:38:11,167 --> 00:38:14,331
Uh, those are the two ways that we support for installing it at the moment.

530
00:38:14,391 --> 00:38:19,221
And of course doing the --dry-run to dump out the YAML that it produces.

531
00:38:19,901 --> 00:38:29,319
Bret Fisher: Can you do that same dry run on a backup job so you can produce
YAML that you could apply rather than doing the Velero backup command, you know?

532
00:38:29,889 --> 00:38:30,249
Yeah.

533
00:38:30,249 --> 00:38:30,549
Nolan Brubaker: Yeah.

534
00:38:30,624 --> 00:38:37,875
So, if I say --include-namespaces wordpress, and I do a --dry-run

535
00:38:38,475 --> 00:38:38,715
Bret Fisher: yep.

536
00:38:43,180 --> 00:38:44,020
Nolan Brubaker: What did I get wrong?

537
00:38:44,470 --> 00:38:49,210
Oh, actually I think it's just -o yaml, but

538
00:38:49,270 --> 00:38:52,720
Bret Fisher: I know the flag include namespaces yeah, it

539
00:38:52,720 --> 00:38:53,680
Nolan Brubaker: was backup create.

540
00:38:53,980 --> 00:38:54,250
Yeah.

541
00:38:54,310 --> 00:38:54,610
There you go.

542
00:38:55,410 --> 00:38:58,017
And yeah, so it's just -o.

543
00:38:58,099 --> 00:39:01,322
Yeah the install command is kind of janky.

544
00:39:01,328 --> 00:39:12,589
I say that as the person who wrote it. Yeah, you can do the -o yaml. There's also the
velero schedule command, and that takes all the same arguments

545
00:39:13,009 --> 00:39:13,069
Yeah.

546
00:39:13,129 --> 00:39:18,635
As a backup, including a Cron job specification.

547
00:39:18,741 --> 00:39:19,251
Let's see.

548
00:39:19,251 --> 00:39:29,751
So if I say namespaces wordpress, and I'm going to forget where the cron job spec is,

549
00:39:31,121 --> 00:39:33,411
Bret Fisher: and then you have to schedule in there.

550
00:39:33,411 --> 00:39:33,711
Yeah.

551
00:39:34,071 --> 00:39:34,311
Yeah.

552
00:39:34,311 --> 00:39:34,371
I

553
00:39:34,371 --> 00:39:35,721
Nolan Brubaker: couldn't remember what the name was.

554
00:39:35,835 --> 00:39:38,625
And now let's do eight.

555
00:39:39,255 --> 00:39:39,435
Is it

556
00:39:39,435 --> 00:39:40,185
five?

557
00:39:42,945 --> 00:39:43,335
There we go.

558
00:39:44,265 --> 00:39:46,615
So a schedule would look like this.

559
00:39:48,175 --> 00:39:52,047
So it's a very it's got a backup template right there.

560
00:39:52,027 --> 00:39:56,944
And then when you do a restore, you can do a --from-schedule.

561
00:39:58,174 --> 00:40:06,486
And if you do --from-schedule on a restore, it will grab
the latest backup created from that schedule and restore from that.

562
00:40:07,326 --> 00:40:07,746
Bret Fisher: Okay.

563
00:40:08,631 --> 00:40:10,317
And this schedule is yeah.

564
00:40:10,317 --> 00:40:12,373
It's one of your API resources.

565
00:40:12,403 --> 00:40:12,733
Okay.

566
00:40:13,063 --> 00:40:13,183
Nolan Brubaker: Yep.

567
00:40:13,533 --> 00:40:13,863
Yep.

568
00:40:13,923 --> 00:40:16,743
It's one of the, one of the custom resource types.

569
00:40:17,013 --> 00:40:17,343
Bret Fisher: Yeah.

570
00:40:17,373 --> 00:40:22,554
So it acts like a cron job, but it's not actually using
Jobs or CronJobs, any of the default resources.

571
00:40:22,554 --> 00:40:23,574
It's a custom resource.

572
00:40:23,654 --> 00:40:23,884
Yeah.

573
00:40:24,054 --> 00:40:24,204
Yeah.

574
00:40:24,200 --> 00:40:25,130
Nolan Brubaker: It's a custom resource.

575
00:40:25,140 --> 00:40:26,841
We have our own custom operator for it.

576
00:40:26,835 --> 00:40:33,556
We've looked into making it use CronJob,
but we haven't prioritized that in our backlog yet.

577
00:40:33,536 --> 00:40:38,797
Bret Fisher: Well, it's probably nice to keep regular
application stuff separate from your actual backup stuff.

578
00:40:38,797 --> 00:40:40,327
So, you know, it has the same features.

579
00:40:40,327 --> 00:40:56,610
I don't see why that's a, at least for me, I'd be fine with it being
its own. But we had a question actually from the chat: is there some form of
scheduled job that can be deployed to a K8s cluster that would simulate the backup
and restore periodically to ensure that the backup and restore process is monitored?

580
00:40:57,565 --> 00:40:58,015
Yeah,

581
00:40:58,015 --> 00:41:07,195
Nolan Brubaker: so we include Prometheus metrics, but in
terms of restores, we don't have a schedule for restores.

582
00:41:07,705 --> 00:41:12,355
So right now that needs to be done separately.

583
00:41:12,355 --> 00:41:13,825
So you'd need to include your own.

584
00:41:13,859 --> 00:41:15,839
I can stop sharing at this point if you'd like.

585
00:41:15,929 --> 00:41:16,259
Sure.

586
00:41:16,359 --> 00:41:20,289
I don't really have any, anything more unless we need to get something specific.

587
00:41:21,449 --> 00:41:21,509
Bret Fisher: Yeah.

588
00:41:21,519 --> 00:41:27,192
But I mean, back that's a good question around you know, backup
monitoring, and then which are almost really two separate things.

589
00:41:27,192 --> 00:41:27,402
Right.

590
00:41:27,402 --> 00:41:28,482
Backup monitoring.

591
00:41:28,476 --> 00:41:30,186
And then DR.

592
00:41:30,186 --> 00:41:34,146
Disaster recovery validation is always a challenge for every team.

593
00:41:34,146 --> 00:41:35,916
So let's talk about monitoring for a minute.

594
00:41:35,916 --> 00:41:37,686
So what goes up, what's going on there?

595
00:41:38,376 --> 00:41:38,946
So

596
00:41:38,976 --> 00:41:43,416
Nolan Brubaker: there's two things. We've got Prometheus metrics exposed.

597
00:41:43,481 --> 00:41:49,451
We don't ship a whole lot of stuff to set up Prometheus, mostly because that's

598
00:41:50,316 --> 00:41:51,546
kind of outside the scope.

599
00:41:51,552 --> 00:41:55,062
But we do ship out Prometheus metrics.

600
00:41:55,092 --> 00:42:00,072
Like the last time a backup job failed, the last time a backup job finished.

601
00:42:00,074 --> 00:42:09,464
You can also query for the latest backup job, like watch the
endpoints and grab that information. In terms of validating restores,

602
00:42:10,184 --> 00:42:10,754
Yeah.

603
00:42:10,815 --> 00:42:13,315
That is something we don't provide.

604
00:42:13,345 --> 00:42:15,805
Like I said, we don't provide a schedule.

605
00:42:16,840 --> 00:42:19,270
That's equivalent to a backup.

606
00:42:19,540 --> 00:42:24,130
And we also don't want to just like overwrite.

607
00:42:24,460 --> 00:42:26,170
We don't want to just apply to your clusters.

608
00:42:26,200 --> 00:42:31,231
So you're probably gonna want to come up with some strategy that applies it to some test cluster.

609
00:42:31,211 --> 00:42:31,917
Bret Fisher: I was sitting it's again.

610
00:42:31,947 --> 00:42:39,207
I did, as you were talking, I was thinking that I was like, we, it almost needs
to be like a custom add in that basically pushes your backups to a separate place.

611
00:42:39,207 --> 00:42:47,907
That has no way to affect production, and then does a real one and then validates, and then you
have to have application validation to make sure that the app is actually restored correctly.

612
00:42:47,907 --> 00:42:48,297
So

613
00:42:49,197 --> 00:42:51,387
Nolan Brubaker: that's a separate product, right?

614
00:42:51,627 --> 00:42:52,017
Right.

615
00:42:52,038 --> 00:42:52,690
That's tough.

616
00:42:52,731 --> 00:43:02,181
The closest I can think of to get in a generic way is to do comparisons on the YAML.

617
00:43:02,841 --> 00:43:03,141
Bret Fisher: Yeah.

618
00:43:03,586 --> 00:43:10,433
Nolan Brubaker: And even then there's K8s metadata,
like UIDs and creation timestamps and things like that.

619
00:43:10,433 --> 00:43:12,263
That's just not valid in the comparison.

620
00:43:12,903 --> 00:43:13,073
Yeah.

621
00:43:14,063 --> 00:43:19,373
And so you have to rip that out, like we
rip that out on restore because it's not useful.

622
00:43:19,408 --> 00:43:20,998
So that, that is tough.

623
00:43:21,053 --> 00:43:29,210
There's some internal projects we have that are not ready to ship
out that have tried that, but they're kind of half-baked PoCs.

624
00:43:29,323 --> 00:43:39,883
And we're also talking about, for our 1.5 release timeframe, getting some
more end-to-end testing on our public CI to start playing with this.

625
00:43:39,883 --> 00:43:48,685
And maybe that gets elevated into part of the project, but mostly
it will be to validate some like major bug regression testing.

626
00:43:48,715 --> 00:43:48,955
Bret Fisher: Yeah.

627
00:43:49,555 --> 00:43:49,945
Okay.

628
00:43:49,925 --> 00:43:53,928
The last thing I can think of is it sounds like it's in scope.

629
00:43:54,248 --> 00:44:01,708
That if I lose my entire cluster, that this can provide
me a mechanism to bring the whole thing back to life.

630
00:44:02,418 --> 00:44:02,718
Nolan Brubaker: Yep.

631
00:44:03,408 --> 00:44:03,648
Yeah.

632
00:44:03,648 --> 00:44:12,948
So the kind of the big brain vision would be to use
this and Cluster API to treat your clusters as cattle.

633
00:44:13,398 --> 00:44:19,862
So if you think about this as even if let's say you
don't want to upgrade your clusters in place, right.

634
00:44:19,862 --> 00:44:28,112
Just spin up a new version of Kubernetes with Cluster
API, use this to move it, kill the old cluster.

635
00:44:28,292 --> 00:44:39,020
And maybe you use some higher-level load balancer, Contour, Gimbal, or some other thing
to move your traffic over. Once it's all there, then you kill the old one.

636
00:44:39,050 --> 00:44:40,490
So yeah, absolutely.

637
00:44:41,510 --> 00:44:43,070
Bret Fisher: So this is providing the.

638
00:44:43,077 --> 00:44:55,569
After a default install, and assuming that you've installed Velero, installed
meaning you added it to the cluster, whatever, then ideally it's providing
from one cluster a command, or I guess, 'cause you're doing it cross-cluster.

639
00:44:56,499 --> 00:45:00,729
So how does the new cluster even know about the old cluster's backups?

640
00:45:00,789 --> 00:45:01,869
Is that something that's built in?

641
00:45:02,739 --> 00:45:03,849
So yeah.

642
00:45:03,879 --> 00:45:05,529
So do you do cross-cluster restores?

643
00:45:05,529 --> 00:45:06,489
Is that a feature?

644
00:45:06,489 --> 00:45:07,599
I guess I'm asking the same question.

645
00:45:08,009 --> 00:45:09,089
Nolan Brubaker: Yeah, yeah, absolutely.

646
00:45:09,151 --> 00:45:19,893
So the way that works, that's actually in our documentation, and that's a big
use case. Essentially the way you get that cross-cluster restore is you install Velero.

647
00:45:19,983 --> 00:45:33,912
So you have to spin up the cluster, get the Velero deployment running, and
you have to make sure that Velero deployment points to the same bucket,
that same object store bucket that the previous cluster backed up to.

648
00:45:33,912 --> 00:45:34,092
You.

649
00:45:34,687 --> 00:45:41,315
So once they're pointing to, once they're both pointing to the
same bucket then you can restore from that previous cluster.

650
00:45:41,465 --> 00:45:41,885
Okay.

651
00:45:42,605 --> 00:45:52,674
So you have cluster A write to your bucket and then you have cluster B
read from that bucket and then you can get that cross-cluster migration.
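
[Editor's sketch, not from the show] A toy model of the flow just described: the backups live in the shared bucket rather than in either cluster, so a brand-new cluster pointed at the same bucket can restore from them. The `Bucket` and `Cluster` classes and the backup name are hypothetical and only stand in for the idea; they are not Velero's API or CLI.

```python
import copy


class Bucket:
    """Stands in for the object store bucket; holds backups by name."""

    def __init__(self):
        self.backups = {}


class Cluster:
    """Stands in for a cluster with a backup tool pointed at one bucket."""

    def __init__(self, name, bucket):
        self.name = name
        self.bucket = bucket
        self.resources = {}

    def backup(self, backup_name):
        # Write a snapshot of this cluster's resources into the bucket.
        self.bucket.backups[backup_name] = copy.deepcopy(self.resources)

    def restore(self, backup_name):
        # Read a snapshot from the bucket, whichever cluster wrote it.
        self.resources.update(copy.deepcopy(self.bucket.backups[backup_name]))


shared = Bucket()

cluster_a = Cluster("a", shared)
cluster_a.resources["deployment/web"] = {"replicas": 3}
cluster_a.backup("nightly")

# Cluster B is new and empty, but it points at the same bucket.
cluster_b = Cluster("b", shared)
cluster_b.restore("nightly")

print(cluster_b.resources)
```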

652
00:45:52,934 --> 00:45:53,324
Bret Fisher: Yeah.

653
00:45:53,594 --> 00:45:53,924
All right.

654
00:45:55,874 --> 00:45:56,384
Let's do it.

655
00:45:57,524 --> 00:45:57,824
All right.

656
00:45:57,824 --> 00:46:04,608
Well, you know, it's funny this is all like I'm realizing we basically
just created a tutorial and an intro to like backups on Kubernetes.

657
00:46:04,608 --> 00:46:05,778
Cause I think it's a big question.

658
00:46:05,778 --> 00:46:11,549
And as I kind of admitted in some of my social posts that I,
I'm part of the problem because I do training all the time.

659
00:46:11,549 --> 00:46:14,849
I'm teaching people on the internet and I don't talk about backups a lot.

660
00:46:14,849 --> 00:46:15,749
Like I just.

661
00:46:16,164 --> 00:46:23,631
It's one of those topics that people don't buy courses on. People don't want
it. The backup part is something they do after they've already learned everything else.

662
00:46:23,661 --> 00:46:24,021
Right.

663
00:46:24,081 --> 00:46:32,581
And so in fact, I'm now thinking about it, that like the number of questions in my courses
out of 170,000 people that have asked about backup questions is really, really small.

664
00:46:32,581 --> 00:46:35,371
Like in the, like probably just a couple of handfuls of people.

665
00:46:35,761 --> 00:46:39,751
And I don't think it's because we were, we all just don't care.

666
00:46:39,800 --> 00:46:41,120
I think there's just multiple reasons.

667
00:46:41,120 --> 00:46:46,970
Like you said, some people, like, I, my advice to most people
usually is avoid persistent data in your cluster, if at all possible.

668
00:46:47,120 --> 00:46:47,390
Right?

669
00:46:47,378 --> 00:46:48,749
If you could just use the cloud's

670
00:46:49,214 --> 00:46:58,586
data provisioning services, RDS or whatever, do that. The easiest Kubernetes cluster
is the one that can go away and then you can rebuild it from manifests and it's fine.

671
00:46:58,946 --> 00:47:00,596
And you don't have to restore data.

672
00:47:00,596 --> 00:47:04,306
Like the, it would be nice if we all just never had to worry about having Velero.

673
00:47:04,586 --> 00:47:06,485
And we could just have a deployment of yeah.

674
00:47:06,545 --> 00:47:13,265
Of infrastructure as code and the cluster comes back up and connections
start happening, and then we let the cloud worry about the persistent data.

675
00:47:13,295 --> 00:47:15,455
But the reality is that everything's complex.

676
00:47:15,485 --> 00:47:17,733
We're complex in that we all have legacy apps and yeah.

677
00:47:18,273 --> 00:47:18,993
So, all right.

678
00:47:19,853 --> 00:47:20,093
Nolan Brubaker: Yeah.

679
00:47:20,103 --> 00:47:25,305
And there's a, I forgot, I think it was, I think it was Twitter.

680
00:47:25,305 --> 00:47:28,700
Somebody did a KubeCon, I believe it was KubeCon Seattle.

681
00:47:28,730 --> 00:47:36,500
They talked about using Velero to do backups, and then they were playing around
Kubernetes and did an accidental command that just deleted all their clusters.

682
00:47:36,530 --> 00:47:38,630
And they were like, we didn't think we needed this.

683
00:47:39,465 --> 00:47:40,335
And then we deleted everything.

684
00:47:40,695 --> 00:47:42,868
So it's a it's an afterthought.

685
00:47:42,898 --> 00:47:52,979
And it's one that we've heard from customers is oh yeah, we, it was always a
later, later, and then they deleted stuff and now it's a, oh, we need this.

686
00:47:53,039 --> 00:47:53,909
So, yeah,

687
00:47:54,249 --> 00:47:54,589
Bret Fisher: right.

688
00:47:54,949 --> 00:47:55,039
Yeah.

689
00:47:55,039 --> 00:47:55,579
We thought we did.

690
00:47:55,579 --> 00:48:01,669
We thought we didn't have persistent data, but it turns out that we
actually changed things in Kubernetes and we needed that persistent.

691
00:48:01,789 --> 00:48:02,029
Yeah.

692
00:48:02,131 --> 00:48:02,341
Yeah.

693
00:48:02,821 --> 00:48:03,031
Yeah.

694
00:48:03,089 --> 00:48:15,089
It's a valid reason for moving everything to as much infrastructure as code and
GitOps, and remove the command line, remove the API connections
from anyone's local machine, and only allow the automation bots to do that.

695
00:48:15,089 --> 00:48:16,106
And that, it's a hard thing.

696
00:48:16,106 --> 00:48:19,522
And most of the teams I work with don't ever get to that level.

697
00:48:19,589 --> 00:48:23,638
Just because there's a lot of things that have to go into that, that sort of bites you in the rear.

698
00:48:23,638 --> 00:48:29,758
If you don't really have a strong, automated pipeline, but
that's for another podcast, we have a couple more questions.

699
00:48:29,804 --> 00:48:30,370
One is.

700
00:48:30,394 --> 00:48:33,784
Do you recommend just doing manual restore testing regularly then?

701
00:48:33,844 --> 00:48:36,124
I guess since there is no automated?

702
00:48:37,554 --> 00:48:38,034
Nolan Brubaker: Yeah.

703
00:48:38,034 --> 00:48:47,150
At this point I would recommend that I would strongly recommend working
to automate that and that's something we're working on internally.

704
00:48:47,168 --> 00:48:48,378
It's something we want to get to.

705
00:48:48,738 --> 00:48:53,714
We also have community meetings on Tuesdays at noon Eastern.

706
00:48:53,715 --> 00:48:58,789
If folks want to discuss like approaches to that and that's absolutely a valid topic.

707
00:48:58,778 --> 00:49:04,958
If folks want to discuss how they might approach that and
share that information, absolutely valid a valid thing.

708
00:49:04,940 --> 00:49:14,863
If users want to share what they've tried, that would be great to hear. One
thing I will say, as a developer of Velero, a lot of my clusters don't live very long.

709
00:49:15,283 --> 00:49:15,463
Right.

710
00:49:15,513 --> 00:49:22,956
So that, that would be great to get some insight from folks who have
clusters that live a lot longer than me or a lot longer than mine.

711
00:49:23,005 --> 00:49:27,133
So I would definitely welcome feedback and user experience there.

712
00:49:27,126 --> 00:49:37,243
And so, yeah, we've got discussions on the Slack channel. So I would recommend testing
your restores, whether they're manual or automated, as much as you can at this point.

713
00:49:37,226 --> 00:49:42,310
Just because the backups are good, but if you try to restore and they don't work.

714
00:49:42,925 --> 00:49:45,600
Then it's just as bad as not having backups.

715
00:49:45,960 --> 00:49:46,170
Yeah.

716
00:49:46,250 --> 00:49:46,730
Bret Fisher: There's stuff.

717
00:49:46,730 --> 00:49:47,060
That's.

718
00:49:47,122 --> 00:49:49,432
To me, most of the headaches have nothing to do with the backup tool.

719
00:49:49,582 --> 00:49:53,948
Like the restore of a backup may be fantastic,
but the application didn't write to disk properly.

720
00:49:53,948 --> 00:49:57,963
So I don't actually have valid data backups, or my connections come in to different endpoints.

721
00:49:57,963 --> 00:50:01,113
And those endpoints were lost and restored, and I didn't update those things.

722
00:50:01,113 --> 00:50:03,276
So there's so many things there that nothing.

723
00:50:03,286 --> 00:50:08,226
I used to have an old boss that would basically
walk in on a Friday where there wasn't a lot going on.

724
00:50:08,266 --> 00:50:09,936
He would say, okay, everybody in the conference room.

725
00:50:09,936 --> 00:50:13,446
And we all kind of knew what that meant, because he was going to say, today is DR

726
00:50:13,446 --> 00:50:13,746
day.

727
00:50:14,076 --> 00:50:18,126
Imagine the data center's gone. How do we start restoring the data center?

728
00:50:18,186 --> 00:50:25,667
And it would take, we would have the DBA team, we'd have everybody,
essentially all hands on deck, saying, okay, let's go through this exercise.

729
00:50:25,667 --> 00:50:26,897
And it always.

730
00:50:27,822 --> 00:50:28,872
It was basically a shit show.

731
00:50:28,992 --> 00:50:33,026
It was always you know, scrambling to try to figure out all the things and all the teams.

732
00:50:33,026 --> 00:50:35,846
And we realized no matter how much documentation we had, there was always a gap.

733
00:50:36,146 --> 00:50:38,786
There was always some exception because since the last time we did it.

734
00:50:40,366 --> 00:50:40,576
Nolan Brubaker: Yeah.

735
00:50:40,576 --> 00:50:40,786
Yeah.

736
00:50:41,336 --> 00:50:42,916
It's like planned Chaos Monkey.

737
00:50:43,276 --> 00:50:43,576
Bret Fisher: Yup.

738
00:50:44,416 --> 00:50:47,566
With a bunch of humans, a manual Chaos Monkey.

739
00:50:47,566 --> 00:50:47,626
Yeah.

740
00:50:47,656 --> 00:50:48,993
Of manual activities.

741
00:50:48,993 --> 00:50:49,233
Yeah.

742
00:50:49,228 --> 00:50:52,198
I don't think there's such a thing as too much backup testing.

743
00:50:52,210 --> 00:50:54,150
Especially if you're someone who's responsible for backups.

744
00:50:54,160 --> 00:51:05,860
Cause I think a lot of organizations just, uh, assume that the backup
person or the team responsible for that might be the same as the storage
team that they're somehow like magically able to test all these apps.

745
00:51:05,890 --> 00:51:09,520
And they're usually not like they're not the developers, they're not the operators.

746
00:51:09,520 --> 00:51:14,740
So they don't necessarily have the capability to even
know if the apps were going to work if they restore it.

747
00:51:14,710 --> 00:51:15,820
Yeah, that's a hard thing.

748
00:51:15,880 --> 00:51:21,425
And I sympathize for those people because they're usually the most relied
on in the situation, but they're usually the ones that have the least

749
00:51:21,419 --> 00:51:26,699
amount of information about how the apps are supposed to work and all the other
things outside of that, like networking that usually needs to be involved as well.

750
00:51:26,736 --> 00:51:26,856
Yeah.

751
00:51:27,506 --> 00:51:28,286
Or even your cloud

752
00:51:28,286 --> 00:51:28,646
Nolan Brubaker: vendor.

753
00:51:28,646 --> 00:51:39,080
Like if you think about it, if Amazon or Google Cloud or Azure go down, like
their concern isn't necessarily your app it's getting their infrastructure back.

754
00:51:39,080 --> 00:51:39,470
Now.

755
00:51:40,070 --> 00:51:42,650
They have a huge incentive to get back online.

756
00:51:42,650 --> 00:51:43,430
That's for sure.

757
00:51:43,790 --> 00:51:46,409
But like their incentive is not your application.

758
00:51:46,409 --> 00:51:50,691
So a lot of that does fall on your organization's shoulders.

759
00:51:50,731 --> 00:51:53,265
So, owning your uptime.

760
00:51:54,645 --> 00:51:57,174
I think a lot of people say that, but it's hard to do.

761
00:51:57,174 --> 00:51:58,554
It's expensive and it's hard.

762
00:51:59,574 --> 00:51:59,784
Bret Fisher: Yeah.

763
00:51:59,784 --> 00:52:01,104
And it's not immediately effective.

764
00:52:01,104 --> 00:52:05,037
If the failure never happens, then no one ever got to see all your work.

765
00:52:05,017 --> 00:52:07,310
Nolan Brubaker: Yeah, that's a lot of, that's a lot of.

766
00:52:08,400 --> 00:52:12,180
Time and effort spent for something that hopefully never happens.

767
00:52:12,320 --> 00:52:12,590
Bret Fisher: Right.

768
00:52:12,890 --> 00:52:14,810
Which is why it gets pushed to the back of the project.

769
00:52:14,810 --> 00:52:14,960
Right.

770
00:52:14,960 --> 00:52:19,280
Because all the project delays are happening and everyone's
like, well, we'll just do the DR testing later then.

771
00:52:20,000 --> 00:52:21,320
So yeah.

772
00:52:21,470 --> 00:52:22,970
Well, this has been a great discussion.

773
00:52:23,330 --> 00:52:24,800
Thank you all for the questions.

774
00:52:24,915 --> 00:52:28,065
Hopefully we'll have more to talk about in the future about backups and DR.

775
00:52:28,065 --> 00:52:33,555
And when you guys get some major new features, it'd be great
to have you back on the show to, to get a catch up to this.

776
00:52:33,555 --> 00:52:39,045
But for those of you out there, the message is try this stuff, do the right thing.

777
00:52:39,165 --> 00:52:45,523
Like don't let it be that one day that you suddenly, cause
some, sometimes jobs, jobs depend on this and it's important.

778
00:52:45,943 --> 00:52:48,286
So we'll do better on our end talking about it.

779
00:52:48,286 --> 00:52:50,506
You do better on your end of actually using it.

780
00:52:50,546 --> 00:52:52,106
Thanks a lot Nolan for being on the show.

781
00:52:52,506 --> 00:52:52,746
Yeah.

782
00:52:52,776 --> 00:52:59,422
And people can find you at the little Twitter handles on our little page
there. You can get Velero at velero.io, and there's also a Twitter handle.

783
00:52:59,422 --> 00:52:59,562
Right.

784
00:52:59,612 --> 00:53:00,832
I think it's Velero project

785
00:53:00,818 --> 00:53:02,013
projectvelero or projectvelero.

786
00:53:02,013 --> 00:53:03,207
So you can

787
00:53:03,207 --> 00:53:05,037
Bret Fisher: follow them on Twitter and see their releases.

788
00:53:05,457 --> 00:53:05,737
Yeah.

789
00:53:05,847 --> 00:53:05,997
Nolan Brubaker: Awesome.

790
00:53:06,017 --> 00:53:07,577
So on the Kubernetes Slack

791
00:53:07,615 --> 00:53:08,843
in the Velero channel.

792
00:53:08,893 --> 00:53:09,013
Nolan Brubaker: Oh,

793
00:53:09,013 --> 00:53:09,493
Bret Fisher: nice.

794
00:53:09,853 --> 00:53:10,153
Yeah.

795
00:53:11,307 --> 00:53:11,697
Bret: All right.

796
00:53:11,697 --> 00:53:13,287
I hope you enjoyed that.

797
00:53:13,612 --> 00:53:16,582
Demo and conversation with Nolan from VMware.

798
00:53:17,212 --> 00:53:20,602
And of course you can get all the stuff in the show notes, all the links and info.

799
00:53:21,112 --> 00:53:24,872
And I will see you in the next episode.