1
00:00:00,254 --> 00:00:04,304
There's another kind of interesting
decision here: Dropbox by

2
00:00:04,304 --> 00:00:06,824
design was always like a sidecar.

3
00:00:06,824 --> 00:00:09,974
It's always something that just
sits and it looks at your files.

4
00:00:09,974 --> 00:00:12,434
Your files are just regular
files on the file system.

5
00:00:12,794 --> 00:00:17,408
And if Dropbox, the app, isn't running,
your files are there and they're safe,

6
00:00:17,408 --> 00:00:21,638
and it's something that, you know,
regular apps can just read and write

7
00:00:21,638 --> 00:00:26,988
to, and in some sense, like, Dropbox
was unintentionally local-first

8
00:00:27,008 --> 00:00:28,508
from that perspective, right?

9
00:00:28,538 --> 00:00:31,658
Because it's saying that no
matter what happens, your data

10
00:00:31,658 --> 00:00:32,918
is just there and you own it.

11
00:00:33,984 --> 00:00:36,084
Welcome to the localfirst.fm podcast.

12
00:00:36,444 --> 00:00:39,174
I'm your host, Johannes Schickling,
and I'm a web developer, a

13
00:00:39,174 --> 00:00:42,234
startup founder, and love the
craft of software engineering.

14
00:00:42,654 --> 00:00:46,194
For the past few years, I've been on a
journey to build a modern, high-quality

15
00:00:46,194 --> 00:00:50,034
music app using web technologies, and
in doing so, I've been falling down

16
00:00:50,034 --> 00:00:51,984
the rabbit hole of local-first software.

17
00:00:52,494 --> 00:00:55,374
This podcast is your invitation
to join me on that journey.

18
00:00:56,154 --> 00:00:58,924
In this episode, I'm
speaking to Sujay Jayakar,

19
00:00:59,439 --> 00:01:02,319
co-founder of Convex and
early engineer at Dropbox.

20
00:01:02,739 --> 00:01:06,609
In this conversation, Sujay shares
the story of how the sync engine

21
00:01:06,669 --> 00:01:11,169
powering Dropbox was built initially
and later redesigned to address all

22
00:01:11,169 --> 00:01:13,209
sorts of distributed systems problems.

23
00:01:13,689 --> 00:01:18,999
Before getting started, also a big thank
you to Jazz for supporting this podcast.

24
00:01:19,299 --> 00:01:21,099
And now my interview with Sujay.

25
00:01:22,211 --> 00:01:23,051
Hey, Sujay.

26
00:01:23,081 --> 00:01:24,701
So nice to have you on the show.

27
00:01:24,701 --> 00:01:25,421
How are you doing?

28
00:01:25,901 --> 00:01:26,501
Doing great.

29
00:01:26,506 --> 00:01:26,736
Great.

30
00:01:26,981 --> 00:01:27,911
Really happy to be here.

31
00:01:28,361 --> 00:01:30,461
I'm super excited to have you on the show.

32
00:01:30,491 --> 00:01:35,244
I've been using your work for really
over a decade at this point

33
00:01:35,244 --> 00:01:39,351
when I was really getting into
using computers productively.

34
00:01:39,681 --> 00:01:45,124
And we just the other time had another
really interesting guest, Seph Gentle, on

35
00:01:45,124 --> 00:01:50,534
the podcast, who has worked on a really
fascinating tool called Google Wave

36
00:01:50,534 --> 00:01:52,904
back then that had a big impact on me.

37
00:01:53,204 --> 00:01:56,264
And you've been working on another
technology that had a big impact

38
00:01:56,264 --> 00:02:01,351
on me, which is Dropbox and still
has a very positive impact on me.

39
00:02:01,531 --> 00:02:05,431
That was all the way back then
over 10 years ago in 2014.

40
00:02:05,731 --> 00:02:11,181
I don't think I need to explain to
the audience what Dropbox is, but, I

41
00:02:11,181 --> 00:02:15,441
want to hear it from you, like what
led you to join Dropbox, I think very

42
00:02:15,441 --> 00:02:19,851
early on and just hearing a little
bit just embedded in your personal

43
00:02:20,001 --> 00:02:24,231
context when you joined it, and then
we're gonna go dive really deep into

44
00:02:24,231 --> 00:02:27,201
all things syncing related, et cetera.

45
00:02:27,201 --> 00:02:27,951
How does that sound?

46
00:02:28,731 --> 00:02:29,721
Yeah, that sounds great.

47
00:02:30,021 --> 00:02:31,761
It's actually a really funny story.

48
00:02:31,803 --> 00:02:34,533
My career here in
technology started in 2012.

49
00:02:34,853 --> 00:02:37,883
I was actually studying mathematics.

50
00:02:37,883 --> 00:02:44,243
I was going to go work at the NSA doing
cryptography, and I was born in India.

51
00:02:44,408 --> 00:02:48,273
But I'm a naturalized citizen of
the United States, and you have to

52
00:02:48,273 --> 00:02:52,343
have security clearance to go do
these types of cryptography things.

53
00:02:52,993 --> 00:02:58,063
And you know, my clearance kept
on dragging on and on and on and

54
00:02:58,223 --> 00:03:01,873
they like interviewed my roommates
and apparently I'm just a very sketchy

55
00:03:01,873 --> 00:03:06,113
guy. So I had an offer to go work
there, but it kept on dragging on.

56
00:03:06,323 --> 00:03:10,543
And then my roommate at the time was a
computer science major who wanted

57
00:03:10,833 --> 00:03:15,459
someone to go with him to the career
fair, and I just started chatting with the

58
00:03:15,459 --> 00:03:18,909
Dropbox people, and you know, it was about
a hundred people around that time.

59
00:03:19,329 --> 00:03:23,786
And, chatting turned into hanging out
at dinner, turned into interviewing

60
00:03:23,786 --> 00:03:26,906
and being a math person, I did
my interviews all in Haskell and

61
00:03:26,906 --> 00:03:28,466
didn't know any real programming.

62
00:03:28,909 --> 00:03:33,619
And then, yeah, that turned into doing an
internship, dropping out of undergrad, and

63
00:03:34,090 --> 00:03:35,391
just following the dream.

64
00:03:35,391 --> 00:03:38,841
And so I worked on, at Dropbox,
I worked on a bunch of things.

65
00:03:38,841 --> 00:03:40,971
I started off working on
our, like, growth team.

66
00:03:40,971 --> 00:03:43,761
So I did a lot of, like, email systems.

67
00:03:43,761 --> 00:03:47,051
Like I did this, I worked on this thing
called the Space Race, like a promotion.

68
00:03:47,361 --> 00:03:48,771
Oh, I remember that.

69
00:03:48,771 --> 00:03:49,161
Yes.

70
00:03:49,161 --> 00:03:53,631
I think I've, I've earned quite a lot
of like free storage, which I think

71
00:03:53,631 --> 00:03:55,701
over the time has like gone down.

72
00:03:56,001 --> 00:03:58,641
But that was a very smart
and effective mechanism.

73
00:03:58,641 --> 00:04:01,371
I surely invited all my friends back then.

74
00:04:01,371 --> 00:04:05,901
I couldn't afford a premium plan,
being a broke student. That worked.

75
00:04:07,381 --> 00:04:11,381
And then from there worked on
the sync engine for some time.

76
00:04:11,411 --> 00:04:15,961
And then right now I'm the co-founder and
chief scientist of a startup called Convex

77
00:04:16,141 --> 00:04:20,251
and my three co-founders and I met working
on this project called Magic Pocket,

78
00:04:20,251 --> 00:04:25,474
where Dropbox stores hundreds of petabytes,
now exabytes, of files for users.

79
00:04:25,474 --> 00:04:26,854
And we used to do that in S3.

80
00:04:27,064 --> 00:04:32,284
And so the three of us worked together on
a team to build Amazon S3, but in-house

81
00:04:32,284 --> 00:04:33,994
and migrate all of the data over.

82
00:04:34,364 --> 00:04:39,116
So we did that for a few years and then
worked on rewriting the entirety of

83
00:04:39,116 --> 00:04:42,866
Dropbox, the sync engine, the thing that
runs on all of our desktop computers.

84
00:04:43,139 --> 00:04:46,919
We rewrote it to be really correct
and scalable and very flexible,

85
00:04:47,279 --> 00:04:48,369
and shipped that.

86
00:04:48,650 --> 00:04:52,673
After that, I left Dropbox in 2020. I
was trying to decide if I wanted

87
00:04:52,673 --> 00:04:54,353
to get back to academics or not.

88
00:04:54,353 --> 00:04:59,513
So I did some research in networking
and then decided to start Convex in 2021.

89
00:04:59,843 --> 00:05:03,531
Certainly curious, which sort of
research has had your interest the

90
00:05:03,531 --> 00:05:07,611
most in this sort of transitionary
period, but maybe we stash that

91
00:05:07,672 --> 00:05:11,494
for a moment and go back to the
beginning when you joined Dropbox.

92
00:05:11,805 --> 00:05:15,451
You mentioned there were around a
hundred people working there at the time.

93
00:05:15,728 --> 00:05:20,078
How do I need to imagine the technology
behind Dropbox at this point?

94
00:05:20,378 --> 00:05:26,714
It clearly all started out with, like, a
desktop-focused, like, daemon project,

95
00:05:27,054 --> 00:05:33,204
like daemon process that's running on your
machine somehow keeps track of the files

96
00:05:33,294 --> 00:05:36,774
on your system and then applies the magic.

97
00:05:37,044 --> 00:05:43,138
So explain to me how things worked
back then and what was it like to

98
00:05:43,138 --> 00:05:46,258
work at Dropbox when there were
around about a hundred people.

99
00:05:46,888 --> 00:05:49,678
Yeah, I mean, it was
pretty magical, right?

100
00:05:49,678 --> 00:05:54,058
Because the company had, I think gotten
so many things right on the product side

101
00:05:54,058 --> 00:05:55,968
and then those showed up in technology.

102
00:05:55,968 --> 00:06:00,238
But just this feeling of, like, Dropbox
being this product that just worked, right?

103
00:06:00,238 --> 00:06:01,468
It was for everyone.

104
00:06:01,738 --> 00:06:05,758
It was not just for technologists, but
anyone should be able, anyone who's

105
00:06:05,758 --> 00:06:09,478
comfortable using a computer should
be able to install Dropbox and have

106
00:06:09,538 --> 00:06:12,118
a folder of theirs become magical.

107
00:06:12,508 --> 00:06:15,748
And without understanding anything
about how it works, they should

108
00:06:15,748 --> 00:06:19,198
just think of it as like an
extension of what they know already.

109
00:06:19,468 --> 00:06:19,708
Yeah.

110
00:06:19,708 --> 00:06:22,948
And so like the ways that that showed
up I think were really interesting.

111
00:06:22,948 --> 00:06:26,048
At the time there was a very strong
culture of like reverse engineering.

112
00:06:26,678 --> 00:06:29,998
So, to have this daemon that runs locally.

113
00:06:30,238 --> 00:06:34,348
You know, there was, one of the amazing
early moments in Dropbox was that,

114
00:06:34,564 --> 00:06:38,614
if, like, you open up Finder or Explorer
and you have the overlays on it.

115
00:06:39,094 --> 00:06:43,294
Like that used to be done by
like attaching to the Finder

116
00:06:43,294 --> 00:06:45,124
process and injecting code into it

117
00:06:47,854 --> 00:06:51,994
and to the point where, uh, when some
folks had gone to talk to Apple at the

118
00:06:51,994 --> 00:06:57,364
time about, like, working with the
file system and everything like the,

119
00:06:57,874 --> 00:07:02,584
there were teams at Apple that asked
Dropbox, how did you do that in Finder?

120
00:07:05,374 --> 00:07:08,674
So you wanted to offer the
most native experience.

121
00:07:08,674 --> 00:07:11,044
There weren't the necessary APIs for that.

122
00:07:11,314 --> 00:07:12,724
And so you just made it happen.

123
00:07:12,754 --> 00:07:13,444
That's amazing.

124
00:07:13,449 --> 00:07:13,459
Yeah.

125
00:07:14,164 --> 00:07:14,464
Yeah.

126
00:07:14,804 --> 00:07:19,744
And so that, that idea of, like, how do
you create the best user experience,

127
00:07:19,744 --> 00:07:26,464
something that, you know, for the purpose
of making non-technical users feel very

128
00:07:26,464 --> 00:07:28,654
confident and feel very safe using it.

129
00:07:28,834 --> 00:07:32,584
That was another, I think, really
deep, like, company value of, like,

130
00:07:32,584 --> 00:07:36,544
being worthy of trust and taking
people's files very seriously.

131
00:07:36,604 --> 00:07:39,544
You know, I like remember having a
friend who was in residency at the

132
00:07:39,544 --> 00:07:44,314
time and he was telling me that he
keeps all of his, like some of his non-HIPAA

133
00:07:44,314 --> 00:07:50,014
stuff, but like, his things that
he looks at, on Dropbox, and you know,

134
00:07:50,014 --> 00:07:51,754
pulls 'em up and he's consulting 'em.

135
00:07:51,754 --> 00:07:54,374
And there's a part of me which
is terrified by that, right?

136
00:07:54,374 --> 00:07:57,934
Like, we think of software as
something where like throwing a 500

137
00:07:57,934 --> 00:07:59,554
error is fine every once in a while.

138
00:08:00,004 --> 00:08:03,054
And at Dropbox, there was
a culture of making users feel

139
00:08:03,054 --> 00:08:04,284
like they could really trust us.

140
00:08:04,314 --> 00:08:08,274
And then that showed up for things
like making sure that, like when

141
00:08:08,274 --> 00:08:11,604
we give feedback to users, if we
put that green overlay in Finder,

142
00:08:12,189 --> 00:08:16,689
they know that no matter what happens,
they could throw their laptop in a pool.

143
00:08:16,689 --> 00:08:20,739
They could, like, anything could
happen and their files are safe.

144
00:08:20,869 --> 00:08:24,609
Like, if their house burns down, they
don't have to worry about that thing.

145
00:08:24,879 --> 00:08:29,469
And that's like all of that reverse
engineering and all of the emphasis

146
00:08:29,469 --> 00:08:31,269
on correctness and durability.

147
00:08:31,479 --> 00:08:34,209
It was all in service of that feeling,
which I think was really cool.

148
00:08:34,763 --> 00:08:38,273
So on the engineering side, at the
time, it was, like, in hypergrowth mode.

149
00:08:38,273 --> 00:08:40,793
So they had a Python desktop client.

150
00:08:40,973 --> 00:08:43,433
Almost all of Dropbox was
in Python at the time.

151
00:08:43,793 --> 00:08:49,943
And so there's a pre-mypy, like, big,
rapidly changing desktop client that

152
00:08:50,283 --> 00:08:53,693
needed to support Mac, Windows, and Linux
and all these different file systems.

153
00:08:53,963 --> 00:08:58,313
And then on the server, it was like we
had one big server called Meta Server.

154
00:08:58,646 --> 00:08:59,936
Meta, I think, was from metadata.

155
00:09:00,243 --> 00:09:03,513
And that, like, ran almost all of Dropbox.

156
00:09:03,573 --> 00:09:06,223
We stored the metadata in MySQL.

157
00:09:06,673 --> 00:09:11,463
The files were stored in S3, and then
we had a separate notification server

158
00:09:11,463 --> 00:09:13,713
for managing pushes and things like that.

159
00:09:13,923 --> 00:09:17,433
And so it was, like, kind of a classic
architecture, and, like, it was

160
00:09:17,463 --> 00:09:20,403
starting to reach the limits of
its scaling even at that time.

161
00:09:20,913 --> 00:09:24,106
And those were a lot of the things
we worked on over the next 10 years.

162
00:09:24,616 --> 00:09:25,126
Wow.

163
00:09:25,366 --> 00:09:27,766
So was the server also written in Python?

164
00:09:27,766 --> 00:09:29,566
So it was all one big Python shop.

165
00:09:30,076 --> 00:09:30,436
Yeah.

166
00:09:30,886 --> 00:09:32,836
And the server was all written in Python.

167
00:09:33,373 --> 00:09:39,043
We had some pretty funny bugs
that were due to... it's kind of

168
00:09:39,043 --> 00:09:40,213
crazy to think about it now.

169
00:09:40,213 --> 00:09:44,743
You know, working in TypeScript
full time, and to think of, like,

170
00:09:44,743 --> 00:09:48,433
back in the day we just had these like
hundreds of thousands, millions of lines

171
00:09:48,433 --> 00:09:54,383
of code with no type safety and with
all types of crazy metaprogramming and

172
00:09:55,053 --> 00:09:57,553
decorators and metaclasses and stuff.

173
00:09:57,553 --> 00:10:00,193
And yeah, so there was a, it was
all in Python when I showed up.

174
00:10:00,235 --> 00:10:04,453
It was not all in Python and not all in
one big monolithic service when I left.

175
00:10:04,814 --> 00:10:09,644
So you mentioned joining when there
were around a hundred people and you

176
00:10:09,644 --> 00:10:14,894
probably already at this point had
like multitudes more in terms of users.

177
00:10:15,274 --> 00:10:21,004
Being in hypergrowth, it is sort of this
race against time where you only have

178
00:10:21,004 --> 00:10:26,344
so much time to work on something, but
growth may be outrun you already and

179
00:10:26,344 --> 00:10:28,624
things are already starting to break.

180
00:10:28,624 --> 00:10:33,514
Or you know, like, okay, if things are
gonna grow like this, this system will

181
00:10:33,514 --> 00:10:36,508
break and it's gonna be pretty bad.

182
00:10:36,808 --> 00:10:42,171
So tell me more about how you were
dealing with, like, the constant

183
00:10:42,321 --> 00:10:48,778
race against time to rebuild systems,
redesign systems, putting out fires.

184
00:10:49,018 --> 00:10:49,948
What was that like?

185
00:10:50,224 --> 00:10:53,374
Yeah, and I think there's like kind of
an interesting place to take this on.

186
00:10:53,374 --> 00:10:56,584
I think, like, the normal things
were around scale, right?

187
00:10:56,584 --> 00:10:57,274
Those were, like,

188
00:10:57,619 --> 00:11:00,416
one kinda class of problems, of
being able to handle the load.

189
00:11:00,626 --> 00:11:04,849
But I think one kind of really
interesting dimension of this that led

190
00:11:04,849 --> 00:11:09,829
to our decision to start rewriting all
of the sync engine in 2016 was actually

191
00:11:09,829 --> 00:11:11,749
just like customer debugging load.

192
00:11:12,619 --> 00:11:17,449
You know, we have, we had hundreds of
millions of active users and they were

193
00:11:17,539 --> 00:11:20,149
using Dropbox in all types of crazy ways.

194
00:11:20,389 --> 00:11:24,019
Like, one of the stories is, someone
was using Dropbox with like, I think

195
00:11:24,019 --> 00:11:27,559
it was running on some, I don't know
if it was like a Raspberry Pi or

196
00:11:27,559 --> 00:11:28,849
something, something on his tractor.

197
00:11:28,879 --> 00:11:32,749
Like the guy ran a farm and he
was using Dropbox to sync, like,

198
00:11:32,749 --> 00:11:35,089
paths in text files to his tractor.

199
00:11:35,533 --> 00:11:37,633
And I might be getting some
of the details wrong, but

200
00:11:37,633 --> 00:11:38,353
it's something like that.

201
00:11:38,353 --> 00:11:43,243
And so people would just use Dropbox
in all types of crazy ways on crazy

202
00:11:43,243 --> 00:11:47,913
file systems with kernel modules
running that are messing things around,

203
00:11:47,913 --> 00:11:52,251
or... So I think, you know, in terms of
getting ahead of scale, I think we found

204
00:11:52,251 --> 00:11:58,644
ourselves around 2015, 2016, in the
place where for the sync engine on the

205
00:11:58,644 --> 00:12:03,864
desktop client, the entire team just
spent all of its time debugging issues.

206
00:12:04,644 --> 00:12:08,934
We had this principle of, like, anything
that's possible, anything that a

207
00:12:08,934 --> 00:12:13,254
protocol allows, anything that, some
threading race condition that's

208
00:12:13,254 --> 00:12:15,864
theoretically possible will be possible.

209
00:12:16,404 --> 00:12:17,934
And then we would see it, right?

210
00:12:17,934 --> 00:12:20,514
Like users would write in
saying, my files aren't syncing.

211
00:12:20,814 --> 00:12:24,414
And then we would look at it and we would
spend months debugging each one of these

212
00:12:24,414 --> 00:12:30,218
issues and trying to read the tea leaves
from traces and reports and reproductions.

213
00:12:30,218 --> 00:12:33,878
And it'll be like, oh, they
mounted this file system over here

214
00:12:33,878 --> 00:12:36,278
and then this one and this one
are in a different file system.

215
00:12:36,278 --> 00:12:40,188
So moving the file actually did
a copy, but then the xattrs

216
00:12:40,188 --> 00:12:42,138
weren't preserved, this and that.

217
00:12:42,478 --> 00:12:46,168
You know, in terms of that theme of like
getting ahead of scale, like I think there

218
00:12:46,168 --> 00:12:51,238
was first this realization that like the
set of possible things that can happen in

219
00:12:51,238 --> 00:12:54,508
the system is just astronomically large.

220
00:12:54,598 --> 00:12:57,298
And all of them will happen
if they're allowed to.

221
00:12:57,718 --> 00:13:01,498
And we do not have, no matter
how much like incremental time

222
00:13:01,498 --> 00:13:04,798
we put into debugging things, we
will never be able to keep up.

223
00:13:05,128 --> 00:13:08,188
And the cost of doing that is
that the entire team is working

224
00:13:08,188 --> 00:13:09,628
on maintenance like this.

225
00:13:09,628 --> 00:13:11,098
We couldn't build any new features.

226
00:13:11,578 --> 00:13:15,958
So I think that was a motivation then for
the rewrite too: can we find, like, points

227
00:13:15,958 --> 00:13:20,758
of leverage where if we just invest a
little bit in technology upfront, like by

228
00:13:20,758 --> 00:13:25,768
architecting things a particular way, can
we just eliminate a much bigger set of

229
00:13:25,768 --> 00:13:29,638
potential work from debugging and working
with customers and stuff like that.

230
00:13:29,974 --> 00:13:33,974
So maybe this is a good time
to take a step back and try to

231
00:13:33,974 --> 00:13:38,354
better understand, what was Dropbox's
sync engine actually back then?

232
00:13:38,654 --> 00:13:45,041
So from just thinking about it through
like a user's perspective, I have maybe

233
00:13:45,041 --> 00:13:48,094
two computers, and I have files over here.

234
00:13:48,094 --> 00:13:53,179
I want to make sure that I have the
files synced over from here to here.

235
00:13:53,569 --> 00:13:59,299
So I could now think about this as
sort of like a Git-style approach.

236
00:13:59,539 --> 00:14:01,219
Maybe there's other ways as well.

237
00:14:01,489 --> 00:14:05,329
Walk me through, sort of, like, the
solution space, how this could have been

238
00:14:05,389 --> 00:14:07,459
approached and how was it approached?

239
00:14:07,592 --> 00:14:12,032
Is there some sort of, like, diffing
involved between different file states

240
00:14:12,032 --> 00:14:14,222
over time, and those are being synced around?

241
00:14:14,462 --> 00:14:17,792
Do you sync around the
actual file content itself?

242
00:14:18,062 --> 00:14:19,142
Help me to understand.

243
00:14:19,142 --> 00:14:24,752
Building a mental model, what did it mean
back then for the sync engine to work?

244
00:14:25,187 --> 00:14:25,697
Yeah.

245
00:14:25,757 --> 00:14:26,087
Yeah.

246
00:14:26,207 --> 00:14:28,427
It's a super interesting question, right?

247
00:14:28,427 --> 00:14:31,517
Because I think like you're saying,
there's so many different paths one

248
00:14:31,517 --> 00:14:34,667
can take and it's, I think one of
those things where like if someone

249
00:14:34,667 --> 00:14:37,307
asks, like, "design Dropbox" in an
interview question, there's like

250
00:14:37,517 --> 00:14:39,767
definitely not one right answer, right?

251
00:14:39,797 --> 00:14:44,417
It's like there are so many trade-offs and
like different forks in the decision tree.

252
00:14:44,777 --> 00:14:48,767
I think one of the first things is
that, so you have your desktop A and you

253
00:14:48,767 --> 00:14:52,547
have your, maybe you have your desktop
and your laptop, and one of the first

254
00:14:52,547 --> 00:14:55,877
decisions for Dropbox is that we would
have a central server in the middle,

255
00:14:56,417 --> 00:15:01,097
that there would be a Dropbox file system
in the middle that Dropbox, the company

256
00:15:01,097 --> 00:15:05,897
ran, and we did that from this trust
perspective, we wanted to say that we

257
00:15:05,897 --> 00:15:10,217
will run this infallibly when you get
that green check mark when it's there.

258
00:15:11,177 --> 00:15:15,077
You know, even if an asteroid destroys
the eastern side of the United

259
00:15:15,077 --> 00:15:17,737
States, like we will have things
replicated in multiple data centers.

260
00:15:18,267 --> 00:15:22,367
And that, you know, and then
also it's accessible anywhere

261
00:15:22,367 --> 00:15:23,027
on the internet, right?

262
00:15:23,027 --> 00:15:24,287
You can go to the library.

263
00:15:24,347 --> 00:15:26,897
This is not so common these days, but
I remember when I was a student, like,

264
00:15:26,897 --> 00:15:29,747
go to the library, log into Dropbox
and read all your things, right,

265
00:15:30,030 --> 00:15:31,800
rather than having to
bring a USB stick around.

266
00:15:32,180 --> 00:15:36,570
And so I think that is the first
decision, but it's not necessary, right?

267
00:15:36,570 --> 00:15:39,450
Like there were plenty of
distributed, entirely peer-to-peer

268
00:15:39,450 --> 00:15:42,090
file syncing designs, right?

269
00:15:42,420 --> 00:15:44,700
And so that was the first decision.

270
00:15:44,970 --> 00:15:48,680
And I think the kind of second decision
was that if we imagine our desktop and

271
00:15:48,680 --> 00:15:52,760
our laptop and you have the server in
the middle, the desktop might be on

272
00:15:52,760 --> 00:15:55,760
Windows, the laptop might be on macOS.

273
00:15:56,030 --> 00:15:59,360
So I think that decision to
support multiple platforms

274
00:15:59,705 --> 00:16:01,745
is like another really interesting one.

275
00:16:02,105 --> 00:16:05,855
This is like where I think Git and
Dropbox can be a little bit different,

276
00:16:06,065 --> 00:16:09,395
in that Git is, at the end of
the day, quite Linux-centric.

277
00:16:09,605 --> 00:16:11,965
It's case-sensitive for its file system.

278
00:16:12,275 --> 00:16:15,875
It deals with directories and it
makes particular assumptions about

279
00:16:15,875 --> 00:16:17,195
how directories should behave.

280
00:16:17,555 --> 00:16:19,335
And that was something with Dropbox.

281
00:16:19,335 --> 00:16:22,575
We wanted to be consumer, we wanted
to support everything and we wanted

282
00:16:22,575 --> 00:16:24,225
it to feel very automatic, right?

283
00:16:24,225 --> 00:16:28,095
That like, someone shouldn't have to
understand, like, what a, like, Unicode

284
00:16:28,095 --> 00:16:29,895
normalization disagreement means.

285
00:16:29,895 --> 00:16:30,285
Right?

286
00:16:30,495 --> 00:16:34,275
Where in Git like in really bad settings,
like you might have to understand

287
00:16:34,275 --> 00:16:38,205
that, that you write 'u' with an
accent differently on Mac and Windows.
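To make the Unicode normalization disagreement concrete, here is a minimal sketch. It assumes one platform hands back a file name with a precomposed "ü" and another hands back "u" followed by a combining mark. The helper names `canonical_name` and `same_file_name` are made up for illustration; this is not how Dropbox or Git actually handles it.

```python
import unicodedata

def canonical_name(name: str) -> str:
    """Normalize a file name to NFC so equivalent spellings compare equal."""
    return unicodedata.normalize("NFC", name)

def same_file_name(a: str, b: str) -> bool:
    """Compare two names by canonical form rather than by raw code points."""
    return canonical_name(a) == canonical_name(b)

# "ü" as a single precomposed code point (U+00FC).
precomposed = "m\u00fcsli.txt"
# "ü" as "u" followed by a combining diaeresis (U+0308).
decomposed = "mu\u0308sli.txt"

print(precomposed == decomposed)                # False: raw strings differ
print(same_file_name(precomposed, decomposed))  # True: same name after NFC
```

Both names look identical on screen, which is why a sync tool that compares raw strings could treat them as two different files.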

288
00:16:38,732 --> 00:16:40,555
So I think that's the
kind of, like, next side.

289
00:16:40,555 --> 00:16:43,645
So then Dropbox has its design
for a file system and it's a

290
00:16:43,645 --> 00:16:47,925
central, it's like the hub, and all
the spokes are your phone, your

291
00:16:48,258 --> 00:16:49,925
desktop, your laptop, and whatnot.

292
00:16:50,385 --> 00:16:53,288
And then, so, to kind of get
down to the details a bit more.

293
00:16:53,468 --> 00:16:56,618
So then, yeah, we have a process that
runs on your computer, that's the

294
00:16:56,618 --> 00:17:02,528
Dropbox app, and that watches all of
the files on your file system, and then

295
00:17:02,528 --> 00:17:07,088
it looks at what's happened and then
syncs them up to the Dropbox server.

296
00:17:07,268 --> 00:17:10,448
And then whenever changes happen on
the Dropbox server, it syncs them down.
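The "look at what's happened" step can be sketched as a comparison of two snapshots of a folder. This is a simplified illustration under stated assumptions, not the actual Dropbox client: the `snapshot` and `diff` helpers are hypothetical, the whole tree is rescanned and every file hashed, and a real client would more likely rely on operating system change notifications.

```python
import hashlib
import os
import tempfile

def snapshot(root: str) -> dict[str, str]:
    """Map each file's path (relative to root) to a SHA-256 of its contents."""
    result = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            full = os.path.join(dirpath, filename)
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            result[os.path.relpath(full, root)] = digest
    return result

def diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, modified, or deleted between two snapshots."""
    return {
        "added": sorted(p for p in new if p not in old),
        "modified": sorted(p for p in new if p in old and new[p] != old[p]),
        "deleted": sorted(p for p in old if p not in new),
    }

# Demo: edit one file and create another inside a temporary folder.
with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "notes.txt"), "w") as f:
        f.write("v1")
    before = snapshot(root)
    with open(os.path.join(root, "notes.txt"), "w") as f:
        f.write("v2")
    with open(os.path.join(root, "todo.txt"), "w") as f:
        f.write("new")
    after = snapshot(root)

changes = diff(before, after)
print(changes)
```

The resulting change list is what a sidecar process would upload. Changes coming down from the server could be applied the same way in reverse, while the files themselves stay ordinary files that any app can read and write.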

297
00:17:11,062 --> 00:17:15,112
There's another kind of interesting
decision here: Dropbox by

298
00:17:15,112 --> 00:17:17,632
design was always like a sidecar.

299
00:17:17,632 --> 00:17:20,782
It's always something that just
sits and it looks at your files.

300
00:17:20,782 --> 00:17:23,242
Your files are just regular
files on the file system.

301
00:17:23,602 --> 00:17:28,215
And if Dropbox, the app, isn't running,
your files are there and they're safe,

302
00:17:28,215 --> 00:17:32,445
and it's something that, you know,
regular apps can just read and write

303
00:17:32,445 --> 00:17:37,795
to, and in some sense, like, Dropbox
was unintentionally local-first

304
00:17:37,815 --> 00:17:39,315
from that perspective, right?

305
00:17:39,345 --> 00:17:42,465
Because it's saying that no
matter what happens, your data

306
00:17:42,465 --> 00:17:43,725
is just there and you own it.

307
00:17:44,257 --> 00:17:46,957
And you know, there are
other systems, right?

308
00:17:46,957 --> 00:17:52,597
Like if you use NFS, a, like, network
file system, then if you unmount it or

309
00:17:52,597 --> 00:17:53,987
if you lose connection to the server,

310
00:17:54,657 --> 00:17:58,447
you might not be able to actually open
any files that you have the metadata for.

311
00:17:58,897 --> 00:17:59,227
Right.

312
00:17:59,227 --> 00:18:04,493
And I remember from a user perspective,
the local-first aspect, I really went

313
00:18:04,513 --> 00:18:08,083
through like all the stages where I
had a computer that wasn't connected

314
00:18:08,083 --> 00:18:11,983
to the internet yet, and that at some
point I had an internet connection.

315
00:18:12,313 --> 00:18:16,957
But, files were always where like
everything depended on files.

316
00:18:16,957 --> 00:18:20,647
Like if I didn't have a
file, things wouldn't work.

317
00:18:20,647 --> 00:18:22,127
Everything depended on files.

318
00:18:22,127 --> 00:18:26,627
There were barely websites
where you could do meaningful things.

319
00:18:26,947 --> 00:18:30,130
Certainly web apps
weren't very common yet.

320
00:18:30,640 --> 00:18:35,410
And then Dropbox made everything
seamlessly work together.

321
00:18:35,950 --> 00:18:41,140
And then when web apps and SaaS
software more came along, I was a

322
00:18:41,140 --> 00:18:43,240
bit confused because I felt, okay,

323
00:18:43,240 --> 00:18:48,639
it gives me some collaboration, but seems
to be a different kind of collaboration

324
00:18:48,639 --> 00:18:50,499
since I had collaboration before.

325
00:18:50,889 --> 00:18:56,085
But I also understood the limitations
of, when I'm working on the same doc

326
00:18:56,085 --> 00:19:00,762
file, through Dropbox, which gets
sort of like the first copy, second

327
00:19:00,762 --> 00:19:05,729
copy, third copy, and now I need
to somehow manually reconcile that.

328
00:19:05,789 --> 00:19:08,519
And when I saw Google
Docs for the first time,

329
00:19:09,149 --> 00:19:14,609
that was really like a revelation because,
oh, now we can do this at the same time.

330
00:19:14,609 --> 00:19:19,079
But at the same time, while I saw that,
I still remember the feeling

331
00:19:19,079 --> 00:19:20,969
where, but where are my files?

332
00:19:20,969 --> 00:19:22,409
This is my stuff now.

333
00:19:22,409 --> 00:19:23,459
Where, where is it?

334
00:19:23,909 --> 00:19:29,899
And that trust that you've mentioned
with Dropbox, I felt like I lost some,

335
00:19:30,109 --> 00:19:35,549
some control here and it required a
lot of trust, in those tools that I

336
00:19:35,549 --> 00:19:37,529
started now step by step, embracing.

337
00:19:37,559 --> 00:19:41,279
And frankly, I think a lot of those tools
didn't deserve my trust in hindsight.

338
00:19:41,879 --> 00:19:48,254
I still feel like we've lost something
by no longer being able to like call

339
00:19:48,254 --> 00:19:50,624
the foundation our own in a way.

340
00:19:50,954 --> 00:19:54,764
And I'm still hoping that we kind of
find the best of both worlds where

341
00:19:54,764 --> 00:19:58,634
we get that seamless collaboration
that we now take for granted.

342
00:19:58,634 --> 00:20:00,344
Something like what Figma gives us,

343
00:20:00,682 --> 00:20:06,080
but also the control and just being
ready for whatever happens, that's

344
00:20:06,080 --> 00:20:08,330
something Dropbox gave us out of the box.

345
00:20:08,617 --> 00:20:12,037
I just wanna share this sort of
like anecdote and like almost

346
00:20:12,037 --> 00:20:15,817
emotional confusion as I walk
through those different stages

347
00:20:16,117 --> 00:20:17,997
of how we work with software.

348
00:20:18,837 --> 00:20:19,257
Totally.

349
00:20:19,257 --> 00:20:22,617
And we've ended up in a place
that's not great in a lot of ways.

350
00:20:22,617 --> 00:20:22,977
Right.

351
00:20:22,977 --> 00:20:27,867
And I think, you know, I think part
of the sad thing, and maybe from

352
00:20:27,897 --> 00:20:32,907
even like an operating systems design
perspective is that I feel like files

353
00:20:32,907 --> 00:20:35,007
have lots of design decisions that are

354
00:20:35,472 --> 00:20:36,672
packaged up together.

355
00:20:36,972 --> 00:20:39,972
You know, like one of the amazing
things about files is that

356
00:20:39,972 --> 00:20:41,322
they're self-contained, right?

357
00:20:41,502 --> 00:20:44,712
Like on Google, I don't know what
Google's backend looks like for Google

358
00:20:44,712 --> 00:20:49,322
Docs, but they probably have like all
of the metadata and pieces of the data

359
00:20:49,322 --> 00:20:53,622
spread across different rows in a database
and different things in an object store.

360
00:20:53,922 --> 00:20:57,192
And just even thinking about like
the physical implementation of that

361
00:20:57,192 --> 00:21:00,522
data, it's like scattered around
probably a bunch of servers, right?

362
00:21:00,522 --> 00:21:01,842
Maybe in different data centers.

363
00:21:02,172 --> 00:21:05,712
And there's something really nice
about a file where a file is just

364
00:21:05,712 --> 00:21:08,292
like a piece of state, right?

365
00:21:08,292 --> 00:21:09,552
That is just self-contained.

366
00:21:09,912 --> 00:21:13,632
And I think one of the things
that I think is very

367
00:21:13,632 --> 00:21:18,042
unfortunate, from an operating
systems perspective, is that that decision

368
00:21:18,042 --> 00:21:24,312
then has also been coupled with a very
anemic API. Like with files, they're just

369
00:21:24,342 --> 00:21:30,072
sequences of bytes that can be read
and written to and appended, and there's

370
00:21:30,072 --> 00:21:31,992
no additional structure beyond that.

371
00:21:32,532 --> 00:21:33,452
And I think, like,

372
00:21:33,782 --> 00:21:37,882
folks, the way that things have
evolved is that, to

373
00:21:37,902 --> 00:21:41,082
have more structure, to make things
like Google Docs, to be able to

374
00:21:41,082 --> 00:21:46,032
reconcile and have collaboration and
interpret things as more than just bytes,

375
00:21:46,302 --> 00:21:49,092
we've also given up this ability
to package things together.

376
00:21:49,585 --> 00:21:53,873
macOS had, like, a very kind of
baby step in this direction with, I

377
00:21:53,873 --> 00:21:54,803
think they're called bundles.

378
00:21:54,833 --> 00:21:57,863
Like the things where, like, if
you have, like, your .app, they're

379
00:21:57,863 --> 00:21:59,363
actually a zip file, right?

380
00:21:59,610 --> 00:22:03,510
And there's all types of ways, all
types of brain damage for how this

381
00:22:03,510 --> 00:22:05,130
like, doesn't actually work well.

382
00:22:05,130 --> 00:22:05,670
You know?

383
00:22:05,670 --> 00:22:07,740
But the idea is kind
of interesting, right?

384
00:22:07,740 --> 00:22:10,950
It's like what if files had some
more structure and what if you still

385
00:22:10,950 --> 00:22:15,300
considered something, an atomic unit,
but then it had pieces of it that

386
00:22:15,300 --> 00:22:17,100
weren't just uninterpretable bytes.

387
00:22:17,400 --> 00:22:20,705
And I think that's, like, the
path-dependent way that we've

388
00:22:20,705 --> 00:22:21,665
ended up where we are today.

389
00:22:22,295 --> 00:22:23,075
That makes sense.

390
00:22:23,165 --> 00:22:28,525
So going back to the sync engine
implementation, did the Python process

391
00:22:28,525 --> 00:22:34,195
back in the day mostly index
all of the files and then actually

392
00:22:34,218 --> 00:22:39,394
send across the actual bytes, probably
in some chunks, across the wire?

393
00:22:39,394 --> 00:22:45,468
Or was there some more intelligent
diffing happening client side,

394
00:22:45,588 --> 00:22:50,601
so that you would only send kind of the
changes across the wire and how do I

395
00:22:50,601 --> 00:22:55,341
need to think about what is a change
when I'm dealing with like a ton of

396
00:22:55,341 --> 00:22:57,561
bytes before and a ton of bytes after?

397
00:22:57,924 --> 00:22:58,074
Yeah.

398
00:22:58,074 --> 00:22:59,364
Those are really, really good questions.

399
00:22:59,364 --> 00:23:03,784
I think maybe, like, the first
starting point is that like files

400
00:23:03,784 --> 00:23:07,464
in Dropbox were stored, just broken
up into four megabyte chunks.

401
00:23:07,734 --> 00:23:10,764
And that was just a decision at the
very beginning to pick some size.

402
00:23:11,394 --> 00:23:15,384
And on the server, the way that those
chunks were stored is that they,

403
00:23:15,414 --> 00:23:20,364
each four megabyte chunk was stored
keyed by its SHA-256 hash.

404
00:23:20,764 --> 00:23:22,834
So we would assume that
those are globally unique.

405
00:23:23,074 --> 00:23:27,514
So then if you had the same copy
of a bunch of files, or you had

406
00:23:27,514 --> 00:23:30,964
a file copied many times in your
Dropbox, we would only store it once.

407
00:23:31,504 --> 00:23:34,654
And that would just happen
organically because we would say

408
00:23:34,654 --> 00:23:38,974
like, okay, I looked at this file,
it has three chunks: A, B, and C.

409
00:23:39,364 --> 00:23:42,964
And then the client would ask the
server, do you have A, B, and C?

410
00:23:43,294 --> 00:23:47,734
Like the server would say, yes, I have
B and C already, please send A, then we

411
00:23:47,734 --> 00:23:52,098
would upload A. So there was already,
at the file level, this, like,

412
00:23:52,098 --> 00:23:55,728
kind of very coarse-grained delta sync

413
00:23:56,071 --> 00:23:57,708
at the four megabyte chunk layer.
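
The chunk-level exchange described here can be sketched roughly as follows. This is a minimal illustration, assuming an in-memory stand-in for the server; `BlockServer`, `missing`, `put`, and `upload` are hypothetical names, not Dropbox's actual API.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # four megabytes, as mentioned in the conversation


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[tuple[str, bytes]]:
    """Split data into fixed-size chunks, each paired with its SHA-256 hex digest."""
    chunks = (data[i:i + chunk_size] for i in range(0, len(data), chunk_size))
    return [(hashlib.sha256(c).hexdigest(), c) for c in chunks]


class BlockServer:
    """Hypothetical in-memory chunk store keyed by content hash."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def missing(self, hashes: list[str]) -> list[str]:
        """Answer the client's question: which of these chunks do you not have?"""
        return [h for h in hashes if h not in self.blocks]

    def put(self, digest: str, chunk: bytes) -> None:
        # Reject chunks whose content does not match the claimed key.
        if hashlib.sha256(chunk).hexdigest() != digest:
            raise ValueError("chunk does not match its hash")
        self.blocks[digest] = chunk


def upload(server: BlockServer, data: bytes, chunk_size: int = CHUNK_SIZE) -> int:
    """Send only the chunks the server lacks; return how many were sent."""
    chunks = dict(chunk_hashes(data, chunk_size))  # identical chunks collapse to one key
    needed = server.missing(list(chunks))
    for digest in needed:
        server.put(digest, chunks[digest])
    return len(needed)
```

With a tiny chunk size for demonstration, uploading `b"AAAABBBBCCCC"` and then `b"XXXXBBBBCCCC"` sends three chunks the first time and only one the second, since B and C are already stored.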

414
00:23:58,231 --> 00:24:01,748
And then, kind of, it's funny,
these things evolve, right?

415
00:24:01,748 --> 00:24:05,228
Like then the next thing we layered on
top was that, in that setting where

416
00:24:05,228 --> 00:24:09,398
you decided B and C were there already
and you needed to upload A, then with

417
00:24:09,428 --> 00:24:15,308
A, the desktop client could use rsync
to know that there was previously an A

418
00:24:15,308 --> 00:24:19,568
prime and do a patch between the two
and then send just those contents.

419
00:24:19,918 --> 00:24:23,578
The kind of thing that was pretty
interesting is that a lot of the content

420
00:24:23,578 --> 00:24:29,588
on Dropbox was very incompressible
stuff like video, images, so the

421
00:24:29,784 --> 00:24:34,314
benefits of deduplication, both
across users or even within a user,

422
00:24:34,524 --> 00:24:39,984
and the benefit of, like, rsync was not
actually as much as one might think,

423
00:24:40,434 --> 00:24:43,824
at least in terms of
bandwidth going through the system.

424
00:24:43,854 --> 00:24:47,534
It wasn't that reductive because a lot of
this content was just kind of unique and

425
00:24:48,104 --> 00:24:50,364
not getting updated in small patches.

426
00:24:51,429 --> 00:24:56,559
And on your server-side blob store, now
that you had those hashes for those four

427
00:24:56,559 --> 00:25:02,619
megabyte chunks, that also means that you
could probably deduplicate some content

428
00:25:02,679 --> 00:25:08,979
across users, which makes me think of
all sorts of other implications of that.

429
00:25:09,069 --> 00:25:12,369
When do you know it's
safe to let go of a chunk?

430
00:25:12,736 --> 00:25:16,876
Do you also now know that you
could kind of go backwards and

431
00:25:16,876 --> 00:25:20,506
say like, oh, from this hash, we
know this is sensitive content.

432
00:25:20,986 --> 00:25:25,993
And have some further implications
for whatever. We don't need to go too

433
00:25:25,993 --> 00:25:28,659
much into depth on that now, but, yeah.

434
00:25:28,659 --> 00:25:32,259
I'm curious like how you thought
of those design decisions and

435
00:25:32,259 --> 00:25:33,549
the possible implications.

436
00:25:34,119 --> 00:25:34,479
Yeah.

437
00:25:34,539 --> 00:25:38,289
Yeah, for the first one, yeah,
like distributed garbage collection

438
00:25:38,289 --> 00:25:39,759
was a very hard problem for us.

439
00:25:39,819 --> 00:25:44,349
We called it vacuuming. And in terms
of making Dropbox's economics work out,

440
00:25:44,349 --> 00:25:48,963
we couldn't afford to
keep a lot of content that was deleted

441
00:25:48,963 --> 00:25:50,283
that we couldn't charge users for.

442
00:25:50,583 --> 00:25:54,453
So, you know, there's all this
additional complexity where different

443
00:25:54,453 --> 00:25:58,389
users would have, like, the ability to
restore for different periods of time.

444
00:25:58,509 --> 00:26:01,689
So we would say like, anything
that's deleted, it doesn't actually

445
00:26:01,689 --> 00:26:05,199
get deleted for 30 days or a year
or whatnot based on their plan.

446
00:26:05,583 --> 00:26:09,813
so then, yeah, like  having to do
this like big distributed mark and

447
00:26:09,813 --> 00:26:14,043
sweep garbage collection algorithm
across  hundreds of petabytes,

448
00:26:14,043 --> 00:26:18,243
exabytes of content  that was something
that we had to get pretty good at.

449
00:26:18,243 --> 00:26:23,006
And when we designed Magic Pocket,
where we implemented S3 in-house, we

450
00:26:23,006 --> 00:26:28,226
had specific primitives for making it a
little bit easier to avoid race conditions

451
00:26:28,226 --> 00:26:31,016
where, like, if a file was deleted

452
00:26:31,961 --> 00:26:34,601
and we decided that no
one needed it anymore,

453
00:26:34,631 --> 00:26:38,241
but then just at that point in time,
someone uploads it again, making sure

454
00:26:38,241 --> 00:26:40,421
that we don't accidentally delete it.

455
00:26:40,781 --> 00:26:43,481
So that was like, yeah,
definitely a very tricky problem.

456
00:26:43,531 --> 00:26:48,614
And I think in retrospect this is, like,
an interesting design exercise, right?

457
00:26:48,614 --> 00:26:52,784
In that if deduplication wasn't actually
that valuable for us, we could have

458
00:26:52,934 --> 00:26:57,464
eliminated a lot of complexity for this
garbage collection by not doing it, right?

459
00:26:58,001 --> 00:26:59,671
I think for the second thing, yeah.

460
00:26:59,671 --> 00:27:06,731
So at the beginning, when Dropbox started,
if you had a file with A, B and C and you

461
00:27:06,731 --> 00:27:10,831
uploaded it, it would just check: do
A, B, and C exist anywhere in Dropbox?

462
00:27:11,351 --> 00:27:16,958
And that got changed over time
to be: do you, as the user,

463
00:27:17,348 --> 00:27:18,948
have access to A, B, and C?

464
00:27:19,448 --> 00:27:24,143
And you know, 'cause otherwise you could
use this for all types of purposes, right?

465
00:27:24,143 --> 00:27:27,583
To see if there exists some
content anywhere in Dropbox.

466
00:27:27,613 --> 00:27:32,573
And that was something where we
would, in the case where the user was

467
00:27:32,573 --> 00:27:38,033
uploading A, B, and C, say none of them
were present in their account, we would

468
00:27:38,033 --> 00:27:42,833
actually force them to upload it, incur
the bandwidth for doing so, and then

469
00:27:42,833 --> 00:27:45,173
discard it if B and C existed elsewhere.
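
The shift from a global existence check to a per-user check might look something like the sketch below. It is only an illustration under assumed names (`AccessCheckedStore`, `must_upload`, `receive` are hypothetical); the point is that the answer to "do I need to upload this?" depends only on what the asking user already has access to, so it cannot be used to probe for content elsewhere in the system.

```python
class AccessCheckedStore:
    """Hypothetical server-side index: global chunk set plus per-user access sets."""

    def __init__(self) -> None:
        self.blocks: set[str] = set()            # hashes of every stored chunk
        self.access: dict[str, set[str]] = {}    # user -> hashes that user may read

    def must_upload(self, user: str, hashes: list[str]) -> list[str]:
        """Only consult the user's own chunks, never the global set."""
        mine = self.access.get(user, set())
        return [h for h in hashes if h not in mine]

    def receive(self, user: str, digest: str) -> bool:
        """Accept an uploaded chunk. Returns True if it was newly stored,
        False if an identical chunk already existed and the bytes were discarded."""
        newly_stored = digest not in self.blocks
        self.blocks.add(digest)
        self.access.setdefault(user, set()).add(digest)
        return newly_stored
```

Here a second user uploading chunks that another user already stored is still asked to send all of them; deduplication happens silently on the server after the bytes arrive.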

470
00:27:46,085 --> 00:27:46,345
Yeah.

471
00:27:46,345 --> 00:27:47,091
Very interesting.

472
00:27:47,121 --> 00:27:50,878
I mean, this would be an interesting
rabbit hole to go down, just the

473
00:27:50,878 --> 00:27:54,658
kind of second order effects of that
design decision, particularly at

474
00:27:54,658 --> 00:27:56,783
the scale and importance of Dropbox.

475
00:27:57,083 --> 00:27:59,213
But maybe we save that for another time.

476
00:27:59,513 --> 00:28:04,359
So going back to the sync engine, now that
we have a better understanding of how it

477
00:28:04,359 --> 00:28:06,999
worked in that shape and form back then.

478
00:28:07,449 --> 00:28:12,219
You've already mentioned before
that as usage went through

479
00:28:12,219 --> 00:28:16,813
the roof, all sorts of different
usage scenarios also expanded.

480
00:28:17,268 --> 00:28:22,749
You had all sorts of more esoteric
ways that you didn't kind of even think

481
00:28:22,809 --> 00:28:25,209
before that it would be used this way.

482
00:28:25,239 --> 00:28:27,369
Now all of that came to light.

483
00:28:28,099 --> 00:28:33,216
I'm curious which sort of helper
systems you put in place so that you could

484
00:28:33,216 --> 00:28:39,446
even have a grasp of what's going on
since a part of the trust that Dropbox

485
00:28:39,586 --> 00:28:44,476
earned over time was
probably also related to privacy.

486
00:28:44,716 --> 00:28:49,126
So you, you couldn't just like read
everything that's going on in someone's

487
00:28:49,126 --> 00:28:54,766
system, so you're probably also relying
to some degree on the help of a user

488
00:28:55,036 --> 00:28:57,076
that they like send something over.

489
00:28:57,076 --> 00:28:57,406
Yeah.

490
00:28:57,436 --> 00:29:02,716
Walk me through, like, the evolution
of that. Like, as

491
00:29:02,716 --> 00:29:06,376
an engineer, if there's a bug,
reproducing that bug is everything.

492
00:29:07,006 --> 00:29:09,316
So walk me through that process.

493
00:29:09,766 --> 00:29:13,306
Yeah, and you know, like we had a very
strict rule, right, where it's just:

494
00:29:13,366 --> 00:29:15,316
we do not look at content, right?

495
00:29:15,773 --> 00:29:20,323
And so, when
debugging issues, the saving grace is

496
00:29:20,323 --> 00:29:22,573
that for most of the issues we saw,

497
00:29:22,923 --> 00:29:28,003
they were more metadata issues around,
like, sync not converging, or sync getting

498
00:29:28,003 --> 00:29:32,383
to the client thinking it's in sync
with the server, but them disagreeing.

499
00:29:32,691 --> 00:29:35,799
So we had a few pretty,
yeah, like pretty interesting

500
00:29:35,799 --> 00:29:37,539
supporting algorithms for this.

501
00:29:37,569 --> 00:29:41,769
So one of them was just simple, like,
hang detection, like making sure, like,

502
00:29:41,949 --> 00:29:45,249
when should a client reasonably
expect that they are in sync?

503
00:29:45,869 --> 00:29:49,439
And if they're online and if
they've downloaded all the recent

504
00:29:49,439 --> 00:29:53,189
versions and things are getting
stuck, why are they getting stuck?

505
00:29:53,189 --> 00:29:55,649
So are they getting stuck because
they can't read stuff from the

506
00:29:55,649 --> 00:29:57,749
server, either metadata or data?

507
00:29:57,959 --> 00:30:00,509
Are they getting stuck because they
can't write to the file system and

508
00:30:00,509 --> 00:30:01,819
there's some permission errors?

509
00:30:02,079 --> 00:30:06,683
So I think having very fine-grained
classification of that, and having the

510
00:30:06,683 --> 00:30:11,653
client do that in a way that's, like, not
including any private information, and

511
00:30:11,653 --> 00:30:14,753
sending that up for reports, and then
aggregating that over all of the clients

512
00:30:14,753 --> 00:30:19,643
and being able to classify, was a big part
of us being able to get a handle on it.

513
00:30:20,059 --> 00:30:23,699
And I think this is just generally
very useful for these sync engines.

514
00:30:23,996 --> 00:30:27,056
The biggest return on investment we
got was from consistency checkers.

515
00:30:27,676 --> 00:30:32,949
So part of sync is that there's the same
data duplicated in many places, right?

516
00:30:33,219 --> 00:30:36,849
Like, so we had the data that's
on the user's local file system.

517
00:30:37,179 --> 00:30:41,199
We had all of the metadata that we stored
in SQLite, where we would store, like, what

518
00:30:41,199 --> 00:30:42,939
we think should be on the file system.

519
00:30:43,689 --> 00:30:46,569
We would store what the latest
view from the server was.

520
00:30:46,569 --> 00:30:49,509
We would store things that were
in progress, and then we have

521
00:30:49,509 --> 00:30:50,589
what's stored on the server.

522
00:30:50,799 --> 00:30:55,269
And for each one of those like hops, we
would have a consistency checker that

523
00:30:55,269 --> 00:30:57,639
would go and see if those two matched.

524
00:30:57,969 --> 00:31:02,139
And that was, like, the
highest return on investment we got.

525
00:31:02,139 --> 00:31:05,649
Because before we had that, people
would write in and they would

526
00:31:05,649 --> 00:31:07,179
complain that Dropbox wasn't working.

527
00:31:07,779 --> 00:31:10,509
And until we had these consistency
checkers, we had no idea the

528
00:31:10,509 --> 00:31:13,419
order of magnitude of how
many issues were happening.

529
00:31:13,869 --> 00:31:16,029
And when we started doing
it, we're like, wow.

530
00:31:16,599 --> 00:31:17,379
There's actually a lot.

531
00:31:18,026 --> 00:31:22,886
So a consistency check in this regard
was mostly like a hash over some

532
00:31:22,886 --> 00:31:24,506
packets that you're sending around.

533
00:31:24,866 --> 00:31:30,326
And with that you could verify, okay, up
until like from A to B to C to D, we're

534
00:31:30,326 --> 00:31:35,816
all seeing the same hash, but suddenly
on the hop from D to E, the hash changes.

535
00:31:35,876 --> 00:31:36,266
Ah-huh.

536
00:31:36,296 --> 00:31:37,196
Let's investigate.

537
00:31:37,736 --> 00:31:38,396
Exactly.

538
00:31:38,726 --> 00:31:42,926
And to do that in a way
that's respectful of the users,

539
00:31:42,986 --> 00:31:45,356
even, like, resources on their system.

540
00:31:45,356 --> 00:31:50,006
Like we wouldn't just go and blast their
CPU and their disk and their network to go

541
00:31:50,006 --> 00:31:51,836
and, like, churn through a bunch of things.

542
00:31:51,836 --> 00:31:54,896
So we would have like a sampling
process where we like sample a random

543
00:31:54,896 --> 00:31:58,166
path in the tree on the client
and do the same on the server.
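
A sampled consistency check along these lines could be sketched as below. This is a simplified illustration, not the actual checker: it assumes both sides can be represented as a mapping from path to a small metadata dict, and the names `entry_digest` and `sample_mismatches` are hypothetical. Only digests of metadata are compared, never file contents.

```python
import hashlib
import random


def entry_digest(path: str, meta: dict[str, object]) -> str:
    """Hash a path together with its metadata in a canonical (sorted-key) order."""
    canonical = repr((path, sorted(meta.items())))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sample_mismatches(
    client: dict[str, dict[str, object]],
    server: dict[str, dict[str, object]],
    sample_size: int,
    rng: random.Random,
) -> list[str]:
    """Check a random sample of paths and return those where the two views disagree."""
    paths = sorted(set(client) | set(server))
    sampled = rng.sample(paths, min(sample_size, len(paths)))
    mismatched = []
    for path in sampled:
        c, s = client.get(path), server.get(path)
        # A path missing on either side counts as an inconsistency.
        if c is None or s is None or entry_digest(path, c) != entry_digest(path, s):
            mismatched.append(path)
    return sorted(mismatched)
```

Keeping `sample_size` small bounds the CPU and disk cost per run, while aggregating results across many clients still surfaces how often the views diverge.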

544
00:31:58,463 --> 00:32:02,333
We would have stuff with, like, Merkle
trees, and then when things would diverge,

545
00:32:02,333 --> 00:32:07,643
we would try to see, like, is there a way
we can compare on the client and see. Like,

546
00:32:07,643 --> 00:32:12,004
for example, one of the really
important goals for us as an operational

547
00:32:12,004 --> 00:32:14,494
team was to have like the power of zero.

548
00:32:14,764 --> 00:32:17,464
I think it might be from AWS or something.

549
00:32:17,464 --> 00:32:19,294
My co-founder James, has
a really good talk on it.

550
00:32:19,764 --> 00:32:25,704
But we would want to have a metric of
saying that the number of unexplained

551
00:32:25,764 --> 00:32:28,790
inconsistencies is zero. 'Cause

552
00:32:28,790 --> 00:32:31,730
then the nice thing, right, is that
if it's a zero and it regresses,

553
00:32:31,730 --> 00:32:33,080
you know that it's a regression.

554
00:32:33,350 --> 00:32:38,780
If it's fluctuating at, like, 15
or like a hundred thousand and it kind

555
00:32:38,780 --> 00:32:42,530
of goes up by 5%, it's very hard to know,
when evaluating a new release, right,

556
00:32:42,530 --> 00:32:44,390
whether that's actually safe or not.

557
00:32:44,824 --> 00:32:49,204
So then that would mean that whenever we
would have an inconsistency due to a bit

558
00:32:49,204 --> 00:32:55,234
flip, which we would see all the time
on client devices, then we would have to

559
00:32:55,444 --> 00:32:57,454
categorize that and then bucket that out.

560
00:32:57,604 --> 00:32:58,804
So we would have a baseline

561
00:32:59,659 --> 00:33:03,319
expectation of how many bit flips there
are across all of the devices on Dropbox.

562
00:33:03,679 --> 00:33:06,589
And we would see that that's
staying consistent or increasing or

563
00:33:06,589 --> 00:33:09,829
decreasing, and that the number of
unexplained things was still at zero.
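
The "power of zero" bookkeeping can be illustrated with a small sketch. The cause labels and function names below are hypothetical; the idea is simply that every reported inconsistency is either bucketed under a known cause or counted as unexplained, and a release is only considered healthy while the unexplained count stays at exactly zero.

```python
from collections import Counter

# Hypothetical set of causes that have been investigated and categorized.
KNOWN_CAUSES = frozenset({"bit_flip", "permission_error"})


def bucket_reports(reports: list[str]) -> Counter[str]:
    """Count reports per known cause; anything else lands in 'unexplained'."""
    return Counter(r if r in KNOWN_CAUSES else "unexplained" for r in reports)


def unexplained_is_zero(reports: list[str]) -> bool:
    """The release gate: any unexplained inconsistency is treated as a regression."""
    return bucket_reports(reports)["unexplained"] == 0
```

Known buckets such as bit flips can then be tracked against their own baseline, separately from the zero-tolerance metric.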

564
00:33:10,215 --> 00:33:12,885
Now let's take that detour,
since you got me curious.

565
00:33:13,125 --> 00:33:16,065
Uh, what would cause bit
flips on a local device?

566
00:33:16,602 --> 00:33:20,982
I think a few causes. One of them
is just that in the data center, most

567
00:33:20,982 --> 00:33:24,822
memory uses error correction and you have
to pay more for it, usually have to pay

568
00:33:24,822 --> 00:33:26,472
more for a motherboard that supports it.

569
00:33:26,862 --> 00:33:27,672
At least back then.

570
00:33:27,736 --> 00:33:30,532
Now, like, on client
devices we don't have that.

571
00:33:30,602 --> 00:33:34,302
So, this is a little bit above
my pay grade for hardware: cosmic

572
00:33:34,302 --> 00:33:36,632
rays or thermal noise or whatever.

573
00:33:36,632 --> 00:33:40,002
But memory is much more
resilient in the data center.

574
00:33:40,315 --> 00:33:44,355
I think another is just that storage
devices vary greatly in quality.

575
00:33:44,415 --> 00:33:49,335
Like your SSDs and your hard drives are
much higher quality inside the data

576
00:33:49,335 --> 00:33:51,495
center than they are on local devices.

577
00:33:51,855 --> 00:33:52,515
And so,

578
00:33:53,160 --> 00:33:54,150
you know, there's that.

579
00:33:54,447 --> 00:33:57,297
It also could be, like, I had
mentioned that people have all

580
00:33:57,297 --> 00:33:58,797
types of weird configurations.

581
00:33:59,097 --> 00:34:03,387
Like on Mac there are all these
kernel extensions; on Windows, there are

582
00:34:03,387 --> 00:34:05,007
all of these minifilter drivers.

583
00:34:05,007 --> 00:34:07,437
There are all these things
that are interposing between

584
00:34:07,827 --> 00:34:11,127
Dropbox, the user space process
and writing to the file system.

585
00:34:11,427 --> 00:34:15,297
And if those have any memory safety
issues where they're corrupting memory

586
00:34:15,387 --> 00:34:19,434
'cause they're written in archaic C,
you know, or something, that's

587
00:34:19,454 --> 00:34:20,654
the way things can get corrupted.

588
00:34:20,834 --> 00:34:22,244
I mean, we've seen all types of things.

589
00:34:22,244 --> 00:34:26,709
We've seen network routers
corrupting data, but usually

590
00:34:26,924 --> 00:34:28,394
that fails some checksum, right?

591
00:34:28,424 --> 00:34:33,464
Or we've seen even registers on CPUs
being bad where the memory gets replaced

592
00:34:33,614 --> 00:34:38,114
and the memory seems like it's fine, but
then it just turns out the CPU has its

593
00:34:38,114 --> 00:34:40,214
own registers on chip that are busted.

594
00:34:40,214 --> 00:34:44,204
And so all of that stuff, I
think just can happen at scale.

595
00:34:44,234 --> 00:34:44,624
Right.

596
00:34:45,050 --> 00:34:45,770
That makes sense.

597
00:34:45,770 --> 00:34:51,774
And I'm happy to say that I haven't
had to worry yet about flipped bits, whether

598
00:34:51,774 --> 00:34:56,824
it's for storage or other things,
but huge respect to whoever already had

599
00:34:56,824 --> 00:34:59,591
to tame those parts of the system.

600
00:34:59,951 --> 00:35:05,444
So, you mentioned the consistency check
as probably the biggest lever that you

601
00:35:05,444 --> 00:35:11,324
had to understand which health state
your sync engine is in, in the first place.

602
00:35:11,698 --> 00:35:18,928
Was this the only kind of metric and
proxy for understanding how well

603
00:35:18,928 --> 00:35:22,618
the sync system is working, or were
there some other aspects that gave

604
00:35:22,618 --> 00:35:25,618
you visibility both macro and micro?

605
00:35:26,071 --> 00:35:30,511
Yeah, I mean, I think, yeah,
the kind of hangs, so, like, knowing

606
00:35:30,511 --> 00:35:33,991
that something gets to a sync state
and knowing the duration, right?

607
00:35:33,991 --> 00:35:38,514
So the kind of performance of that
was one of our top line metrics.

608
00:35:38,514 --> 00:35:40,474
And the other one was
this consistency check.

609
00:35:40,814 --> 00:35:43,524
And then for specific,
like, operations, right?

610
00:35:43,524 --> 00:35:47,374
Like uploading a file, like how much
bandwidth are people able to use

611
00:35:47,624 --> 00:35:53,124
because, like, people wanted to
use Dropbox and upload lots,

612
00:35:53,124 --> 00:35:57,324
like huge data, like a huge number of
files where each file is really large.

613
00:35:57,594 --> 00:36:01,584
And then they might do it in
Australia or Japan where they're

614
00:36:01,944 --> 00:36:03,234
far away from a data center.

615
00:36:03,234 --> 00:36:06,774
So latency is high, but bandwidth
is very high too, right?

616
00:36:06,774 --> 00:36:09,914
So making sure that we could
fully saturate their pipes and all

617
00:36:09,914 --> 00:36:12,114
types of stuff with debugging

618
00:36:12,724 --> 00:36:13,654
things in the internet, right?

619
00:36:13,654 --> 00:36:16,774
People having really bad
routes to AWS and all that.

620
00:36:16,974 --> 00:36:18,324
So we would track things like that.

621
00:36:18,568 --> 00:36:20,968
I think other than that it was
mostly just the usual quality stuff,

622
00:36:20,968 --> 00:36:25,298
like just exceptions and making
sure that features all work.

623
00:36:25,388 --> 00:36:30,154
I think when we rewrote this system
and we designed it to be very correct,

624
00:36:30,274 --> 00:36:34,404
we moved a lot of these things into
testing before we would release.

625
00:36:35,024 --> 00:36:38,734
So, I think, to
jump ahead a little bit, we

626
00:36:38,794 --> 00:36:44,974
decided to rewrite Dropbox's sync engine
from this big Python code base into Rust.

627
00:36:45,304 --> 00:36:49,294
And one of the specific design decisions
was to make things extremely testable.

628
00:36:49,729 --> 00:36:53,239
So we would have everything be
deterministic on a single thread,

629
00:36:53,509 --> 00:36:56,989
have all of the reads and writes
to the network and file system,

630
00:36:56,989 --> 00:36:59,119
be through a virtualized API.

631
00:36:59,416 --> 00:37:03,616
So then we could run all of these
simulations of exploring what would

632
00:37:03,616 --> 00:37:08,026
happen if you uploaded a file here and
deleted it concurrently and then had a

633
00:37:08,026 --> 00:37:09,976
network issue that forced you to retry.

634
00:37:10,306 --> 00:37:14,716
And so by simulating all of those in
ci, we would be able to then have very

635
00:37:14,716 --> 00:37:18,466
strong in variance about them that
knowing that like a file should never

636
00:37:18,466 --> 00:37:21,796
get deleted in this case, or that
it should always converge, or things

637
00:37:21,796 --> 00:37:26,326
like the sharing that this file should
never get exposed to this other viewer.

638
00:37:26,904 --> 00:37:31,043
I think, like, having much
stronger guarantees was something

639
00:37:31,043 --> 00:37:36,443
that we only could really do effectively
once we designed the system to make

640
00:37:36,443 --> 00:37:38,093
it easy to test those guarantees.
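
The testability idea described here, with single-threaded deterministic logic and all I/O behind a virtualized interface, can be sketched in miniature. This is an illustrative toy, not the Rust design being discussed: `Network`, `FlakyNetwork`, `upload_with_retry`, and `run_simulation` are hypothetical names, and the only fault injected is a dropped send driven by a seeded random number generator.

```python
import random
from typing import Protocol


class Network(Protocol):
    """The virtualized interface the sync logic talks to instead of real sockets."""

    def send(self, payload: bytes) -> bool: ...


class FlakyNetwork:
    """Simulated network whose failures come from a seeded RNG, so runs are repeatable."""

    def __init__(self, seed: int, drop_rate: float) -> None:
        self.rng = random.Random(seed)
        self.drop_rate = drop_rate
        self.delivered: list[bytes] = []
        self.attempts = 0

    def send(self, payload: bytes) -> bool:
        self.attempts += 1
        if self.rng.random() < self.drop_rate:
            return False  # injected fault: the send was dropped
        self.delivered.append(payload)
        return True


def upload_with_retry(net: Network, payload: bytes, max_attempts: int) -> bool:
    """The logic under test: retry until the send succeeds or attempts run out."""
    for _ in range(max_attempts):
        if net.send(payload):
            return True
    return False


def run_simulation(seed: int) -> tuple[bool, int, list[bytes]]:
    """One simulated run; the same seed always produces the same outcome."""
    net = FlakyNetwork(seed, drop_rate=0.5)
    ok = upload_with_retry(net, b"chunk-A", max_attempts=50)
    return ok, net.attempts, net.delivered
```

A CI job could then sweep many seeds and assert invariants on each run, for example that a payload is never delivered more than once; any failing seed reproduces the exact same sequence of faults when re-run.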

641
00:37:38,828 --> 00:37:39,188
Right.

642
00:37:39,188 --> 00:37:40,268
That makes a lot of sense.

643
00:37:40,268 --> 00:37:43,568
And I think we're seeing more
and more systems, also in the

644
00:37:43,568 --> 00:37:45,704
database world, embrace this.

645
00:37:45,704 --> 00:37:49,012
I think TigerBeetle is,
is quite popular for that.

646
00:37:49,394 --> 00:37:53,828
I think the folks at Turso are
now also embracing this approach.

647
00:37:54,102 --> 00:37:56,772
I think it goes under the
umbrella of simulation testing.

648
00:37:57,218 --> 00:37:58,448
That sounds very interesting.

649
00:37:58,448 --> 00:38:03,788
Can you explain a little bit more?
Maybe in a much smaller program, would

650
00:38:03,788 --> 00:38:08,318
this basically be just that every
assumption and any potential branch,

651
00:38:08,348 --> 00:38:13,958
any sort of side effect thing that might
impact the execution of my program,

652
00:38:13,958 --> 00:38:19,868
I now need to make explicit, and it's
almost like a parameter that I put into

653
00:38:19,868 --> 00:38:25,735
the arguments of my functions and now I
call it under these circumstances, and I

654
00:38:25,735 --> 00:38:31,375
can therefore simulate, oh, if that file
suddenly gives me an unexpected error,

655
00:38:31,675 --> 00:38:33,385
then this is how we're gonna handle it.

656
00:38:33,865 --> 00:38:34,795
Yeah, exactly.

657
00:38:34,795 --> 00:38:38,845
So it's like, and there are techniques
that, like, the TigerBeetle folks, like,

658
00:38:38,845 --> 00:38:42,745
we do this at Convex in Rust. With the
right, like, abstractions, there are, like,

659
00:38:42,745 --> 00:38:45,235
techniques to make it not so awkward.

660
00:38:45,235 --> 00:38:50,815
But yeah, it is like this idea of like,
can you pin all of the non-determinism in

661
00:38:50,815 --> 00:38:54,895
the system, whether it's like reading
from a random number generator, whether

662
00:38:54,895 --> 00:38:58,765
it's looking at time, whether it's reading
and writing to files or the network.

663
00:38:58,945 --> 00:39:04,425
Can that all be, like, pulled out so
that in production it's just using the

664
00:39:04,425 --> 00:39:06,865
regular APIs for it?

665
00:39:07,258 --> 00:39:10,558
So there's, like, for any of these
sync engines, there's a core

666
00:39:10,558 --> 00:39:13,318
of the system which represents
all the sync rules, right?

667
00:39:13,318 --> 00:39:16,198
Like when I get a new file
from the server, what do I do?

668
00:39:16,528 --> 00:39:19,468
You know, if there's a concurrent
edit to this, what do I do?

669
00:39:19,748 --> 00:39:23,953
And that core of the code is often
the part that has the most bugs, right?

670
00:39:23,953 --> 00:39:27,403
It has the, it doesn't think about
some of the corner cases or if

671
00:39:27,403 --> 00:39:30,853
there are errors or needs retries
or doesn't handle concurrency.

672
00:39:30,853 --> 00:39:32,053
It might have race conditions.

673
00:39:32,323 --> 00:39:36,883
So the kind of, I think the core idea
for deterministic

674
00:39:36,883 --> 00:39:43,033
simulation testing is to take that core
and just kind of like pull out all of the

675
00:39:43,033 --> 00:39:45,283
non-determinism from it into an interface.

676
00:39:45,403 --> 00:39:49,213
So time randomness, reading and
writing to the network, reading

677
00:39:49,213 --> 00:39:52,753
and writing to the file system, and
making it so that in production,

678
00:39:52,933 --> 00:39:54,703
those are just using the regular APIs.

679
00:39:55,033 --> 00:39:58,873
But in a testing situation,
those can be using mocks.

680
00:39:59,023 --> 00:40:02,383
Like they could be using things
that, for a particular test

681
00:40:02,383 --> 00:40:06,253
that wants to test a scenario, are
set up in a specific way.

682
00:40:06,673 --> 00:40:09,223
Or it could be randomized, right?

683
00:40:09,223 --> 00:40:14,543
Where it might be that, reading from,
like, time, the test framework might

684
00:40:14,603 --> 00:40:18,923
decide pseudo randomly to advance it
or to keep it at the current time or

685
00:40:18,923 --> 00:40:20,873
might serialize things differently.

686
00:40:21,143 --> 00:40:27,293
And that type of ability to have random
search explore the state space of

687
00:40:27,353 --> 00:40:30,833
all the things that are possible is
just one of those like unreasonably

688
00:40:30,833 --> 00:40:32,813
effective ideas, I think for testing.
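
[Editor's sketch, not from the speakers.] The idea described above, pulling time, randomness and I/O behind an interface so a test can drive them from a seed, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions: `Env`, `SimEnv`, `sync_with_retry` and `run_simulation` are hypothetical names invented for illustration, not Dropbox or Convex code.

```python
import random
from typing import Protocol


class Env(Protocol):
    """Hypothetical interface: every non-deterministic thing the core may touch."""

    def now(self) -> float: ...
    def sleep(self, seconds: float) -> None: ...
    def upload(self, data: bytes) -> bool: ...


class SimEnv:
    """Test environment: all 'random' outcomes derive from a single seed."""

    def __init__(self, seed: int, failure_rate: float = 0.5) -> None:
        self.rng = random.Random(seed)
        self.clock = 0.0
        self.failure_rate = failure_rate
        self.uploaded: list[bytes] = []

    def now(self) -> float:
        return self.clock

    def sleep(self, seconds: float) -> None:
        self.clock += seconds  # virtual time: advance the clock, never wait

    def upload(self, data: bytes) -> bool:
        if self.rng.random() < self.failure_rate:
            return False  # injected network failure
        self.uploaded.append(data)
        return True


def sync_with_retry(env: Env, data: bytes, max_attempts: int = 5) -> bool:
    """Core logic under test: upload with exponential backoff."""
    for attempt in range(max_attempts):
        if env.upload(data):
            return True
        env.sleep(2 ** attempt)
    return False


def run_simulation(seed: int) -> tuple[bool, float, int]:
    """Run the core once against a seeded environment and report what happened."""
    env = SimEnv(seed)
    ok = sync_with_retry(env, b"hello")
    return ok, env.now(), len(env.uploaded)


def explore(seeds: range) -> None:
    """Random search over seeds; any failing seed can be replayed exactly."""
    for seed in seeds:
        ok, _elapsed, count = run_simulation(seed)
        # Invariant: the data landed exactly once if and only if we report success.
        assert count == (1 if ok else 0), f"seed {seed} violated the invariant"


explore(range(1000))
```

In production the same `sync_with_retry` would receive an environment backed by the regular clock and network APIs; only the test swaps in `SimEnv`.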

689
00:40:33,203 --> 00:40:37,373
And then that like getting a
system to pass that type of

690
00:40:37,373 --> 00:40:38,963
deterministic simulation testing.

691
00:40:39,503 --> 00:40:42,893
It's not at the threshold of having
formal verification, but in our

692
00:40:42,893 --> 00:40:47,457
experience it's pretty close and with
a much, much, smaller amount of work.

693
00:40:48,117 --> 00:40:50,427
And you mentioned
Haskell at the beginning.

694
00:40:50,457 --> 00:40:55,467
I still remember when I, after a lot of
time having spent writing unit tests in

695
00:40:55,517 --> 00:41:00,017
JavaScript and I, back then, in the other
order, I first had JavaScript and then I

696
00:41:00,017 --> 00:41:04,817
learned Haskell, and then I found Quick
Test, or was it Quick Test? QuickCheck?

697
00:41:05,183 --> 00:41:06,113
which one was it?

698
00:41:06,563 --> 00:41:07,493
I think it was QuickCheck, right?

699
00:41:07,873 --> 00:41:08,383
Well, right.

700
00:41:08,383 --> 00:41:13,424
So I found QuickCheck and I could express
sort of like, Hey, this is this type.

701
00:41:13,664 --> 00:41:18,614
It has sort of those aspects to it,
those invariants and then would just

702
00:41:18,614 --> 00:41:20,534
go along and test all of those things.

703
00:41:20,534 --> 00:41:23,564
Like, wait, I never thought
of that, but of course, yes.

704
00:41:23,864 --> 00:41:27,824
And then you combine those and you
would get way too lazy to write unit

705
00:41:27,824 --> 00:41:32,354
tests for the combinatorial explosion
of like all of your different things.

706
00:41:32,354 --> 00:41:36,494
And then you can say, sample it
like that, and like, focus on this.

707
00:41:36,778 --> 00:41:40,958
And so I actually also started
embracing this practice a lot more in the

708
00:41:40,958 --> 00:41:45,488
TypeScript work that I'm doing through
a great project called Prop Check.

709
00:41:45,994 --> 00:41:52,069
And that is picking up the same
ideas and for particularly those

710
00:41:52,069 --> 00:41:56,509
sort of scenarios where, okay,
Murphy's Law will come and haunt you.

711
00:41:56,969 --> 00:41:58,829
This is in distributed systems.

712
00:41:58,829 --> 00:42:00,509
That is typically the case.

713
00:42:00,796 --> 00:42:05,623
Building things in such a way where
all the aspects can be specifically

714
00:42:05,623 --> 00:42:07,873
injected is the sweet spot.

715
00:42:07,873 --> 00:42:12,043
If you can do so still in an ergonomic
way, I think that's the way to go.

716
00:42:13,063 --> 00:42:15,373
It's so, so valuable, right?

717
00:42:15,373 --> 00:42:15,643
And yeah.

718
00:42:15,643 --> 00:42:20,323
And yeah, the ability for proptest,
for QuickCheck, for all of these to

719
00:42:20,323 --> 00:42:23,113
also minimize is just magical, right?

720
00:42:23,113 --> 00:42:27,023
Like it comes up with this crazy
counter example and it might be

721
00:42:27,143 --> 00:42:31,693
like a list with 700 elements, but
then is able to shrink it down to

722
00:42:31,693 --> 00:42:33,613
the, like, real core of the bug.
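
[Editor's sketch, not from the speakers.] The generate-then-shrink loop that QuickCheck-style tools perform can be sketched by hand with only the standard library. Real libraries are far more sophisticated. The deliberately buggy `buggy_dedupe` and the helper names below are hypothetical, chosen only to give the shrinker something to find.

```python
import random
from typing import Callable, Optional

Property = Callable[[list[int]], bool]


def buggy_dedupe(items: list[int]) -> list[int]:
    """Intentionally buggy: only removes duplicates that are adjacent."""
    out: list[int] = []
    for x in items:
        if not out or out[-1] != x:
            out.append(x)
    return out


def has_no_duplicates(items: list[int]) -> bool:
    """The property we expect to hold for every input list."""
    result = buggy_dedupe(items)
    return len(result) == len(set(result))


def shrink(failing: list[int], prop: Property) -> list[int]:
    """Greedy shrinking: drop one element at a time while the property still fails."""
    current = failing
    progress = True
    while progress:
        progress = False
        for i in range(len(current)):
            candidate = current[:i] + current[i + 1:]
            if not prop(candidate):
                current = candidate
                progress = True
                break
    return current


def find_counterexample(prop: Property, seed: int = 0, runs: int = 200) -> Optional[list[int]]:
    """Generate random lists; on the first failure, return a shrunk version of it."""
    rng = random.Random(seed)
    for _ in range(runs):
        case = [rng.randint(0, 5) for _ in range(rng.randint(0, 50))]
        if not prop(case):
            return shrink(case, prop)
    return None


counterexample = find_counterexample(has_no_duplicates)
print(counterexample)  # a three-element list of the form [x, y, x]
```

However long the random failing input is, the shrinker reduces it to the smallest shape that still triggers the bug: two equal values separated by one different value.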

723
00:42:33,913 --> 00:42:35,483
It's magic, right?

724
00:42:35,803 --> 00:42:38,038
And you know, I mean, I think
this is something like, you know,

725
00:42:38,653 --> 00:42:40,453
a totally different theme, right?

726
00:42:40,453 --> 00:42:44,353
Like one thing at Convex we're exploring
a lot is, like, coding has changed a lot

727
00:42:44,353 --> 00:42:46,423
in the past year with AI coding tools.

728
00:42:46,693 --> 00:42:50,413
And one of the things we've observed
for getting coding tools to work very

729
00:42:50,413 --> 00:42:54,763
well with Convex is that these types
of like very succinct tests that can

730
00:42:54,763 --> 00:42:59,863
be generated easily and have like a
really high strength to weight or power

731
00:42:59,863 --> 00:43:03,449
to weight ratio are just really good
for like autonomous coding, right?

732
00:43:03,449 --> 00:43:06,629
Like, if you are gonna take like
Cursor agent and let it go wild,

733
00:43:06,839 --> 00:43:10,499
like what does it take to just let it
operate without you doing anything?

734
00:43:10,589 --> 00:43:13,229
It takes something like a prop test
because then it can just continuously

735
00:43:13,229 --> 00:43:18,149
make changes, run the test, and not know
that it's done until that test passes.

736
00:43:18,846 --> 00:43:20,316
Yeah, that makes a lot of sense.

737
00:43:20,316 --> 00:43:25,356
So let's go back for a moment to the
point where you were just transitioning

738
00:43:25,686 --> 00:43:32,016
from the previous Python-based sync
engine to the Rust-based sync engine.

739
00:43:32,016 --> 00:43:36,963
So you're embracing simulation
testing to have a better sense of

740
00:43:36,963 --> 00:43:41,253
like all the different aspects that
might influence the outcome here.

741
00:43:41,579 --> 00:43:44,289
Walk me through like how you went about

742
00:43:44,559 --> 00:43:46,479
deploying that new system.

743
00:43:46,659 --> 00:43:52,119
Were there any sort of big headaches
associated with migrating from the

744
00:43:52,119 --> 00:43:54,249
previous system to the new system?

745
00:43:54,549 --> 00:43:57,849
Since, for everything, you
had sort of a de facto source

746
00:43:57,849 --> 00:43:59,979
of truth, which are the files.

747
00:43:59,979 --> 00:44:04,659
So could you maybe just forget everything
the old system has done and you just

748
00:44:04,659 --> 00:44:09,646
treat it as like, oh, the user would've
just installed this fresh? Walk me

749
00:44:09,646 --> 00:44:14,056
through like how you thought about
that since migrating systems on such

750
00:44:14,056 --> 00:44:16,970
a big scale is typically quite dreadsome.

751
00:44:17,340 --> 00:44:19,495
Yeah, dreadsome is, yeah,

752
00:44:19,575 --> 00:44:20,415
an appropriate word.

753
00:44:20,720 --> 00:44:26,585
I think one of the biggest challenges was
that by design we had a very different

754
00:44:26,675 --> 00:44:29,765
data model for the old sync engine.

755
00:44:29,765 --> 00:44:31,135
We called it Sync Engine Classic,

756
00:44:31,465 --> 00:44:32,085
affectionately.

757
00:44:32,225 --> 00:44:34,505
And then we had Nucleus, which was the new one.

758
00:44:34,745 --> 00:44:39,695
Nucleus had a very different data model,
and the motivation for that was that

759
00:44:40,535 --> 00:44:46,145
Sync Engine Classic just had a ton of
possible states that were illegitimate.

760
00:44:46,505 --> 00:44:50,855
If you had, like, the server
update a file and the client update

761
00:44:50,855 --> 00:44:54,665
a file, but then a shared folder gets
mounted above it, things could get

762
00:44:54,665 --> 00:45:00,005
into all of these really weird states
that were legal but would cause bugs.

763
00:45:00,395 --> 00:45:04,595
And then I think that was like one
of the big guiding principles more

764
00:45:04,595 --> 00:45:09,335
than even just like Rust or Python,
was just like designing what states

765
00:45:09,335 --> 00:45:14,795
should the system be allowed to be
in and design away everything else,

766
00:45:14,795 --> 00:45:16,955
make illegal states unrepresentable.

767
00:45:17,555 --> 00:45:21,215
And so what that then
meant is, once we had that,

768
00:45:21,515 --> 00:45:26,225
when we needed to migrate, we had a long
tail of really weird starting positions.

769
00:45:27,855 --> 00:45:33,065
So where you basically realized, okay,
this system is in this state. A, how the

770
00:45:33,065 --> 00:45:35,195
heck did it ever get into that state?

771
00:45:35,255 --> 00:45:40,175
And B, what are we gonna do about
it now where we can basically,

772
00:45:40,175 --> 00:45:43,145
it's like from a mapping function,
this is like invalid input.

773
00:45:44,105 --> 00:45:49,862
So can you explain a little bit of like,
how you constrained the space of, and how

774
00:45:49,862 --> 00:45:56,075
you designed the space of legitimate,
valid states and what were some of the,

775
00:45:56,075 --> 00:46:00,665
if you think about this as like a big
matrix of combinations, what are some

776
00:46:00,665 --> 00:46:06,165
of the more intuitive ones that were
not allowed that you saw quite a bit?

777
00:46:06,975 --> 00:46:13,005
Yeah, so I think part of the difficulty
for Dropbox, like as syncing things

778
00:46:13,005 --> 00:46:17,085
from the file system is that file
system APIs are really anemic.

779
00:46:17,400 --> 00:46:19,980
File system APIs don't have transactions.

780
00:46:19,980 --> 00:46:23,010
They don't, things can get
reordered in all types of ways.

781
00:46:23,190 --> 00:46:26,370
So we would just read and write to
files from the local file system, and

782
00:46:26,370 --> 00:46:30,450
we would use file system events on
Mac, we would use the equivalent on

783
00:46:30,450 --> 00:46:32,773
Windows and Linux to get updates.

784
00:46:32,983 --> 00:46:36,403
But everything can be reordered
and racy and everything.

785
00:46:36,493 --> 00:46:40,990
So one, like, common invariant
would be that, if you have a

786
00:46:40,990 --> 00:46:44,497
directory, you know, like files
have to exist within directories.

787
00:46:44,767 --> 00:46:47,887
If a file exists, then its
parent directory exists.

788
00:46:48,397 --> 00:46:51,727
And like simultaneously, if you
delete a directory, it shouldn't

789
00:46:51,727 --> 00:46:52,817
have any files within it.

790
00:46:53,967 --> 00:46:57,727
And that invariant guarantees
that the file system is a tree.

791
00:46:57,847 --> 00:46:58,207
Right?

792
00:46:58,537 --> 00:47:03,787
And then it's very easy to come
up with settings, with reads from the

793
00:47:03,787 --> 00:47:07,687
local file system where if you just
naively take that and write it into

794
00:47:07,687 --> 00:47:12,187
your SQLite database, you will end up
with data that does not form a tree.

795
00:47:12,815 --> 00:47:16,435
And then especially even with, like,
inodes being unique, right?

796
00:47:16,435 --> 00:47:22,435
Like if I move a file from A to B, then
I might observe the add for it at B

797
00:47:23,825 --> 00:47:28,225
way before the delete at A, or I might
observe it vice versa, where the file

798
00:47:28,225 --> 00:47:31,435
is transiently gone and disappeared and
we definitely don't wanna sync that.

799
00:47:31,795 --> 00:47:37,318
And then with directories, if I have,
like, A as a directory and then B as

800
00:47:37,318 --> 00:47:43,528
a directory, and then I move it, I
could observe a state where A moves into

801
00:47:43,528 --> 00:47:48,498
B, which then without doing the right
bookkeeping, might introduce a cycle in

802
00:47:48,498 --> 00:47:52,188
the graph and a cycle for directories
would be really bad news, right?
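
[Editor's sketch, not from the speakers.] The two invariants mentioned here, that every entry's parent exists and that directories never form a cycle, can be checked on a simple node-to-parent map. This is an illustrative sketch only: the `parents` representation, `ROOT` and `check_tree` are hypothetical and do not describe the actual Nucleus data model.

```python
from typing import Optional

ROOT = "/"


def check_tree(parents: dict[str, Optional[str]]) -> list[str]:
    """Return a description of every invariant violation in a node -> parent map."""
    violations: list[str] = []
    for node, parent in parents.items():
        if node == ROOT:
            continue
        # Invariant 1: if an entry exists, its parent directory exists.
        if parent is None or parent not in parents:
            violations.append(f"{node}: missing parent {parent!r}")
            continue
        # Invariant 2: walking upward must reach the root without revisiting a node.
        seen = {node}
        cur: Optional[str] = parent
        while cur is not None and cur != ROOT:
            if cur in seen:
                violations.append(f"{node}: cycle through {cur}")
                break
            seen.add(cur)
            cur = parents.get(cur)
    return violations


healthy = {"/": None, "docs": "/", "report.txt": "docs"}
orphaned = {"/": None, "report.txt": "docs"}        # parent never observed
cyclic = {"/": None, "A": "B", "B": "A"}            # A moved into B and B into A

print(check_tree(healthy))   # no violations
print(check_tree(orphaned))  # one missing-parent violation
print(check_tree(cyclic))    # a cycle reported for both A and B
```

A check like this is the kind of thing a randomized test can run after every simulated step, so that a reordered sequence of file system events that breaks the tree is caught immediately.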

803
00:47:52,482 --> 00:47:57,072
So all of these invariants were things
that the file system APIs, they don't

804
00:47:57,072 --> 00:48:00,732
respect, even though the file system
internally has these invariants, right?

805
00:48:01,752 --> 00:48:04,422
You cannot create a directory
cycle on any file system.

806
00:48:05,412 --> 00:48:05,802
Definitely.

807
00:48:05,802 --> 00:48:09,989
I mean, certainly without root. And
all of these invariants exist but

808
00:48:09,989 --> 00:48:12,863
are not observable through the APIs.

809
00:48:12,863 --> 00:48:16,583
And so then Sync Engine Classic
would get into the state where its

810
00:48:16,583 --> 00:48:19,793
local SQLite file would have
all types of violations like that.

811
00:48:20,303 --> 00:48:24,473
So then how do we read the tea
leaves of like the database is in

812
00:48:24,473 --> 00:48:26,933
a really weird state we can't lose.

813
00:48:26,933 --> 00:48:30,263
And to go back to, I think what you had
talked about at the beginning of this was

814
00:48:30,263 --> 00:48:36,293
that we always had the nuclear option of
dropping all of our local state and doing

815
00:48:36,293 --> 00:48:38,753
a full resync from the files themselves.

816
00:48:39,143 --> 00:48:42,443
But then the problem is that we
would entirely lose user intent.

817
00:48:42,863 --> 00:48:48,323
So if, for example, I was offline for
a month and I had a bunch of files,

818
00:48:48,803 --> 00:48:53,153
and then during that month other
people in my team deleted those files.

819
00:48:53,791 --> 00:48:58,838
If I came back online and didn't have
my local database, we would have to

820
00:48:58,838 --> 00:49:02,828
recreate those files and people would
complain about this all the time because

821
00:49:03,418 --> 00:49:05,738
they would delete something and wanna
delete it, and then Dropbox would

822
00:49:05,738 --> 00:49:07,358
just randomly decide to resurrect it.

823
00:49:07,808 --> 00:49:12,441
So those types of decisions, we tried
to avoid that as much as possible, but

824
00:49:12,441 --> 00:49:17,271
then that meant having to look at a
potentially really confusing database and

825
00:49:17,271 --> 00:49:19,041
read what the user intent might have been.

826
00:49:19,761 --> 00:49:20,211
Right.

827
00:49:20,481 --> 00:49:24,411
I wanna dig a little bit more
into the topic of user intent.

828
00:49:24,441 --> 00:49:30,201
Since with Dropbox you've built a sync
engine very specifically for the use

829
00:49:30,201 --> 00:49:36,231
case of file management, et cetera, where
user intent has a particular meaning that

830
00:49:36,231 --> 00:49:41,181
might be very different from moving a
cursor around in a Google Docs document.

831
00:49:41,511 --> 00:49:47,618
So can you explain a little bit, what
are some of the common scenarios of, and

832
00:49:47,618 --> 00:49:54,408
maybe subtle scenarios of user intent,
when it comes to the Dropbox design space?

833
00:49:55,218 --> 00:49:56,178
Yeah, totally.

834
00:49:56,535 --> 00:50:01,515
And I think, for regular
things like, say, editing files,

835
00:50:01,830 --> 00:50:06,420
I think we saw that like people just
generally did not, maybe because

836
00:50:06,420 --> 00:50:09,690
of the way the system was even
its capabilities, people did not

837
00:50:09,690 --> 00:50:11,820
edit the same files all too often.

838
00:50:12,090 --> 00:50:17,033
So maintaining user intent
when everyone is online was just kind of

839
00:50:17,333 --> 00:50:21,563
taking last writer wins. Where I think
user intent became very interesting is

840
00:50:21,593 --> 00:50:26,583
if someone went offline, like they're on
an airplane, before wifi on airplanes,

841
00:50:27,026 --> 00:50:30,746
and they worked on their document and
someone else worked on it at the same time.

842
00:50:31,346 --> 00:50:35,906
In that case, we observed that users
always wanted to see the conflicted

843
00:50:35,906 --> 00:50:39,956
copy and that they wanted to get
the opportunity to say, like, I did.

844
00:50:39,956 --> 00:50:43,046
I put in a lot of effort into working
on this when I was on the plane.

845
00:50:43,346 --> 00:50:47,970
Someone else put in probably a similar
amount of effort when they were online and

846
00:50:48,170 --> 00:50:50,830
you know, so last-writer-wins policies

847
00:50:50,830 --> 00:50:55,700
there violated user expectations
quite a lot because one person

848
00:50:55,700 --> 00:50:58,460
had to win and then the person
who lost would be really upset.

849
00:50:58,900 --> 00:51:00,970
So I think those were pretty interesting.

850
00:51:00,970 --> 00:51:05,113
I think with moves, like with more
metadata operations, I think people

851
00:51:05,130 --> 00:51:06,420
were a little bit more permissive.

852
00:51:06,420 --> 00:51:10,680
Like if I moved something from one
folder to another, another person

853
00:51:10,680 --> 00:51:12,180
moved it to a different folder.

854
00:51:12,496 --> 00:51:15,076
having it just converge on
something as long as it converges.

855
00:51:15,136 --> 00:51:18,586
We observed it being like people
didn't worry about it too much.

856
00:51:18,810 --> 00:51:21,480
I think the place where user
intent is really interesting

857
00:51:21,480 --> 00:51:23,300
with moves is with sharing.

858
00:51:23,666 --> 00:51:26,983
So I think, thinking about this
from like the distributed systems

859
00:51:26,983 --> 00:51:31,333
perspective on causality, there would
be, like, someone might have, like,

860
00:51:31,423 --> 00:51:33,103
I dunno, their HR folder, right?

861
00:51:33,823 --> 00:51:38,353
And I don't know, like, let's say that
someone is transferring to the HR team and

862
00:51:38,383 --> 00:51:40,423
they're getting added to the HR folder.

863
00:51:41,158 --> 00:51:44,038
But then say before they were
on the team, they were on a

864
00:51:44,158 --> 00:51:45,358
performance improvement plan.

865
00:51:46,061 --> 00:51:50,958
So then the administrator for HR
would delete that file, make sure it's

866
00:51:50,958 --> 00:51:53,838
deleted, and then add them to the folder.

867
00:51:54,438 --> 00:51:59,178
And so their user intent is
expressed in a very specific

868
00:51:59,178 --> 00:52:00,978
sequencing of operations, right?

869
00:52:01,158 --> 00:52:04,038
That like this causally depended on this.

870
00:52:04,188 --> 00:52:08,238
I would not have invited 'em to the folder
unless the delete was stably synced.

871
00:52:08,848 --> 00:52:12,648
And making sure that gets
preserved throughout the system,

872
00:52:12,798 --> 00:52:16,428
even when people are going online
and offline and everything is a very

873
00:52:16,428 --> 00:52:18,048
hard distributed systems problem.

874
00:52:18,078 --> 00:52:18,468
Right.

875
00:52:18,901 --> 00:52:22,441
And it was intimately related
with the details of the product.

876
00:52:22,958 --> 00:52:23,378
Right.

877
00:52:23,421 --> 00:52:23,661
Yeah.

878
00:52:23,661 --> 00:52:29,571
How did you capture that causality
chain of events since you probably also

879
00:52:29,571 --> 00:52:32,151
couldn't quite trust the system clock?

880
00:52:32,451 --> 00:52:33,681
How did you go about that?

881
00:52:34,085 --> 00:52:36,348
Yeah, this became even
more difficult, right?

882
00:52:36,348 --> 00:52:41,118
Where file system metadata was partitioned
across many shards in the database.

883
00:52:41,568 --> 00:52:45,528
So then we ended up using something like
Lamport timestamps, where every single

884
00:52:45,528 --> 00:52:47,328
operation would get assigned a timestamp.

885
00:52:47,448 --> 00:52:50,745
And those timestamps were usually
only reading and writing to their

886
00:52:50,745 --> 00:52:55,153
particular shard and for whatever
timestamp the client had observed.

887
00:52:55,423 --> 00:52:59,677
But then in these cases where there
were potentially cross shard, they

888
00:52:59,677 --> 00:53:03,397
weren't transactions, but like causal
dependencies, we would be able to say

889
00:53:03,397 --> 00:53:07,597
like, the operation to mount this or
to add someone to the shared folder

890
00:53:07,657 --> 00:53:11,917
and then them mounting it within
their file system has to have a higher

891
00:53:11,917 --> 00:53:14,887
timestamp than any write within that, or

892
00:53:15,532 --> 00:53:16,582
writes, including deletes.

893
00:53:16,948 --> 00:53:21,628
So then that way when the client is
syncing it would be able to know that when

894
00:53:21,628 --> 00:53:26,998
I am merging operation logs across all of
the different shards, I need to assemble

895
00:53:26,998 --> 00:53:28,828
them in a causally consistent order.

896
00:53:29,288 --> 00:53:33,058
And that would then respect all
of these particular invariants.
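
[Editor's sketch, not from the speakers.] The cross-shard ordering described above can be illustrated with textbook Lamport clocks: the shard handling the share operation first observes the timestamp of the delete, so the mount is guaranteed a higher timestamp, and a client merging the per-shard logs by timestamp sees the delete first. The shard names, `Op` fields and file names are hypothetical. The real protocol is only described at a high level in the conversation.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    ts: int       # Lamport timestamp assigned by the shard
    shard: str
    kind: str
    target: str


class LamportClock:
    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Assign the next timestamp for a local operation."""
        self.time += 1
        return self.time

    def observe(self, other: int) -> None:
        """Fold in a timestamp seen elsewhere so later ops sort after it."""
        self.time = max(self.time, other)


def merge_logs(*logs: list[Op]) -> list[Op]:
    """Merge per-shard logs (each sorted by ts) into one causally consistent order."""
    return list(heapq.merge(*logs, key=lambda op: (op.ts, op.shard)))


hr_clock = LamportClock()
members_clock = LamportClock()

# The HR shard has already handled some writes, then the sensitive delete.
hr_log = [Op(hr_clock.tick(), "hr", "write", f"file{i}") for i in range(4)]
delete = Op(hr_clock.tick(), "hr", "delete", "review.doc")
hr_log.append(delete)

# The admin shares the folder only after seeing the delete, so the
# membership shard observes that timestamp before assigning its own.
members_clock.observe(delete.ts)
mount = Op(members_clock.tick(), "members", "mount", "hr-folder")

merged = merge_logs(hr_log, [mount])
assert merged.index(delete) < merged.index(mount)
```

Without the `observe` call the mount would have received timestamp 1 and sorted ahead of the delete, which is exactly the reordering the admin's intent forbids.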

897
00:53:33,438 --> 00:53:33,828
Right.

898
00:53:34,098 --> 00:53:38,448
So you have thought through those
different scenarios for Dropbox and

899
00:53:38,448 --> 00:53:43,758
made very intentional design decisions
that, for example, in one scenario

900
00:53:43,758 --> 00:53:46,728
last writer wins is not desirable.

901
00:53:46,728 --> 00:53:51,415
Since that might lead to a very sad
person stepping off the plane because

902
00:53:51,415 --> 00:53:54,955
all of your data is suddenly gone,
or the other person's data is gone.

903
00:53:55,262 --> 00:53:58,292
So you make very specific
design trade-offs here when it

904
00:53:58,292 --> 00:54:03,032
comes to somehow squaring the
circle of distributed systems.

905
00:54:03,182 --> 00:54:08,222
Which sort of advice would you have for
application developers or people even

906
00:54:08,552 --> 00:54:12,362
who are sitting inside of a company
and are now thinking about, oh, maybe

907
00:54:12,362 --> 00:54:17,552
we should have our own Dropbox style,
linear style sync engine internally.

908
00:54:17,552 --> 00:54:21,122
Which sort of advice would
you give them when they, yeah,

909
00:54:21,122 --> 00:54:23,132
start thinking this through to the detail?

910
00:54:23,987 --> 00:54:28,505
Yeah, I'll talk through kind of how we
structured things at Dropbox to be able

911
00:54:28,505 --> 00:54:30,275
to navigate these types of problems.

912
00:54:30,395 --> 00:54:33,335
And I think the patterns
here can be quite general.

913
00:54:33,605 --> 00:54:37,815
I think what we ended up with was
that like, thinking like, distributed

914
00:54:37,815 --> 00:54:39,945
systems syncing is hard, right?

915
00:54:40,185 --> 00:54:45,645
So we would have the kind of base layer
of the sync protocol and how state

916
00:54:45,645 --> 00:54:49,245
gets moved around between the clients
and the servers and all the shards.

917
00:54:49,695 --> 00:54:52,575
We would have very strong
consistency guarantees there.

918
00:54:52,875 --> 00:54:57,345
So we would not use any of the
knowledge of the product at that layer.

919
00:54:57,725 --> 00:55:02,475
So from a, like, thinking of Dropbox
in the file system as a CRDT.

920
00:55:03,660 --> 00:55:06,420
Dropbox allows, like moves
to happen concurrently.

921
00:55:06,420 --> 00:55:09,690
It allows you to add something
while another thing is happening.

922
00:55:10,020 --> 00:55:12,780
But at the protocol level,
we kept things very strict.

923
00:55:12,780 --> 00:55:17,437
We kept them very close to being
serializable that every view of the

924
00:55:17,437 --> 00:55:20,857
system was identified by a very small
amount of state, like a timestamp.

925
00:55:21,067 --> 00:55:24,127
And that would fully determine the
state of the system and like the

926
00:55:24,127 --> 00:55:26,077
amount of entropy in that was very low.

927
00:55:26,497 --> 00:55:30,067
And then whenever you are modifying
it, you would say, here's what I expect

928
00:55:30,067 --> 00:55:34,267
the data to be, and if it doesn't match
exactly, it will reject the operation.
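
[Editor's sketch, not from the speakers.] The layering described here, a strict base layer that rejects any write whose expectation does not match, with looser product policy built on top, might look roughly like this. It is a sketch under assumptions: `Store`, `commit`, `Rejected` and the conflicted-copy naming are hypothetical and use a single version counter in place of the real protocol's timestamps.

```python
from dataclasses import dataclass, field


class Rejected(Exception):
    """Raised by the strict layer when the caller's expectation is stale."""


@dataclass
class Store:
    """Strict base layer: the whole state is identified by one version number."""

    version: int = 0
    files: dict[str, str] = field(default_factory=dict)

    def commit(self, expected_version: int, path: str, content: str) -> int:
        # No product knowledge here: an exact match or a rejection.
        if expected_version != self.version:
            raise Rejected(f"expected {expected_version}, store is at {self.version}")
        self.files[path] = content
        self.version += 1
        return self.version


def save_with_conflicted_copy(store: Store, base_version: int, path: str, content: str) -> str:
    """Product policy layered on top: never overwrite a concurrent edit."""
    try:
        store.commit(base_version, path, content)
        return path
    except Rejected:
        copy_path = f"{path} (conflicted copy)"
        store.commit(store.version, copy_path, content)
        return copy_path


store = Store()
offline_base = store.version                      # client goes offline at version 0
store.commit(store.version, "plan.txt", "online edit")
saved_as = save_with_conflicted_copy(store, offline_base, "plan.txt", "offline edit")
print(saved_as)  # plan.txt (conflicted copy)
```

Changing the policy, for example to last writer wins, only touches the small function on top; the strict `commit` layer and its guarantees stay the same.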

929
00:55:34,597 --> 00:55:39,727
And then by doing it, structuring things
in that way, then we made it very easy

930
00:55:39,727 --> 00:55:45,037
for product teams, and for even us
working on sync to embed all of these like

931
00:55:45,067 --> 00:55:47,677
looser, more product-focused requirements

932
00:55:47,677 --> 00:55:51,247
that also may wanna change over time
into the endpoints, like, layered on top.

933
00:55:51,247 --> 00:55:57,157
So every time we wanted to change a policy
on how, like, a delete reconciles with an,

934
00:55:57,787 --> 00:55:59,647
you know, add for a folder or something,

935
00:55:59,887 --> 00:56:02,707
we didn't have to solve any distributed
systems problems to do that.

936
00:56:03,487 --> 00:56:07,897
So I think that like pattern of saying
that, like is there a good abstraction?

937
00:56:07,897 --> 00:56:11,467
Is there something that is like very
powerful that could solve a large

938
00:56:11,467 --> 00:56:16,267
class of problems, doing that well at
the lowest layer and then potentially

939
00:56:16,627 --> 00:56:18,577
weakening the consistency above it.

940
00:56:19,297 --> 00:56:24,217
I actually really like, the Rocicorp
folks have a really great description of

941
00:56:24,217 --> 00:56:28,897
their consistency model for Replicache of
it being like session plus consistency.

942
00:56:29,227 --> 00:56:34,087
And it's like a very similar idea
where like  when we build things on

943
00:56:34,087 --> 00:56:38,977
a platform, we may, with our
product hats on, like want users to

944
00:56:38,977 --> 00:56:42,607
not have to think about conflicts and
merging and all that in a lot of cases.

945
00:56:42,757 --> 00:56:45,397
But those decisions might be
very particular to our app.

946
00:56:45,397 --> 00:56:48,187
And that's something that holds
for everything on the platform.

947
00:56:48,457 --> 00:56:52,177
And then there's always a way to
embed those decisions onto, say,

948
00:56:52,552 --> 00:56:56,842
session consistency in Replicache
or serializability in other systems.

949
00:56:57,082 --> 00:57:00,435
And so I think that's, like, that
separation of concerns I

950
00:57:00,435 --> 00:57:03,615
think is something that can
apply to a lot of systems.

951
00:57:04,105 --> 00:57:04,495
Right.

952
00:57:04,555 --> 00:57:09,895
So maybe we use this also as a transition
to talk a bit more about what you're

953
00:57:09,895 --> 00:57:12,295
now designing and working on at Convex.

954
00:57:12,655 --> 00:57:19,225
What were some of the key insights that
you've taken with you from Dropbox that

955
00:57:19,225 --> 00:57:22,195
ultimately led to you co-founding Convex?

956
00:57:22,975 --> 00:57:27,068
Yeah, when we first were starting
Convex, we were looking at how apps

957
00:57:27,068 --> 00:57:28,238
are getting built today, right?

958
00:57:28,298 --> 00:57:32,498
Like web apps are easier
to build than ever.

959
00:57:32,653 --> 00:57:37,013
Even in 2021, it's incredible
how much, like more productive

960
00:57:37,483 --> 00:57:39,703
that compared to 10 years before.

961
00:57:39,793 --> 00:57:40,093
Right.

962
00:57:40,093 --> 00:57:45,613
It was, and I think we noticed that
the hard part for so many discussions

963
00:57:45,853 --> 00:57:50,110
was managing state and like how
state propagates. I think it was from

964
00:57:50,110 --> 00:57:54,370
the Riffle paper, right, on how, like,
so many issues in app development

965
00:57:54,370 --> 00:57:58,330
are kind of database problems in
disguise and that how techniques

966
00:57:58,330 --> 00:58:00,340
from databases might be able to help.

967
00:58:00,610 --> 00:58:05,797
So with Convex we were saying like, well
if we start with the idea of designing

968
00:58:05,797 --> 00:58:10,213
a database from first principles, can we
apply some of those database solutions

969
00:58:10,393 --> 00:58:11,923
to things across the whole stack?

970
00:58:12,253 --> 00:58:17,173
So say for example, when I'm reading
data from it within my app, I have

971
00:58:17,173 --> 00:58:20,743
all of these React components that are
all reading different pieces of data.

972
00:58:21,193 --> 00:58:24,643
It'd be really nice if all of them
just executed at the same timestamp

973
00:58:24,703 --> 00:58:29,563
and I never had to handle consistency
issues where one component knows

974
00:58:29,563 --> 00:58:30,973
about a user and the other one doesn't.

975
00:58:31,423 --> 00:58:36,823
Similarly, like why isn't it possible
to be that I just useQuery across

976
00:58:36,823 --> 00:58:40,753
all my components and they just all
live update whenever I read anything,

977
00:58:40,753 --> 00:58:42,133
it's automatically reactive.

978
00:58:42,403 --> 00:58:46,753
So those were some of the like
the initial kind of thought

979
00:58:46,753 --> 00:58:48,613
experiments for what led to Convex.

980
00:58:48,883 --> 00:58:52,243
I think the other one that was
really motivated from our time at

981
00:58:52,243 --> 00:58:56,143
Dropbox and I think is kind
of both a blessing and a curse.

982
00:58:56,143 --> 00:58:59,833
It's kind of like one of the key
design decisions for Convex is

983
00:58:59,833 --> 00:59:03,133
that Convex is very opinionated
about there being a separation

984
00:59:03,133 --> 00:59:04,523
between the client and the server.

985
00:59:05,303 --> 00:59:09,463
So we saw this at Dropbox where they
were just different teams, right?

986
00:59:09,853 --> 00:59:13,393
And you know, as we've seen with like
even the origin of GraphQL, right?

987
00:59:13,393 --> 00:59:16,153
Like that ability to
decouple development between

988
00:59:16,830 --> 00:59:20,505
teams working on user facing features
and the way that the data fetching

989
00:59:20,505 --> 00:59:23,175
is implemented on the backend,
it's gonna be really powerful.

990
00:59:23,805 --> 00:59:27,615
And so the kind of thought
experiment with Convex is, can we

991
00:59:27,722 --> 00:59:32,522
maintain a very strong separation while
still getting like live updating, while

992
00:59:32,522 --> 00:59:36,752
still getting really good ergonomics
for both consuming data on the client

993
00:59:36,752 --> 00:59:38,372
and like fetching it on the server.

994
00:59:39,015 --> 00:59:39,435
Right.

995
00:59:39,435 --> 00:59:44,175
So yeah, walk me through a little bit
more through the evolution of Convex then.

996
00:59:44,235 --> 00:59:49,158
And so, in, in terms of all the other
options that are out there in terms

997
00:59:49,158 --> 00:59:55,698
of state management, I think what
most applications are using is probably

998
00:59:55,818 --> 01:00:01,892
something that at least to some degree is
somewhat customized and hand rolled and

999
01:00:01,892 --> 01:00:04,682
comes with its own huge set of trade-offs.

1000
01:00:05,348 --> 01:00:08,228
Help me better understand, sort
of, where you mentioned the

1001
01:00:08,390 --> 01:00:11,135
opinionated nature of Convex.

1002
01:00:11,435 --> 01:00:13,272
What are the benefits of that?

1003
01:00:13,362 --> 01:00:16,262
What are the downsides of
that and other implications?

1004
01:00:16,752 --> 01:00:20,562
Yeah, so when you write an app
on Convex, we can use maybe

1005
01:00:20,562 --> 01:00:22,242
like a basic to-do app, right?

1006
01:00:22,602 --> 01:00:24,072
The Linear clone everyone does.

1007
01:00:24,355 --> 01:00:26,695
You write endpoints like
you might be used to, right?

1008
01:00:26,695 --> 01:00:30,805
Where it's like list all the to-dos in a
project, like update a to-do in a project.

1009
01:00:31,182 --> 01:00:34,902
And those get pushed as your
API to your Convex server.

1010
01:00:35,602 --> 01:00:39,292
The implementations of that API can
then read and write to the database

1011
01:00:39,292 --> 01:00:43,492
and Convex has, like, a Mongo-
or Firebase-like API for doing so.

1012
01:00:44,008 --> 01:00:48,688
I think the main benefit then of
Convex relative to more traditional

1013
01:00:48,688 --> 01:00:53,172
architectures is that if you're on the
client, the only thing you need to do

1014
01:00:53,412 --> 01:00:55,722
is call, like, the useQuery hook.

1015
01:00:56,067 --> 01:01:00,957
You're saying like, I am looking at a
project, I just do, like, useQuery

1016
01:01:01,347 --> 01:01:07,857
list tasks in project. That will then
talk to the server, run that query, but

1017
01:01:07,857 --> 01:01:12,057
then also set up the subscription and
then whenever any data that that query

1018
01:01:12,057 --> 01:01:16,227
looked at changes, it will efficiently
determine that and then push the update.

1019
01:01:16,857 --> 01:01:21,567
So part of what has been nice
with Convex is that you are getting

1020
01:01:21,917 --> 01:01:26,307
a client that has a WebSocket
protocol, it has a sync engine built in.

1021
01:01:26,637 --> 01:01:30,297
You're getting infrastructure for
running JavaScript at scale and for

1022
01:01:30,297 --> 01:01:32,517
handling sandboxing and all of that.

1023
01:01:32,757 --> 01:01:35,757
And then you're also getting a
database, which is, you know,

1024
01:01:36,102 --> 01:01:39,342
one, supporting transactions
for reading and writing to it.

1025
01:01:39,552 --> 01:01:43,212
But then it also supports this
efficient, like, being able to subscribe

1026
01:01:43,212 --> 01:01:47,652
on, I ran this query, this query
just ran a bunch of JavaScript.

1027
01:01:47,652 --> 01:01:50,752
It looked at different rows
and it ran some queries.

1028
01:01:51,235 --> 01:01:55,965
The system will automatically and efficiently
determine if any write overlaps with that.

1029
01:01:56,385 --> 01:01:59,805
So the combination of all of those
things is, like, part of the benefit of

1030
01:01:59,805 --> 01:02:03,735
Convex, you just write TypeScript and
you write it in a way that feels

1031
01:02:03,735 --> 01:02:06,645
very natural and everything just works.

1032
01:02:07,335 --> 01:02:12,825
And I think one of the downsides is
that it is a different set of APIs.

1033
01:02:13,098 --> 01:02:16,658
It's not using SQL; it's doing
things a little bit differently

1034
01:02:16,658 --> 01:02:17,858
than they've been done before.

1035
01:02:18,342 --> 01:02:22,842
Yeah, it's kind of interesting
even today to see, you know,

1036
01:02:23,262 --> 01:02:24,942
talking about AI code gen, right?

1037
01:02:24,942 --> 01:02:28,422
Like models have been trained,
pre-trained on this huge corpus

1038
01:02:28,422 --> 01:02:29,322
of stuff on the internet.

1039
01:02:29,592 --> 01:02:32,412
And when are they good at
adopting new technologies?

1040
01:02:32,682 --> 01:02:35,202
Technologies that might be
after their knowledge cutoff.

1041
01:02:35,562 --> 01:02:38,887
And when is it better for them just
to stick to things that they know already?

1042
01:02:39,592 --> 01:02:39,952
Right.

1043
01:02:39,997 --> 01:02:45,428
So what you've mentioned before, where you
say Convex is rather opinionated: for me,

1044
01:02:45,858 --> 01:02:49,668
let's say five years ago,
I might've been much more of

1045
01:02:49,668 --> 01:02:53,028
like, oh, but maybe there's a
technology that's less opinionated

1046
01:02:53,028 --> 01:02:54,468
and I can use it for everything.

1047
01:02:54,828 --> 01:02:58,518
But the more experience I got,
the more I realized, no, actually,

1048
01:02:58,848 --> 01:03:02,478
I want something that's very
opinionated, but opinionated

1049
01:03:02,538 --> 01:03:04,338
and I share those opinions.

1050
01:03:04,518 --> 01:03:06,378
Those are exactly for my use case.

1051
01:03:06,378 --> 01:03:08,448
So I think that is much better.

1052
01:03:08,448 --> 01:03:12,648
This is why we have different technologies
and they are great for different

1053
01:03:12,648 --> 01:03:17,208
scenarios, and I think the more a
technology tries to say, no, we're,

1054
01:03:17,208 --> 01:03:22,912
we're best for everything, I think the
less it's actually good at anything.

1055
01:03:23,392 --> 01:03:26,932
And so I greatly appreciate you
standing your ground and saying

1056
01:03:26,932 --> 01:03:30,872
like, hey, those are our design
decisions that we've made.

1057
01:03:31,022 --> 01:03:35,615
And those are the use cases where
you'd be really well served building

1058
01:03:35,615 --> 01:03:37,355
on top of something like Convex.

1059
01:03:37,685 --> 01:03:42,522
And I particularly like it for now, where
TypeScript is really the default

1060
01:03:42,522 --> 01:03:44,772
language to build full stack applications.

1061
01:03:45,042 --> 01:03:48,732
And it's also increasingly
becoming the default for

1062
01:03:48,933 --> 01:03:51,250
AI-based applications as well.

1063
01:03:51,430 --> 01:03:57,040
And AI-based systems speak
TypeScript just as well as English.

1064
01:03:57,640 --> 01:04:02,090
And given that, Convex makes
that full stack super easy.

1065
01:04:02,450 --> 01:04:07,893
And also I think, when
you build local-first apps, it can

1066
01:04:07,893 --> 01:04:11,913
sometimes get really tricky because
you empower the client so much.

1067
01:04:11,913 --> 01:04:15,453
You give the client so much
responsibility and therefore there's

1068
01:04:15,453 --> 01:04:17,193
many, many things that can go wrong.

1069
01:04:17,223 --> 01:04:21,653
And I think Convex therefore takes
a more conservative approach and says

1070
01:04:21,653 --> 01:04:25,881
like, Hey, everything that happens on
the server is like highly privileged

1071
01:04:25,881 --> 01:04:27,501
and this is your safe environment.

1072
01:04:27,831 --> 01:04:31,491
And the client will try to give
you the best user experience and

1073
01:04:31,491 --> 01:04:33,081
developer experience out of the box.

1074
01:04:33,831 --> 01:04:37,551
But the client could be in a
more adversarial environment.

1075
01:04:37,611 --> 01:04:39,831
And I think those are
great design trade-offs.

1076
01:04:40,071 --> 01:04:45,208
So, I think that is a fantastic foundation
for tons of different applications.

1077
01:04:45,818 --> 01:04:46,238
Yeah.

1078
01:04:46,701 --> 01:04:49,011
Talking about some of these
strong opinions being both

1079
01:04:49,011 --> 01:04:50,271
blessings and curses, right?

1080
01:04:50,271 --> 01:04:54,681
Like over the past few months, one
thing we've been working on is trying

1081
01:04:54,681 --> 01:04:58,401
to bridge the gap between those
two points in the spectrum, right?

1082
01:04:58,705 --> 01:05:02,675
We wrote a blog post on it a few months
ago about working on what we're calling

1083
01:05:02,675 --> 01:05:08,135
our object sync engine, trying to
take a lot of the principles from more of

1084
01:05:08,135 --> 01:05:14,270
a local-first type approach of having a
data model that is synced to the client

1085
01:05:14,450 --> 01:05:18,020
and the only interaction between the
server and the client is through the sync.

1086
01:05:18,440 --> 01:05:22,580
And the client then can always render
its UI just looking at the local

1087
01:05:22,580 --> 01:05:24,380
database and it can be offline.

1088
01:05:24,530 --> 01:05:28,040
It also fully describes the
app state so it can be exported

1089
01:05:28,040 --> 01:05:29,600
and rehydrated or whatever.

1090
01:05:29,904 --> 01:05:33,564
It's a very interesting design exercise
we've been on to say like, can

1091
01:05:33,564 --> 01:05:39,804
you structure a protocol on a sync
engine in a way such that the UI

1092
01:05:39,834 --> 01:05:42,984
is still reading and writing to a
local store that is authoritative.

1093
01:05:43,344 --> 01:05:47,514
But then that local store is, to kind
of use ElectricSQL terminology,

1094
01:05:47,514 --> 01:05:52,584
a shape that is some mapping
of a strongly separated server data model.

1095
01:05:52,794 --> 01:05:56,754
So we still have a client data model
and server data model, which might be

1096
01:05:56,754 --> 01:06:01,419
owned by different teams and evolve
independently and, we also have that

1097
01:06:01,419 --> 01:06:06,159
strong separation where the implementation
of the shape is privileged and running

1098
01:06:06,159 --> 01:06:10,925
on the server and has authorization rules
built in, and get the best of both worlds.

1099
01:06:10,925 --> 01:06:16,255
And we've kind of, we have, like, a beta
that we've not released publicly, though

1100
01:06:16,375 --> 01:06:19,585
open sourced out there, but kind
of a thing where, I think, we're

1101
01:06:19,585 --> 01:06:21,355
still figuring out like the DX for it.

1102
01:06:21,355 --> 01:06:24,055
And I think we have something
that like algorithmically works

1103
01:06:24,355 --> 01:06:28,165
and it's like the protocol works,
but it's like, it's kind of hard.

1104
01:06:28,165 --> 01:06:28,315
Right.

1105
01:06:28,315 --> 01:06:32,395
It kind of reminds me a lot of writing
GraphQL resolvers, of saying, how do I

1106
01:06:32,395 --> 01:06:35,215
take the messages table from my chat app?

1107
01:06:35,710 --> 01:06:39,280
Then under the hood that might be
joining stuff from many different

1108
01:06:39,280 --> 01:06:43,060
tables and filtering rows, or might
even be doing a full-text search

1109
01:06:43,060 --> 01:06:45,250
query in another view or something.

1110
01:06:45,547 --> 01:06:48,817
And coming up with the right
ergonomics to make that feel

1111
01:06:48,847 --> 01:06:50,767
great for a day-one experience,

1112
01:06:50,767 --> 01:06:53,047
I think, is something that
we're still working on, still

1113
01:06:53,047 --> 01:06:53,902
kinda like a research project,

1114
01:06:54,097 --> 01:06:54,577
right?

1115
01:06:54,637 --> 01:06:58,837
Well, when it comes to data, there is no
free lunch, but I'd much rather have

1116
01:06:58,837 --> 01:07:03,787
it be done in the order and sequencing
that you're going through, which is

1117
01:07:03,787 --> 01:07:09,307
having a solid foundation that I can
trust and then figuring out the right

1118
01:07:09,307 --> 01:07:14,047
ergonomics afterwards, since I think
there's many, many tools that start with

1119
01:07:14,047 --> 01:07:19,747
great ergonomics, but later realize that
it's built on an unsound foundation.

1120
01:07:19,957 --> 01:07:24,137
So when it comes to data, I want a
trustworthy foundation, and I think

1121
01:07:24,137 --> 01:07:25,964
you're going about it in the right order.

1122
01:07:26,529 --> 01:07:31,209
Hey, Sujay, I've been learning
so much about one of my favorite

1123
01:07:31,209 --> 01:07:33,099
products of all time, Dropbox.

1124
01:07:33,789 --> 01:07:39,599
I've learned so much about how the
sausage was actually made, how it evolved

1125
01:07:39,599 --> 01:07:45,119
over time and I'm really excited that
you got to share the story today and

1126
01:07:45,419 --> 01:07:48,272
many, me included, got to learn from it.

1127
01:07:48,452 --> 01:07:51,002
Thank you so much for taking the
time and sharing all of this.

1128
01:07:51,572 --> 01:07:52,202
Thanks for having me.

1129
01:07:52,202 --> 01:07:53,207
This is super, super fun.

1130
01:07:54,159 --> 01:07:56,739
Thank you for listening to
the localfirst.fm podcast.

1131
01:07:56,919 --> 01:08:00,009
If you've enjoyed this episode and
haven't done so already, please

1132
01:08:00,009 --> 01:08:01,299
subscribe and leave a review.

1133
01:08:01,689 --> 01:08:04,209
Please also share this episode
with your friends and colleagues.

1134
01:08:04,599 --> 01:08:07,599
Spreading the word about the
podcast is a great way to support

1135
01:08:07,599 --> 01:08:09,309
it and to help me keep it going.

1136
01:08:09,969 --> 01:08:13,389
A special thanks again to Jazz
for supporting this podcast.

1137
01:08:13,689 --> 01:08:14,649
I'll see you next time.