Over Engineered

Sometimes you have files or other large chunks of data that you need to associate with a record in your database. It might be convenient to just store that as base64-encoded data or HTML in a "longtext" column, but that can eventually cause issues—especially as the table grows. What other options are there?

In this episode of Over Engineered, we go back to the show's roots and try to find the absolute best solution to a problem we already have an acceptable solution for.

Creators and Guests

Host

Chris Morrell

Father of two. CEO/CTO at InterNACHI. Host of Over Engineered.

Guest

Bogdan Kharchenko

Guest

Skyler Katz

What is Over Engineered?

A podcast where we explore unimportant programming questions (mostly PHP/Laravel/JavaScript) in extreme detail.

00:00:07.05
Chris Morrell
All right. Welcome back to Over Engineered, the podcast where we ask the question: what's the absolute best way to do things we already have a perfectly acceptable solution for? Today I am back with Bogdan Kharchenko and Skyler Katz, and we're going to be harkening back to the early episodes of Over Engineered. I started this podcast with the concept that there are things we talk about but never have time to really dig deep into and come up with a great solution for, because they just don't matter that much.

00:00:47.28
Chris Morrell
And I feel like the show has shifted away from that somewhat over the years, but I still want to have those conversations from time to time. We had an opportunity arise, and I thought we'd get into it. But before we do, do you guys want to say hi?

00:01:06.34
Skyler Katz
Hello, hello. It's good to be back.

00:01:09.62
Bogdan Kharchenko
Hey, Chris. Hey, everybody. Thank you, Chris, for inviting me and Skyler back on the show. We are the OG. If it wasn't for us, I don't know where your show would be today, Chris.

00:01:17.12
Chris Morrell
That's right.

00:01:18.95
Bogdan Kharchenko
So it's great to have us back.

00:01:20.39
Chris Morrell
It wouldn't exist without you. Yeah.

00:01:22.87
Bogdan Kharchenko
Exactly.

00:01:23.45
Skyler Katz
You're welcome.

00:01:27.70
Chris Morrell
All right, let me set the stage. We have this sort of programming architecture, database architecture question that we want to work through, which is when you have tables

00:01:48.12
Chris Morrell
that hold big blobs of data, right? So we have a couple of use cases. One is a table that holds the contents of large documents, and others are tables where we're holding base64-encoded image data.

00:02:03.59
Chris Morrell
And in both of those cases, the amount of data at the individual row level is not that insane, right? The base64 images are small cropped images of a black-and-white signature, so it's not that much data.

00:02:23.17
Chris Morrell
But when you compound that over many, many rows and a couple of different tables holding this kind of data, it starts to add up. And in the other case, we have a table that's got a million and a half rows in it.

00:02:37.30
Chris Morrell
It's not an insane number of rows, but it's not an insignificant number of rows either. And each one of those rows contains an entire document.

00:02:51.05
Chris Morrell
And just operating on the table is a little bit clumsy. If you have to take a database dump, it takes a long time, and if you need to import that database dump, it takes a long time.

00:03:01.88
Chris Morrell
It's a little awkward to work with. And we're also seeing this with Verbs events. If we fire a ton of Verbs events that have large

00:03:11.29
Chris Morrell
chunks of data attached to them, then our Verbs events table gets kind of large. So it's a problem we've bumped into a bunch of times, and it's a conversation that's happened on the Verbs Discord a bunch: how do I fire events that have files associated with them, when those files could be even larger than what we're talking about? Right now, Verbs doesn't have a great solution to that other than telling people to put the file in S3, set up permissions on that bucket so the files can't be easily deleted, and attach a reference to those files in your event.

00:03:52.33
Chris Morrell
But anyway, there are a couple of different ways to approach this, starting from "we just don't change anything and it's fine" up to a very involved but very clever solution that I think I've come up with, and I thought we'd run through them.

00:04:13.48
Chris Morrell
Is there anything that I'm missing?

00:04:17.59
Skyler Katz
I don't think so. I mean, those are the two ends of the spectrum, I suppose.

00:04:24.22
Chris Morrell
So I think this is a problem that a lot of people have probably bumped into in different ways, right? It's very convenient to have relational data.

00:04:36.12
Chris Morrell
Actually, before I get started on that, I do think there's one other relevant piece of information, which is that in almost all the cases we've been talking about internally at InterNACHI, it's been data that

00:04:55.45
Chris Morrell
we care about for a specific period of time, and then it becomes less and less likely that we'll need to pull up the larger parts of the data later, right?

00:05:10.99
Chris Morrell
But we still need to keep it around, and we still may need it. In the case of the documents, while a document is active, we're going to be loading it from the database regularly.

00:05:22.31
Chris Morrell
But after that, those records are mostly just used for aggregate queries and directory listings and stuff like that, and the contents aren't necessarily shown or queried ever again.

00:05:37.05
Chris Morrell
So there's an element of time to the way we're thinking about it, where it's perfectly feasible to offload that data somewhere else after a certain period of time, so that we can keep the database size down and keep the database nimble, but still have quick, immediate access to the data when needed.

00:06:07.61
Chris Morrell
So that just feels like one other relevant detail.

00:06:12.66
Bogdan Kharchenko
Yeah, so one thing I wanted to talk about a little bit before we get into the weeds of your over-engineered solution, Chris, is the issues we're actually having, right? We're just trying to query a few hundred or a few thousand records with our ORM.

00:06:29.19
Bogdan Kharchenko
And because we have to pull this additional column every time, it's slowing things down, because the database is doing an I/O read of all that data. I just want to set the stage for what we're actually dealing with. And I think one of the ways we've been combating this so far is

00:06:45.34
Chris Morrell
Yeah.

00:06:50.58
Bogdan Kharchenko
by explicitly asking for certain columns from our database in our Eloquent queries. That's great, but it is a little bit tedious. Having to say, yes, of course I need the created_at column and the user_id column, in six different places, is just not ideal.

00:07:14.60
Chris Morrell
Yeah.

00:07:15.64
Bogdan Kharchenko
So...

00:07:16.72
Chris Morrell
And it's easy to forget as well.

00:07:17.03
Bogdan Kharchenko
Yeah.

00:07:18.69
Chris Morrell
Maybe you optimize a bunch of different queries, then you add a new feature and you don't remember, oh, I have to explicitly select columns with this particular model.

00:07:30.64
Chris Morrell
It's just not an obvious thing to keep track of.

00:07:34.30
Bogdan Kharchenko
Yeah, totally. We've obviously been experiencing that by grabbing too much data. Even though the query is very fast and all optimized, because it has to fetch all this data, it's just super slow.

00:07:49.57
Skyler Katz
And it's just not fun to write that code. You've got your model query with all these things on it, and then you're like, ah, I've got to add a select with just an array of column names.

00:08:01.40
Skyler Katz
It just isn't... I don't know. It's not Laravel.

00:08:04.54
Chris Morrell
It's not Eloquent code.

00:08:06.51
Skyler Katz
Exactly.

00:08:06.65
Bogdan Kharchenko
It's true. It's true. Yeah.
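One way to avoid repeating the column list at every call site is to declare the heavy columns once and derive the default selection from the schema; in Laravel that might take the form of a global scope. The sketch below is Python with the standard-library `sqlite3` module, not Eloquent, and `HEAVY_COLUMNS`, `light_columns`, and `select_light` are hypothetical names used only for illustration.

```python
import sqlite3

# Hypothetical registry of "heavy" columns per table (not from the episode).
HEAVY_COLUMNS = {"documents": {"contents"}}


def light_columns(conn: sqlite3.Connection, table: str) -> list:
    """Return every column of `table` except the ones flagged as heavy."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # `table` is interpolated into SQL, so only pass trusted, hard-coded names.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    heavy = HEAVY_COLUMNS.get(table, set())
    return [row[1] for row in rows if row[1] not in heavy]


def select_light(conn: sqlite3.Connection, table: str) -> list:
    """SELECT only the light columns, so callers never repeat the column list."""
    cols = ", ".join(light_columns(conn, table))
    return conn.execute(f"SELECT {cols} FROM {table} ORDER BY 1").fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, user_id INTEGER, "
    "created_at TEXT, contents TEXT)"
)
conn.execute(
    "INSERT INTO documents (user_id, created_at, contents) VALUES (?, ?, ?)",
    (7, "2024-01-01", "x" * 100_000),
)
print(light_columns(conn, "documents"))  # ['id', 'user_id', 'created_at']
print(select_light(conn, "documents"))   # [(1, 7, '2024-01-01')]
```

A column added later is picked up automatically unless it is also flagged as heavy, which addresses the "I forgot to select the new column" failure Skyler describes a moment later.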

00:08:09.20
Chris Morrell
Yeah. I know there are people who take a different stance than I do on this stuff, but my general rule of thumb with Eloquent is: try not to heavily optimize until you need to, because

00:08:09.26
Bogdan Kharchenko
Yeah.

00:08:29.47
Chris Morrell
those optimizations always come with some cost. Oftentimes the upside is minimal, if there is any, and the downside is that you're going to spend three hours debugging something because in this one place you happened to decide not to fetch a column, and the error isn't showing up in any obvious way. Or your relational query is scoped somewhere else and you're not getting back the data you're expecting, and it's not obvious why. I just find that the more you can do straight queries with Eloquent, even if they're a little bit less optimal, the better your life is going to be, and then you optimize them when you actually start to see a need.

00:09:16.14
Bogdan Kharchenko
Amen.

00:09:21.88
Chris Morrell
Or, you know, there are some cases where it's like, okay, I'm going to be operating on every single record in this table and I only need the ID. Of course I'm only going to query the ID in that case.

00:09:33.04
Chris Morrell
But in most cases you just don't need to do that, and not doing it makes your life so much better. So I think you're right, Skyler. It just sucks to have to do it all the time, basically.

00:09:43.45
Skyler Katz
Yeah. I've run into situations where I've selected columns and then I'm like, oh, it's null. Why is it null? Oh, because in the part where I was fetching the data, I didn't select the new column I added. Pain.

00:09:59.66
Chris Morrell
Right.

00:10:01.98
Chris Morrell
Yes. And the default Eloquent behavior is that it just gives you a null, and you don't know: is this actually null, or did I not request this data?

00:10:13.49
Chris Morrell
Which there is a way to prevent, but...

00:10:13.94
Bogdan Kharchenko
Oh, yeah. I was just going to mention that, Chris. I didn't mean to cut you off. There is a flag you can turn on so that your application blows up when you try to do that.

00:10:27.16
Bogdan Kharchenko
But still, I agree. It's not obvious. It's painful to work with.
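The flag Bogdan mentions is, as far as we know, Laravel's `Model::preventAccessingMissingAttributes()`. The Python sketch below is not that API; it only illustrates the behavior difference using hypothetical `Record` and `MissingAttributeError` names. Lenient mode returns `None` for a column that was never selected, which looks exactly like a real NULL, while strict mode raises.

```python
class MissingAttributeError(AttributeError):
    """Raised when a known column is read but was not selected from the database."""


class Record:
    # Hypothetical column list for the table (not from the episode).
    COLUMNS = ("id", "user_id", "created_at", "contents")

    def __init__(self, strict: bool, **loaded):
        self._strict = strict
        self._loaded = loaded  # only the columns the query actually selected

    def __getattr__(self, name):
        # Python calls __getattr__ only after normal lookup fails,
        # so this handles column names but never _strict or _loaded.
        if name in self._loaded:
            return self._loaded[name]
        if name in self.COLUMNS:
            if self._strict:
                raise MissingAttributeError(
                    f"'{name}' was not selected when this record was loaded"
                )
            return None  # lenient mode: indistinguishable from a real NULL
        raise AttributeError(name)


lenient = Record(strict=False, id=1, user_id=7)
print(lenient.contents)  # None, but was it NULL or just never selected?

strict = Record(strict=True, id=1, user_id=7, contents=None)
print(strict.contents)  # None, and here it really is NULL

try:
    Record(strict=True, id=1, user_id=7).contents
except MissingAttributeError as exc:
    print(exc)  # 'contents' was not selected when this record was loaded
```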

00:10:32.84
Chris Morrell
Yeah.

00:10:35.31
Chris Morrell
So there are a couple of simple solutions to this problem, or at least simpler solutions.

00:10:35.71
Bogdan Kharchenko
So,

00:10:39.09
Bogdan Kharchenko
Mm-hmm.

00:10:41.40
Chris Morrell
The most obvious one is just to move the heavy data to another table, right? Query it when you need it. You can even query it through joins, so that you can still treat it like a regular attribute on the model and just use a scope to join it in when you need it.

00:11:03.42
Chris Morrell
That way it's an opt-in instead of an opt-out. Maybe throw an accessor on there so that if it's not loaded,

00:11:12.79
Chris Morrell
it lazily loads it, and you could throw some sort of exception if it was an N+1 situation. I'm sure there's a way to tie into the default Laravel behavior there. And I think that would solve a lot of the problems. Frankly, that may be the solution we end up reaching for, because it's pretty simple and it doesn't require us to reinvent too much of the wheel. It doesn't solve a bunch of the DX problems, though, because you still have 40 gigs of data in your database dumps if you need to pull that backup down.
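The "move it to another table" option Chris outlines (default query without the heavy data, opt-in join, lazy accessor) can be sketched in Python with `sqlite3` rather than Eloquent. The table names, `Document`, and `fetch_documents` are hypothetical. If the contents were modeled as an Eloquent relationship, Laravel's `Model::preventLazyLoading()` could presumably supply the N+1 guard; here a simple counter stands in. The accessor also raises in Skyler's out-of-sync case, where the parent row exists but its contents row was never written.

```python
import sqlite3

_UNLOADED = object()  # sentinel: "not fetched yet", distinct from a real NULL


class Document:
    def __init__(self, conn, doc_id, title, contents=_UNLOADED):
        self._conn = conn
        self.id = doc_id
        self.title = title
        self._contents = contents
        self.lazy_loads = 0  # extra queries issued; a spike hints at an N+1 loop

    @property
    def contents(self):
        if self._contents is _UNLOADED:
            row = self._conn.execute(
                "SELECT contents FROM document_contents WHERE document_id = ?",
                (self.id,),
            ).fetchone()
            if row is None:
                # Out-of-sync case: the parent row exists but its contents row doesn't.
                raise LookupError(f"document {self.id} has no contents row")
            self._contents = row[0]
            self.lazy_loads += 1
        return self._contents


def fetch_documents(conn, with_contents=False):
    """Default query skips the heavy table; with_contents=True joins it in (opt-in)."""
    if not with_contents:
        rows = conn.execute("SELECT id, title FROM documents ORDER BY id").fetchall()
        return [Document(conn, doc_id, title) for doc_id, title in rows]
    rows = conn.execute(
        "SELECT d.id, d.title, c.document_id, c.contents FROM documents d "
        "LEFT JOIN document_contents c ON c.document_id = d.id ORDER BY d.id"
    ).fetchall()
    # No joined row: leave contents unloaded, so access falls back to a lazy
    # load, which raises LookupError if the row is still missing.
    return [
        Document(conn, doc_id, title, contents if cid is not None else _UNLOADED)
        for doc_id, title, cid, contents in rows
    ]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE documents (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE document_contents (
        document_id INTEGER PRIMARY KEY REFERENCES documents (id),
        contents TEXT
    );
    INSERT INTO documents VALUES (1, 'Report'), (2, 'Memo');
    INSERT INTO document_contents VALUES (1, 'long body 1'), (2, 'long body 2');
""")

docs = fetch_documents(conn)
print(docs[0].contents, docs[0].lazy_loads)      # long body 1 1
joined = fetch_documents(conn, with_contents=True)
print(joined[1].contents, joined[1].lazy_loads)  # long body 2 0
```

As Chris notes, this keeps everyday queries light but does nothing for dump size, since the heavy table still lives in the same database.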

00:11:59.14
Skyler Katz
Well, and I feel like...

00:11:59.34
Chris Morrell
And it just means you've got separate tables that you're dealing with now, right?

00:12:04.78
Skyler Katz
You have separate tables, but also, even in the Verbs events context, weird stuff would happen if you don't end up with a record in the other table where the blob of data is supposed to be.

00:12:21.88
Chris Morrell
Right.

00:12:22.02
Skyler Katz
So if we had an articles table and the content was in some other table, but for some reason it didn't get written, you'd end up in a weird place too, with things potentially getting out of sync.

00:12:33.77
Chris Morrell
Right. Yeah, for sure.

00:12:34.67
Skyler Katz
Which isn't resolved by your complicated solution either, but...

00:12:37.72
Bogdan Kharchenko
Yeah.

00:12:41.04
Bogdan Kharchenko
I mean, I will say: yes, there's an extra table and maybe an additional join, but this is a pattern that everybody's used to. You just have a belongsTo or hasOne relationship and you call it a day, right? It's a very well-known pattern.

00:12:57.38
Bogdan Kharchenko
And we do these types of relationships, not necessarily for a blob or long text column specifically, but they exist, and I feel like developers are used to them.

00:13:10.18
Bogdan Kharchenko
So yes, there could be some instances where something goes south and the data doesn't get copied over to that additional table. But I really think that could be mitigated. And I just think that, in general, people are much more comfortable with that type of offloading of data into another table.

00:13:34.35
Bogdan Kharchenko
So I think, yeah, totally, totally.

00:13:34.47
Chris Morrell
Yeah, I mean, it's definitely more straightforward, for sure.

00:13:39.32
Chris Morrell
The big downside for me, or the thing that makes me think that, at least in our use case, it's not the right solution, is Verbs. Because...

00:13:51.48
Chris Morrell
either we have to introduce some sort of special... I guess we could create a new table called something like "verbs heavy data", or some other table that we make clear is also part of Verbs,

00:14:08.17
Chris Morrell
and just internally treat it like the event store: this table can't be messed with.

00:14:19.87
Chris Morrell
Or, yeah, I don't know what the solution is. We would have to do some weird stuff with Verbs, right? First we'd write the data to this other table, and then we'd fire the event and reference the ID of that row. I don't know.

00:14:38.94
Chris Morrell
It feels pretty bad, when you get into the event-sourced side of things, to have something that's so separate from all the other event-sourced data but is so integral to it.

00:14:57.29
Chris Morrell
That's my take, at least.

00:15:00.11
Bogdan Kharchenko
But it sounds like, in the case of Verbs specifically, the event data is crucial for making sure the state is built up correctly, right?

00:15:13.56
Bogdan Kharchenko
And maybe I don't fully understand the problem with Verbs, but there's obviously also a large chunk of JSON data inside the table.

00:15:24.41
Bogdan Kharchenko
And it's basically unneeded unless you're replaying the event, right? Is that what I understand?

00:15:31.61
Chris Morrell
Well, yeah, and even if you're replaying events, there are lots of replays you might do that never touch the document contents, right? So in the case of these documents,

00:15:40.84
Bogdan Kharchenko
Sure. I see.

00:15:43.02
Chris Morrell
we have document created and document updated events. Right now we're projecting those to the database, but in the future we may want to project to some sort of analytics tool as well.

00:15:59.74
Chris Morrell
And most likely, for all of those projections, we don't need the document contents at all. So we want to be able to quickly and efficiently fetch all of that data without getting the big blob of contents we don't care about, which in this case we can't exclude because it's inside a JSON column. I guess we could do some really crazy JSON subquery stuff, but I don't know what the efficiency of that looks like. I'm just not sure.

00:16:37.49
Skyler Katz
I mean, maybe...

00:16:37.57
Bogdan Kharchenko
I see, I understand.

00:16:37.87
Chris Morrell
And I imagine that, no matter what, InnoDB still has to fetch the JSON contents, right? In the case of a long text column, InnoDB puts the contents on a different page, right?

00:16:52.15
Bogdan Kharchenko
Mm-hmm. Mm-hmm.

00:16:52.38
Chris Morrell
So if you don't ask for it, there's literally no performance overhead. Whereas with JSON content, I can't imagine that's the same. I could be wrong, but I can't imagine it's the same.

00:17:04.86
Skyler Katz
In Verbs, when we're serializing the data to store it in the database, could there be something Verbs does where it says, this JSON is large, or this key is large, and it just dumps that somewhere and stores a reference to it in the data column?

00:17:31.28
Chris Morrell
Yeah, 100%.

00:17:31.83
Skyler Katz
And then pulls it back out behind the scenes, so that the end user doesn't have to care about it at all.

00:17:42.38
Chris Morrell
I would like Verbs to have a built-in solution. It feels bad for us to

00:17:53.56
Chris Morrell
put together some sort of taped-on extra thing on top of Verbs. It would be so much better if it were built in. We already have the serialization and deserialization pipeline in Verbs, so it's the type of thing that could be handled fairly transparently.

00:18:18.86
Chris Morrell
It does feel like...

00:18:19.48
Skyler Katz
Put an attribute on the key in the payload or something and just ship it off.

00:18:22.86
Chris Morrell
Yeah, exactly. Yeah, yeah.

00:18:28.05
Chris Morrell
But whatever it is, creating a separate big contents table that's just a UUID and a long text, and referencing that UUID somewhere, feels gross.

00:18:46.94
Chris Morrell
If it weren't for Verbs, I think I could live with it. But with the way we do Verbs, it feels a little gross to me. I'm not necessarily opposed to it,

00:18:59.83
Chris Morrell
because the simplicity side of it is really nice, right? But yeah, it just doesn't feel great.

00:19:16.06
Bogdan Kharchenko
Yeah, so let me step back a little bit. One of the things we want to solve with the potentially over-engineered solution is not just the query performance, but also not having to ask for all of the data when you just need a small column in the Verbs JSON payload, right?

00:19:43.76
Bogdan Kharchenko
Because what you're saying is that, in the Verbs JSON event payload, you would just say, I want this key and this key, rather than the entire document that's attached to it.

00:19:57.44
Bogdan Kharchenko
Is that what I understand?

00:19:59.29
Chris Morrell
Yeah. I think you would essentially wrap the data behind some sort of abstraction that has a way to retrieve the data.

00:20:11.08
Chris Morrell
And until you retrieve it... The thing it makes me think of is when I was exploring Swift: there's the concept of wrapped variables, where

00:20:24.12
Chris Morrell
when the variable's wrapped, you don't know if it has a value or if it's an error, and you can pass it around. Then the code that actually cares unwraps the variable and deals with the consequences of that action.

00:20:38.88
Bogdan Kharchenko
Mm-hmm. Mm-hmm.

00:20:39.97
Chris Morrell
So it was that idea: you'd pass around some sort of object, and if you never need the data, all you're doing is passing around a reference.

00:20:51.63
Chris Morrell
And the moment you need the data, either the object already has it, or the object loads it, or the object throws an exception if for some reason the data isn't there, right?
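The "wrapped value" Chris describes can be sketched as a small Python wrapper. `Lazy`, `unwrap`, and `MissingDataError` are hypothetical names, and the analogy is closer to Swift optionals or a Result type than to anything in Eloquent. The object carries only a reference until someone unwraps it; then it returns preloaded data, loads and caches the data, or raises if nothing is stored.

```python
from typing import Callable, Optional


class MissingDataError(LookupError):
    """Raised when a reference is unwrapped but nothing is stored behind it."""


class Lazy:
    """Cheap to pass around: holds a reference and loads the data only when unwrapped."""

    _UNSET = object()

    def __init__(self, ref: str, loader: Callable[[str], Optional[str]], value=_UNSET):
        self.ref = ref
        self._loader = loader
        self._value = value  # may be preloaded, e.g. by an eager load

    @property
    def loaded(self) -> bool:
        return self._value is not Lazy._UNSET

    def unwrap(self) -> str:
        if self._value is Lazy._UNSET:
            value = self._loader(self.ref)
            if value is None:
                raise MissingDataError(f"no data stored for {self.ref!r}")
            self._value = value  # cache so later unwraps skip the store
        return self._value


calls = []
backing = {"doc-1": "full document text"}


def loader(ref: str) -> Optional[str]:
    calls.append(ref)
    return backing.get(ref)


handle = Lazy("doc-1", loader)
print(handle.loaded)    # False
print(handle.unwrap())  # full document text
handle.unwrap()
print(calls)            # ['doc-1'] (the store was hit once)
```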

00:21:02.61
Chris Morrell
I did a little bit of research, and the term for this is the claim check pattern. It's a relatively well-established pattern in applications that deal with lots of

00:21:19.82
Chris Morrell
big data payloads, where you essentially have a system in which you exchange some data for a claim check, which is just some metadata.

00:21:31.11
Chris Morrell
Then at any time you can exchange the claim check back for the data. Until you need it, you're just passing around a UUID or something like that which represents the data, and you only load the data if you need it.

00:21:46.58
Chris Morrell
And I feel like you could implement this with a database table. You could implement it with something like S3. You could put it on the file system if you wanted to, although you probably don't want to. But

00:22:00.87
Chris Morrell
it's a nice, generalized pattern that solves this problem of "let's offload this and only get it if we need it." You know what I mean?
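Below is a minimal Python sketch of the claim check pattern as described here: check data in, get back an opaque claim (a UUID in this sketch), and redeem it later. The `ClaimCheckStore` protocol and both backends are hypothetical. A database table and a directory stand in for the storage options Chris lists; an S3-backed store would expose the same two methods.

```python
import sqlite3
import uuid
from pathlib import Path
from typing import Protocol


class ClaimCheckStore(Protocol):
    def check_in(self, data: bytes) -> str: ...
    def redeem(self, claim: str) -> bytes: ...


class SqliteStore:
    """Claim checks backed by a database table."""

    def __init__(self, conn: sqlite3.Connection):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS claim_checks (id TEXT PRIMARY KEY, data BLOB NOT NULL)"
        )

    def check_in(self, data: bytes) -> str:
        claim = str(uuid.uuid4())
        self.conn.execute("INSERT INTO claim_checks (id, data) VALUES (?, ?)", (claim, data))
        return claim

    def redeem(self, claim: str) -> bytes:
        row = self.conn.execute(
            "SELECT data FROM claim_checks WHERE id = ?", (claim,)
        ).fetchone()
        if row is None:
            raise KeyError(claim)
        return row[0]


class DirectoryStore:
    """Claim checks backed by files; an S3 bucket would follow the same shape."""

    def __init__(self, root: Path):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, claim: str) -> Path:
        # uuid.UUID() raises ValueError on anything that isn't a UUID,
        # which also rejects path tricks like "../secrets".
        return self.root / str(uuid.UUID(claim))

    def check_in(self, data: bytes) -> str:
        claim = str(uuid.uuid4())
        self._path(claim).write_bytes(data)
        return claim

    def redeem(self, claim: str) -> bytes:
        path = self._path(claim)
        if not path.exists():
            raise KeyError(claim)
        return path.read_bytes()


def round_trip(store: ClaimCheckStore, data: bytes) -> bytes:
    """Callers hold only the claim string until they actually need the data."""
    claim = store.check_in(data)
    return store.redeem(claim)


print(round_trip(SqliteStore(sqlite3.connect(":memory:")), b"signature.png bytes"))
# b'signature.png bytes'
```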

00:22:12.72
Bogdan Kharchenko
Yeah, yeah. I've obviously heard you talk about this already, and it sounds very promising. I do want to poke at it a little bit more; maybe if you dive deeper, I'll find some holes in it. It just seems like, maybe in bulk operations, when you're retrieving data with this claim check, you'd end up back in the same spot as storing the data on the row itself. If you just had the JSON blob or whatever the content is in that event, and you have to rehydrate it in real time with a claim check from another table or an S3 bucket, aren't you just back at step one?

00:23:01.52
Chris Morrell
Yeah. It's 100% a trade-off, right? You're essentially trading optimizing for one case for optimizing for the other case.

00:23:13.77
Chris Morrell
And the downside of the traditional claim check pattern, or just... I mean, the other thing we could do is what we do with other file-system-related things. There are lots of tables

00:23:27.64
Chris Morrell
that we have that are related to a file, where we just have a disk column and a path column, right? And you load those from the file system as needed.

00:23:34.41
Bogdan Kharchenko
Mm-hmm.

00:23:39.32
Chris Morrell
And I think the claim check pattern has that same downside, which is that every time you need the data, you have to load it from this secondary store.

00:23:51.89
Chris Morrell
So the solution I had proposed before we got on this call was: what if we still had a long text column on this table, and we just write the data to the long text column as usual?

00:24:05.98
Bogdan Kharchenko
Mm-hmm.

00:24:13.57
Chris Morrell
Then we essentially implement some sort of trait or attribute that you can put on your model (you'd probably use a custom cast) to describe which data should be handled in this way.

00:24:36.10
Chris Morrell
And we run a scheduled job every day. If I step back and think about this more as an open source package and less as just a solution we're going to use, the advantage of an open source package is that we can also adopt it as the canonical way to do this in Verbs.

00:24:57.06
Chris Morrell
We could just have a scheduled task that runs a command that discovers every model that has this cast.

00:25:09.28
Chris Morrell
And, sort of like Laravel Scout, you could optionally define a method that tells this job how to efficiently load the data.

00:25:21.44
Chris Morrell
And you could optionally implement a method that tells the job the logic for when things should be moved to what I'm calling cold storage.

00:25:23.89
Skyler Katz
Mm-hmm.

00:25:34.87
Chris Morrell
Right. So by default, maybe any record that's older than seven days, or 14 days, or 30 days, or whatever it is, gets moved to cold storage. But each model could separately define its own logic.

00:25:49.55
Chris Morrell
That way you might say, okay, if this document hasn't been sealed yet, it never moves to cold storage. But the moment it has been sealed and is more than 30 days old, it moves to cold storage. And what moving to cold storage would mean is: we grab the contents of the record, write them to S3, and replace them with some sort of JSON payload that says this is the disk and this is the path.

00:26:27.61
Chris Morrell
Maybe also the timestamp it was last accessed. That's a thing I haven't quite figured out. But essentially, it would swap the contents out for a reference to a file in S3, right?
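The scheduled cold-storage job could look something like this Python sketch, assuming a per-model policy like the one Chris gives as an example (sealed and more than 30 days old). `DocumentRow`, `FakeDisk`, `MARKER`, and the reference format are hypothetical; a real package would discover models by their cast, issue an UPDATE instead of mutating an in-memory row, and write through a configured S3 disk rather than a dict. The "last accessed" timestamp Chris hasn't settled on is left out.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

MARKER = "__cold_storage__"  # hypothetical key identifying a reference payload


@dataclass
class DocumentRow:
    id: int
    created_at: datetime
    sealed: bool
    contents: str  # raw contents, or a JSON reference once moved to cold storage


class FakeDisk:
    """Stand-in for an S3 disk."""

    def __init__(self):
        self.files = {}

    def put(self, path: str, data: bytes) -> None:
        self.files[path] = data


def reference_of(contents: str) -> Optional[dict]:
    """Return the reference if `contents` is a cold-storage payload, else None."""
    try:
        decoded = json.loads(contents)
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        return None
    return decoded.get(MARKER) if isinstance(decoded, dict) else None


def should_move_to_cold_storage(row: DocumentRow, now: datetime) -> bool:
    # Example per-model policy: unsealed documents stay hot,
    # sealed ones move once they are more than 30 days old.
    return row.sealed and now - row.created_at > timedelta(days=30)


def move_to_cold_storage(rows, disk, now, disk_name="s3") -> int:
    moved = 0
    for row in rows:
        if reference_of(row.contents) is not None or not should_move_to_cold_storage(row, now):
            continue
        path = f"documents/{row.id}.txt"
        # Write to the disk first and only then replace the column,
        # so a crash in between never loses the contents.
        disk.put(path, row.contents.encode())
        row.contents = json.dumps(
            {MARKER: {"disk": disk_name, "path": path, "moved_at": now.isoformat()}}
        )
        moved += 1
    return moved


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
rows = [
    DocumentRow(1, now - timedelta(days=90), sealed=True, contents="old sealed"),
    DocumentRow(2, now - timedelta(days=90), sealed=False, contents="old draft"),
    DocumentRow(3, now - timedelta(days=3), sealed=True, contents="recent"),
]
disk = FakeDisk()
print(move_to_cold_storage(rows, disk, now))  # 1
print(disk.files)                             # {'documents/1.txt': b'old sealed'}
print(move_to_cold_storage(rows, disk, now))  # 0 (already moved)
```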

00:26:43.07
Chris Morrell
And since we're using a custom cast, when you try to access the attribute, Eloquent would just say, okay, if the contents are there, just return them.

00:26:57.07
Chris Morrell
So for all the stuff in the hot path, the stuff from the last 30 days or whatever, that long text column is just going to have the contents, and it'll work exactly like it does right now.

00:27:01.58
Bogdan Kharchenko
Mm-hmm. Mm-hmm.

00:27:10.05
Chris Morrell
If the contents aren't there, the cast can just exchange that claim check for the value in S3.

00:27:25.75
Chris Morrell
And we could even have drivers: a database driver, an S3 driver, a DynamoDB driver, whatever. So transparently, you could just access the contents as though they were always there.

00:27:42.16
Chris Morrell
For the content that is most likely to be needed, it will just be there. And for the content that has been moved to this cold storage concept, it's just one additional call to S3, but it's transparent to the application,
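The read side of Chris's custom cast can be sketched in Python as a function from the raw column value to the contents. `ColdStorageCast`, the driver mapping, and the `MARKER` reference format are hypothetical; in Laravel this logic would presumably live in the `get()` method of a class implementing `CastsAttributes`. Drivers are plain callables keyed by disk name, so a database, S3, or DynamoDB driver would plug in the same way.

```python
import json
from typing import Callable, Dict, Optional

MARKER = "__cold_storage__"  # hypothetical key identifying a reference payload

# A driver takes a path and returns the stored bytes (S3, a table, DynamoDB, ...).
Driver = Callable[[str], bytes]


class ColdStorageCast:
    """Sketch of a cast's read side: hot values pass through, references are resolved."""

    def __init__(self, drivers: Dict[str, Driver]):
        self.drivers = drivers

    @staticmethod
    def _reference(raw: str) -> Optional[dict]:
        try:
            decoded = json.loads(raw)
        except ValueError:
            return None  # plain contents, not a reference
        return decoded.get(MARKER) if isinstance(decoded, dict) else None

    def get(self, raw: str) -> str:
        ref = self._reference(raw)
        if ref is None:
            return raw  # hot path: the column still holds the contents
        driver = self.drivers[ref["disk"]]  # KeyError if no driver is configured
        return driver(ref["path"]).decode()


fake_s3 = {"documents/1.txt": b"old sealed"}
cast = ColdStorageCast({"s3": lambda path: fake_s3[path]})

print(cast.get("fresh contents"))  # fresh contents
cold = json.dumps({MARKER: {"disk": "s3", "path": "documents/1.txt"}})
print(cast.get(cold))              # old sealed
```

Hot contents that happen to be JSON pass through untouched unless they contain the marker key, which is why a real implementation would want a marker that ordinary data can never produce.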

00:27:59.45
Chris Morrell
which has a lot of appeal. The downside is that now we're back to those records always being loaded, no matter what.

00:28:17.16
Chris Morrell
So it doesn't solve the first problem we talked about, right? It's not a perfect solution to that, but it's an interesting solution to a lot of the other problems we've faced.

00:28:27.72
Skyler Katz
I mean, what's the downside of just getting rid of the column in the database altogether and storing it in S3?

00:28:39.16
Chris Morrell
Right. So just always storing it in S3.

00:28:42.86
Bogdan Kharchenko
But then you would lose...

00:28:43.02
Skyler Katz
Yeah. I mean, even in the Verbs events context,

00:28:50.17
Skyler Katz
there's a small performance penalty, but I just don't think it's that big. If you're hosting your application in

00:29:00.60
Chris Morrell
Right.

00:29:00.84
Skyler Katz
AWS and using S3, or even if you were on Laravel Forge's new boxes and using R2 from Cloudflare, this stuff is so close to your box that I just don't know that there's really that big of a performance penalty to

00:29:00.88
Chris Morrell
We're on... Yeah. Yeah.

00:29:20.73
Skyler Katz
pulling stuff that isn't accessed that much. These documents are accessed several times over the course of a week, and then probably never again, until somebody decides to look back at their document for historical purposes.

00:29:36.89
Skyler Katz
And then you just pull it back in.

00:29:38.33
Skyler Katz
I mean, in Verbs, we don't use Verbs state. I suppose if you were doing mostly state stuff and you needed to hydrate all of the events every time, that would

00:29:51.02
Skyler Katz
come at a cost.

00:29:53.51
Chris Morrell
Yeah. Now, I think because the cold storage concept doesn't address that first issue, which is that you have to manually handle which columns you're loading, it doesn't feel like a great solution to our problem.

00:30:18.58
Bogdan Kharchenko
I mean,

00:30:18.70
Chris Morrell
I do think that maybe...

00:30:19.69
Bogdan Kharchenko
But...

00:30:20.91
Chris Morrell
abstracting that behind a concept called a claim check, right? Where there's an object you get back that you can pass around until you need it, and the moment you need it, it either has already been loaded or can be loaded dynamically.

00:30:43.68
Chris Morrell
And that would provide a way to say: if you're storing your claim check data in the database instead of S3, you could eager load that stuff. Or maybe there's even a way to load multiple records from S3 more efficiently in a single call.
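For a database-backed claim check store, the eager loading Chris mentions could be sketched in Python as: gather every claim on a page of records and redeem them in batched `IN (...)` queries, instead of one lookup per record (the N+1 shape). The table and function names are hypothetical. As far as we know, S3 has no multi-object GET, so an S3-backed store would more likely issue the individual requests concurrently.

```python
import sqlite3

BATCH_SIZE = 500  # stays under SQLite's default limit on bound parameters


def redeem_many(conn: sqlite3.Connection, claims) -> dict:
    """Exchange many claim checks with one query per batch instead of one per record."""
    unique = list(dict.fromkeys(claims))  # de-duplicate, keep order
    found = {}
    for start in range(0, len(unique), BATCH_SIZE):
        batch = unique[start:start + BATCH_SIZE]
        placeholders = ", ".join("?" for _ in batch)
        rows = conn.execute(
            f"SELECT id, data FROM claim_checks WHERE id IN ({placeholders})", batch
        ).fetchall()
        found.update(rows)  # rows are (id, data) pairs
    missing = [claim for claim in unique if claim not in found]
    if missing:
        raise KeyError(f"unredeemable claim checks: {missing}")
    return found


def eager_load(conn: sqlite3.Connection, records: list) -> None:
    """Attach `contents` to every record using batched lookups."""
    contents = redeem_many(conn, (record["claim"] for record in records))
    for record in records:
        record["contents"] = contents[record["claim"]]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE claim_checks (id TEXT PRIMARY KEY, data TEXT NOT NULL)")
conn.executemany("INSERT INTO claim_checks VALUES (?, ?)", [("a", "alpha"), ("b", "beta")])

records = [{"id": 1, "claim": "a"}, {"id": 2, "claim": "b"}, {"id": 3, "claim": "a"}]
eager_load(conn, records)
print([record["contents"] for record in records])  # ['alpha', 'beta', 'alpha']
```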

00:31:00.22
Chris Morrell
But I do think that maybe just a simple setup, like, okay, here's an S3 bucket that's only for this, with restrictive policies on it so that the data can't be deleted,

00:31:16.41
Chris Morrell
and we just store a UUID that is a reference to that file. It could be...

00:31:29.85
Chris Morrell
the best option.

00:31:31.79
Skyler Katz
I mean, I've never used it, but S3 has a way to query S3 files with a SQL-like syntax. And I wonder, if the files were stored in a certain way, whether you could just pull them back.

00:31:40.87
Chris Morrell
And... okay.

00:31:49.89
Skyler Katz
Which would be interesting.

00:31:50.33
Chris Morrell
Yeah.

00:31:54.37
Bogdan Kharchenko
Yeah. One thing that strikes me is that if the file exists only in S3, you trade off something else, right? You can't do JSON queries on that column, for example, even on data you've hydrated, and you can't do a full-text search of the document.

00:32:21.72
Bogdan Kharchenko
And one thing that came to mind as you guys were discussing this, and maybe, Chris, you touched on this a little bit: what if there was a table called claim check that had some sort of long text column,

00:32:37.17
Bogdan Kharchenko
and it had a UUID or an ID or whatever, and that is the thing that you've exchanged somewhere else, right? So there is this relationship, and that kind of solves the issue of having to limit or explicitly say which columns you want to fetch from your main model,

00:32:58.73
Bogdan Kharchenko
because the content is just in another table. It might ultimately end up in some sort of S3 bucket, but there's still a direct reference to that column in this table.

00:33:14.96
Bogdan Kharchenko
And you don't have to query everything all at once. And if you do have the warm data, you could maybe do some sort of searching or other operations.

00:33:26.28
Skyler Katz
You could put the warm data in a claim check table. And then that table takes old records that haven't been accessed in a while and dumps them off to S3 with a reference.

00:33:40.53
Skyler Katz
And, um,

00:33:42.42
Bogdan Kharchenko
Well, that's basically what I'm describing.

00:33:43.54
Skyler Katz
And then a custom, yeah, a custom relationship where, if you're eager loading, we fetch the content from S3 and push it back into the model when you're grabbing them.
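The custom relationship Skyler describes amounts to batched resolution: one query for all the claim-check rows a set of models needs, then object-storage reads only for rows that have already been archived. A minimal sketch, assuming a hypothetical row shape of `(content, s3_path)` where exactly one side is set; `fetch_rows` and `fetch_object` are placeholders for the real table query and S3 read, not Laravel APIs.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

# Hypothetical row shape: (content, s3_path). content is set while the row is
# warm; s3_path is set once the content has been archived to object storage.
Row = Tuple[Optional[str], Optional[str]]


def eager_load(
    keys: Iterable[str],
    fetch_rows: Callable[[List[str]], Dict[str, Row]],
    fetch_object: Callable[[str], str],
) -> Dict[str, str]:
    """Resolve many claim checks with a single table lookup, falling back to
    object storage only for rows that have been archived."""
    unique = list(dict.fromkeys(keys))  # de-duplicate while keeping order
    rows = fetch_rows(unique)           # stands in for one "WHERE key IN (...)" query
    resolved: Dict[str, str] = {}
    for key in unique:
        content, path = rows[key]
        if content is None:
            # Archived row: only these pay for a round trip to object storage.
            content = fetch_object(path)
        resolved[key] = content
    return resolved
```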

00:33:57.74
Bogdan Kharchenko
It almost doesn't even have to be in the model, right? It doesn't even have to be a column.

00:34:01.82
Chris Morrell
This is a relationship.

00:34:03.21
Bogdan Kharchenko
It's just a relationship, right? There's just this claim check, and whatever it is, you can add a cast to it or whatever you want to do.

00:34:05.91
Skyler Katz
Yeah.

00:34:14.37
Bogdan Kharchenko
I don't know.

00:34:14.64
Chris Morrell
I do like that.

00:34:14.69
Bogdan Kharchenko
I think that's a happy medium.

00:34:16.63
Chris Morrell
I like it. Offloading that data to its own table in the database also solves the most complicated piece of this cold storage idea, which was having to auto-discover all the places where you needed to look.

00:34:29.83
Bogdan Kharchenko
Mm-hmm.

00:34:31.29
Chris Morrell
But if you just had all that data in one table, then that scheduled task could just run every day and grab everything that hasn't been accessed in more than 30 days, or whatever you configure, and offload all of that to S3. And that could be a totally optional process, right? You could choose to do it or not. You could turn that off and just keep the stuff in the database if that's what you wanted.

00:35:01.19
Chris Morrell
Or if you were concerned about the size of your database tables, you could offload some of the data to S3 as you needed to.
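The daily offloading job Chris describes could look roughly like this. It is a sketch under assumptions: a plain dict stands in for the S3 bucket, `ClaimRow` and its fields are hypothetical column names, and the 30-day idle window is just the example figure from the conversation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class ClaimRow:
    content: Optional[str]      # present while the row is warm
    s3_path: Optional[str]      # set once the content has been offloaded
    last_accessed: datetime


def archive_stale(
    rows: Dict[str, ClaimRow],
    bucket: Dict[str, str],     # stand-in for an S3 bucket: path -> body
    now: datetime,
    max_idle: timedelta = timedelta(days=30),
) -> List[str]:
    """Offload content not accessed within `max_idle`; return archived keys."""
    archived = []
    for key, row in rows.items():
        if row.content is None or now - row.last_accessed <= max_idle:
            continue  # already archived, or still warm
        path = f"claim-checks/{key}"
        bucket[path] = row.content   # upload first...
        row.s3_path = path
        row.content = None           # ...then clear the database copy
        archived.append(key)
    return archived
```

Because the job only touches one table, turning it off simply leaves everything in the database, which matches the "totally optional" framing above.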

00:35:09.77
Skyler Katz
Yeah. I mean, storing data that's rarely accessed in S3 is going to be cheaper than storing it in RDS, by fractions of a penny per hour, but yeah.

00:35:19.40
Bogdan Kharchenko
Yeah, I mean, I think the savings in data storage will probably also be offset by the reads and writes to S3.

00:35:22.54
Chris Morrell
Sure.

00:35:33.77
Bogdan Kharchenko
I don't know how much it costs, but it's probably also very minimal. But I think the solution we're going after is for dealing with large data, right? And maybe when you scale up to 10, 20, 30, 50, 100 gigabytes of data, then it actually makes sense.

00:35:53.84
Bogdan Kharchenko
But if you only have one gigabyte of data, or a hundred megabytes, which even that is a lot, I think just keeping it in a database... Like, if somebody said, here's this package and you have to use S3, I'm like, oh man, that's another thing I've got to maintain.

00:36:10.59
Bogdan Kharchenko
But if I can just optionally say, oh yeah, this is where big blob data is stored, and I'm okay with the cost because I have two users on my application, me and my wife...

00:36:21.74
Bogdan Kharchenko
I think that's super appealing, just to have a specialized table to hold some of this bigger data.

00:36:32.79
Chris Morrell
Yeah. I mean, another thing that's pretty common in this pattern is just having a data size threshold.

00:36:43.59
Chris Morrell
So that could be another configuration option that we could offer and tweak for ourselves, where it's like, yeah, if the size is over X, don't even write it to the database.

00:36:56.48
Chris Morrell
Just go straight to S3, because we don't want to put a gig of data in a blob column.

00:37:02.37
Bogdan Kharchenko
Yep.

00:37:03.55
Chris Morrell
You know what I mean? But as long as it's under some number of kilobytes, write it to the database first and then archive it to S3 later, once it hasn't been accessed for a while. We could also implement something like that.

00:37:23.06
Chris Morrell
And that would give us sort of the best of both worlds. You could write any amount of data to one of these things and just know that it's going to be transparently handled for you.
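The size threshold adds one branch to the write path. In this sketch `MAX_INLINE_BYTES` (64 KiB) is an arbitrary placeholder, not a recommendation from the show, and the dicts again stand in for the claim-check table and the bucket; `store` is a hypothetical helper name.

```python
import uuid
from typing import Dict, Optional, Tuple

# Hypothetical threshold: anything larger skips the database entirely.
MAX_INLINE_BYTES = 64 * 1024


def store(
    content: bytes,
    table: Dict[str, Tuple[Optional[bytes], Optional[str]]],
    bucket: Dict[str, bytes],
    max_inline: int = MAX_INLINE_BYTES,
) -> str:
    """Write content and return the claim-check key that references it."""
    key = str(uuid.uuid4())
    if len(content) > max_inline:
        path = f"claim-checks/{key}"
        bucket[path] = content          # too big: straight to object storage
        table[key] = (None, path)
    else:
        table[key] = (content, None)    # small: keep it warm in the database
    return key
```

Small payloads then follow the normal warm-then-archive lifecycle, while oversized ones never touch a blob column.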

00:37:38.16
Bogdan Kharchenko
I like it.

00:37:40.19
Skyler Katz
Ship it.

00:37:41.25
Chris Morrell
There we go, we solved it.

00:37:42.51
Bogdan Kharchenko
Chris already wrote the code.

00:37:42.84
Chris Morrell
This is a rare occurrence, 37 minutes, and we just solved the problem.

00:37:48.87
Skyler Katz
Yeah, it is.

00:37:49.19
Bogdan Kharchenko
Yeah, I mean, I think it all looks good on paper. I know that actually implementing this and working with it long term is going to prove challenging no matter what. I'm sure there's table partitioning that we should think about, or maybe, as you're writing this package, how to automatically partition by date.

00:38:09.81
Bogdan Kharchenko
I don't know if it's necessary, but it could be worth exploring, because if you say to me, yeah, here's this magical table that's going to store millions of records, then at some point it's going to come to a grinding halt, the same way some of the other tables we've been dealing with have.

00:38:28.74
Bogdan Kharchenko
I don't know. It has to be time-proven, I think. But I think it's a good premise. And obviously we didn't invent anything.

00:38:39.05
Bogdan Kharchenko
Like you mentioned earlier, this is a common pattern for storing data in long-term storage, cold storage, with this claim check concept.

00:38:49.12
Chris Morrell
Yeah.

00:38:50.10
Skyler Katz
I mean, there's one other thing. When we were upgrading our database, the person from Percona said that in MySQL you can make a shadow column: the column is there, but when you do SELECT *, it just never returns it.

00:39:08.49
Skyler Katz
You have to explicitly say, give me the content column.

00:39:09.06
Chris Morrell
Yeah.

00:39:10.42
Bogdan Kharchenko
Oh, that's a good idea. Yeah.

00:39:14.00
Skyler Katz
Yeah. Which could also be an approach for some of these tables, and then we just have a query scope, like "with big content" or whatever, that adds "select *, content" to the query.
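For reference, the MySQL feature Skyler describes is called an invisible column (the `INVISIBLE` column attribute, available since MySQL 8.0.23): `SELECT *` skips it, and it is only returned when named explicitly. The "with content" scope then just appends that column. A hypothetical sketch that only builds the SQL string, assuming a `documents` table with an invisible `content` column:

```python
def select_documents(with_content: bool = False) -> str:
    """Build a SELECT for a hypothetical `documents` table whose large
    `content` column is declared INVISIBLE (so `*` does not include it)."""
    columns = ["documents.*"]
    if with_content:
        # Invisible columns are only returned when they are named explicitly.
        columns.append("documents.content")
    return "SELECT " + ", ".join(columns) + " FROM documents"
```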

00:39:25.94
Chris Morrell
Right.

00:39:26.13
Bogdan Kharchenko
yeah

00:39:27.58
Chris Morrell
Mm-hmm.

00:39:31.14
Chris Morrell
who

00:39:32.24
Skyler Katz
Like, that's an option instead of having to remember to do it in all of them. I don't know. Or an attribute that, I don't know, makes a second query to the table if you're trying to access the content column but it's not there.

00:39:47.37
Skyler Katz
It's a potential approach.

00:39:49.18
Bogdan Kharchenko
I mean, I really like that simple solution, Skyler, but I think it's not over-engineered enough. This is the problem.

00:39:54.93
Skyler Katz
Well, I mean, the downside to this is that it's opaque, and you're like, why do I have to say "with content"?

00:39:59.33
Bogdan Kharchenko
Yes.

00:40:03.12
Skyler Katz
You still get that same problem of, I did a SELECT *, but it's not giving me back all of the data that I'm expecting. Yeah.

00:40:10.72
Chris Morrell
Well, I mean, I would go further and say the problem is that it doesn't solve two of the major headaches that we are specifically dealing with, right?

00:40:18.27
Skyler Katz
yes

00:40:19.78
Chris Morrell
It does solve the one. It solves the querying problem, but it doesn't solve the size of the data when we need to do debugging. And it doesn't...

00:40:34.25
Chris Morrell
Ultimately the data is still just in that table, right? All the problems around the data being in that table remain. The fact that it's not selected by default makes it slightly better, but it doesn't actually address the fact that the data is still there.

00:40:54.40
Chris Morrell
And for context, one part of our development process at InterNACHI is that we have...

00:40:58.28
Skyler Katz
Thank you.

00:41:01.83
Chris Morrell
We have a job that runs every day that takes a production database dump, restores it, runs a bunch of sanitization and cleanup on the data, like anonymization and stuff like that, and then dumps that to an S3 bucket that we can use for local development.

00:41:26.67
Chris Morrell
And when we introduced this feature, our database dumps went from five gigabytes to 40 gigabytes overnight.

00:41:41.17
Chris Morrell
And that's just a really frustrating experience. So that's another piece of it that I do want to address. Right now, we're just sort of clearing out the contents of those tables,

00:41:56.82
Chris Morrell
and we've got that dump size back down to even a little bit smaller than it was before, but it comes with the trade-off that now we don't have any of that data if we're debugging locally.

00:42:10.89
Bogdan Kharchenko
One thing I wanted to add, Chris, that unfortunately we still deal with sometimes, is what you just said: we don't get that content because we cleared it out. But even with this mechanism, right, we would then have to take an S3 bucket, potentially back it up,

00:42:31.76
Bogdan Kharchenko
and sanitize that data too, and make sure that it's available for our development environment. So it is another trade-off. Right now, at least we can say, oh yeah, maybe we should keep one week's worth of data.

00:42:36.28
Chris Morrell
Yeah.

00:42:44.19
Bogdan Kharchenko
Right. And it's all kind of contained there. But if we have another bucket that we're offloading it to, we have to clone that bucket as well and do that same kind of sanitization process on the data, so that it's available for local debugging and development.

00:43:03.73
Bogdan Kharchenko
And that's just another challenge that would...

00:43:06.87
Skyler Katz
I mean, we could do something similar to what we do with Stripe, where we have the read-only key for pulling in the Stripe billing stuff, where if we try to write when we're in local, it'll just throw an exception because it's like that's...

00:43:13.93
Bogdan Kharchenko
Hmm.

00:43:23.27
Skyler Katz
you're trying to write test data to prod, but you're able to read the prod transaction history or whatever. So we could have a read-only production key for S3. That's like, all right, here's the claim check read key.

00:43:37.70
Skyler Katz
And locally, it can read from there, but it can't write to it.

00:43:42.85
Chris Morrell
But it can't write to it.

00:43:44.36
Bogdan Kharchenko
yeah

00:43:45.00
Skyler Katz
We can't overwrite the production stuff. I mean, I think, at least with Stripe, when we're pulling in customer stuff, it tries to find it in our dev instance, and if it can't, it reads from the prod instance. So...

00:44:02.53
Chris Morrell
Yeah, I mean, the other thing is, if we do this sort of two-phase thing, where first it goes to the database and then it moves to S3, then it's essentially only an issue if you're trying to access data that got moved to S3, which in a lot of cases theoretically wouldn't happen, because

00:44:03.66
Skyler Katz
yeah.

00:44:06.67
Bogdan Kharchenko
Mm-hmm.

00:44:19.22
Bogdan Kharchenko
true

00:44:24.63
Chris Morrell
it's content from years ago that you're not likely to need for local debugging, right? So I do like the idea of optionally supporting some sort of alternate read key for the claim checks, so that in your local environment you could set it up so that, if you don't have it locally, there's a way to access a production claim check.

00:44:42.65
Bogdan Kharchenko
Thank you.

00:44:52.56
Chris Morrell
But I don't even think you would need that most of the time because the data would just be in the database.
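The read-only fallback is a small layering trick: local reads first, production second, and production never written to. In practice the read-only guarantee would come from credentials (for S3, an IAM policy that allows only `s3:GetObject` on the claim-check bucket); the `ReadOnlyStore` wrapper below merely mirrors that locally. Both class names are hypothetical, and dicts stand in for the real stores.

```python
from typing import Dict, Optional


class ReadOnlyStore:
    """Wraps a store so writes fail loudly (e.g. production, from a dev box)."""

    def __init__(self, data: Dict[str, str]) -> None:
        self._data = data

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        raise PermissionError(f"refusing to write {key!r} to a read-only store")


class FallbackStore:
    """Reads the local store first, then the read-only production store.
    Writes only ever go to the local store."""

    def __init__(self, local: Dict[str, str], production: ReadOnlyStore) -> None:
        self.local = local
        self.production = production

    def get(self, key: str) -> Optional[str]:
        if key in self.local:
            return self.local[key]
        return self.production.get(key)

    def put(self, key: str, value: str) -> None:
        self.local[key] = value
```

This is the same shape as the Stripe setup Skyler mentions: a local miss falls through to production, and an accidental production write raises instead of succeeding.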

00:44:56.39
Skyler Katz
I mean, that's kind of special to our setup. I don't know many people who just use their production database dumps as their local environment.

00:44:59.44
Chris Morrell
Sure.

00:45:06.38
Chris Morrell
I mean, I think more people do that than want to admit it.

00:45:11.64
Skyler Katz
It's fair.

00:45:12.49
Bogdan Kharchenko
It's true.

00:45:12.53
Chris Morrell
I don't know many people who publicly say on the internet that they do it. But...

00:45:18.54
Bogdan Kharchenko
Yeah.

00:45:20.12
Chris Morrell
yeah.

00:45:21.51
Bogdan Kharchenko
It is pretty interesting. I will say, one other thing I noticed the other day: I was on Twitter, and I think Tim MacDonald posted that they migrated something like six billion records in their ClickHouse database, in their staging environment or something like that.

00:45:36.23
Bogdan Kharchenko
And I was just like, man, that's a lot of data, right? And I'm not saying let's go use ClickHouse, but I think it may be worth investigating

00:45:39.29
Chris Morrell
Yeah.

00:45:44.71
Bogdan Kharchenko
whether some of this type of data would be suitable for some of these column-store tables. Because it seems like Nightwatch, for example, stores a lot of data, and they're constantly sorting it and querying it by time.

00:46:03.17
Bogdan Kharchenko
Like, show me this exception, or a list of exceptions in these timeframes. So I don't know if that's a potential solution or potentially another can of worms.

00:46:16.78
Bogdan Kharchenko
But I just thought, man, that is a lot of data, having just gone through what's a minor migration in comparison.

00:46:27.34
Chris Morrell
Yeah.

00:46:27.100
Bogdan Kharchenko
Six billion is a lot of rows. And I suspect that they are also quite heavy. I don't know, just some food for thought.

00:46:34.63
Chris Morrell
Yeah. I mean, my impression, based on what I know of ClickHouse and similar databases, is that they are not optimized for returning a single record by ID, right?

00:46:53.87
Bogdan Kharchenko
Mm-hmm.

00:46:54.26
Chris Morrell
They're there for doing aggregate queries on data very efficiently. And so it's a different trade-off. I don't know that it's the right solution.

00:47:08.26
Chris Morrell
I do think that there's probably... I mean, what was the one that you brought up the other day?

00:47:13.70
Bogdan Kharchenko
Parquet. But I believe, from what I understand, that S3 is built on top of Parquet.

00:47:14.81
Chris Morrell
Yeah, yeah, the Apache project.

00:47:22.52
Bogdan Kharchenko
And I think, Skyler, what you were referring to as far as that SQL language, I think that is Parquet's querying language, or whatever it is.

00:47:22.79
Chris Morrell
Yeah.

00:47:30.02
Bogdan Kharchenko
And I could be totally wrong, obviously. Do your own research. But I've just heard that name in various contexts of dealing with data stores, like S3 and R2 buckets, which all seem to be wrappers on top of that project.

00:47:43.37
Chris Morrell
Yeah, I mean, I wouldn't be surprised.

00:47:49.94
Skyler Katz
I mean, well, I was going to change the topic a little bit.

00:47:51.03
Chris Morrell
Go ahead, Skyler.

00:47:55.72
Bogdan Kharchenko
Let's do it.

00:47:56.36
Chris Morrell
Oh, well, all I was going to say is, I wouldn't be surprised if S3 is its own custom thing, and if we wanted to run our own version of this database, we could. But I am happy to just use S3 and not run our own bespoke database for these large chunks of content. You know what I mean?

00:48:26.37
Skyler Katz
I mean, this would not be helpful for Verbs events, but the things that we're running into are that we're storing base64-encoded, basically, signature doodle canvas drawings.

00:48:44.96
Skyler Katz
And like maybe we should be storing those on S3 later.

00:48:49.53
Bogdan Kharchenko
Immediately.

00:48:49.65
Skyler Katz
as PNGs to begin with.

00:48:49.73
Chris Morrell
Right. Yeah.

00:48:50.77
Bogdan Kharchenko
Yeah.

00:48:51.65
Skyler Katz
And then these documents that are HTML in nature, maybe they also should have just been stored in S3 as HTML and loaded, like, a lot of...

00:48:51.83
Chris Morrell
yeah

00:49:07.61
Skyler Katz
loaded client side, or still just pulled in with file_get_contents. Maybe for these particular use cases, the database isn't actually the right place for them.

00:49:21.10
Chris Morrell
Yeah. Right.

00:49:21.59
Skyler Katz
It doesn't solve the Verbs events thing, other than that we then wouldn't be storing the document in Verbs events, or in these tables. We would store a reference to the document in the Verbs event, because we'd have to push it to S3 first. But then we lose versioning of these documents unless you turned on versioning in S3.

00:49:42.76
Chris Morrell
right

00:49:46.35
Skyler Katz
So,

00:49:48.93
Skyler Katz
so

00:49:49.86
Chris Morrell
Right. Yeah.

00:49:51.47
Bogdan Kharchenko
Thank you.

00:49:52.99
Chris Morrell
Yeah, I mean... I think that...

00:49:59.41
Chris Morrell
I think that ultimately... that is probably true. I mean, I think the reason we decided to store these base64-encoded PNGs the way we are is that we are fundamentally interacting with them as base64 data that gets used in the JavaScript component, instead of loaded as a file.

00:50:29.90
Chris Morrell
So I think that's why the approach was the way it was. But in hindsight, I think there's a good argument that these five or eight tables that have these big columns with a ton of data in them should probably all be offloaded from the database in some way.

00:50:54.60
Chris Morrell
Or at the very least offloaded into their own table in the database, or tables, maybe one per. I don't know exactly what the solution is in each case, but something that mostly looks like a file should probably be stored in a place that's made for things that look like files, you know?
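Storing the signatures "as PNGs to begin with" mostly means decoding the data URL a canvas produces (`canvas.toDataURL('image/png')` yields a `data:image/png;base64,...` string) before saving it as a file. Base64 inflates a payload by roughly a third, so the raw bytes are smaller as well. A sketch, with `data_url_to_png` as a hypothetical helper:

```python
import base64
import binascii

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte header of every PNG file


def data_url_to_png(data_url: str) -> bytes:
    """Decode a canvas-style PNG data URL into raw PNG bytes."""
    prefix = "data:image/png;base64,"
    if not data_url.startswith(prefix):
        raise ValueError("expected a base64 PNG data URL")
    try:
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(data_url[len(prefix):], validate=True)
    except binascii.Error as exc:
        raise ValueError("payload is not valid base64") from exc
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("decoded payload is not a PNG")
    return raw
```

The returned bytes are what would be written to object storage, with only a reference kept in the row.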

00:51:07.06
Skyler Katz
Yeah.

00:51:21.58
Bogdan Kharchenko
Yeah, I agree. I sometimes hear on the internet, oh yeah, we have 20 megabytes of images in our table. And I'm like, why would you ever do that?

00:51:34.63
Bogdan Kharchenko
And then, looking back at what I did last week or two years ago or whatever, we do the same thing. It's just that the scale hadn't caught up to us. And now it has, I suppose.

00:51:42.87
Chris Morrell
Yeah.

00:51:44.49
Bogdan Kharchenko
And we're kind of like, oh yeah, of course. Why would we ever do that? Yeah.

00:51:48.55
Skyler Katz
And then we're just like, oh, another program that needs these canvas signatures.

00:51:51.64
Bogdan Kharchenko
Yep.

00:51:51.87
Skyler Katz
Let's just create another table so that each table doesn't look too big.

00:51:53.33
Bogdan Kharchenko
Ship it, ship it. Yeah.

00:51:56.85
Chris Morrell
Yeah. I mean, sometimes those are just the decisions that you have to make in the moment. And ultimately, a lot of these things are not really problems.

00:52:07.01
Bogdan Kharchenko
It's true.

00:52:08.06
Skyler Katz
They're only problems when I open TablePlus and I'm trying to look at something.

00:52:11.75
Bogdan Kharchenko
Well, that's because you're using TablePlus.

00:52:11.94
Chris Morrell
Yes.

00:52:13.92
Skyler Katz
Queries are also just slow and spin on this.

00:52:14.43
Bogdan Kharchenko
Yeah. Yeah. yeah

00:52:17.97
Chris Morrell
No. Yeah, they were problems. They only became... This is the first time we hit a real problem, which was when we were doing these big queries where we're processing thousands of records and loading an entire document's worth of content in each record.

00:52:40.90
Chris Morrell
And we were able to address that, right? All of this is mostly manageable through the solutions that we already have. But I do think there's a better option, where we solve this once and then we just have a solution to this type of problem. And arguably, we're at the point where, okay, we've hit this enough times.

00:53:10.20
Chris Morrell
And like I said, it comes up often enough in the Verbs Discord, this question of, what do I do with something like an avatar changed event?

00:53:23.69
Chris Morrell
You know, what do I do with that? And that's even... well, I mean, that's actually easier, because it's not that much data. But, like, a video

00:53:35.33
Chris Morrell
queued-for-processing event, where you've got a 30 gig 4K video, right?

00:53:35.38
Bogdan Kharchenko
Yeah.

00:53:43.70
Chris Morrell
That you're trying to add to some data pipeline, but you want to run it through an event-sourced system. Verbs should have a solution to that problem. We're actively having this problem in Verbs ourselves.

00:54:00.03
Chris Morrell
We've already had to think creatively about how to deal with events and files in the past. And this is a case where we're dealing with it even more.

00:54:11.96
Chris Morrell
So it just feels like, okay, we've hit the point where a better solution is warranted. And I like this. I like this sort of two-phase claim check concept, where it goes to the database first and then gets pushed off to S3 as appropriate.

00:54:33.20
Chris Morrell
But the thing I'm not certain about is: is it better to hide that whole process behind a cast? Or do you just have a cast that returns a claim check object and make the consuming code sort of, quote unquote, claim-check aware, right? Where you get a claim check and you have to explicitly exchange it, so that you understand in the consuming code what's happening?

00:55:02.54
Chris Morrell
Or do we do that transparently?

00:55:04.42
Skyler Katz
I think this is why I was referencing a custom relationship, because if we had a custom HasOne relationship, in the relationship logic we could say, well, is the attribute in the database column, or is it a claim check?

00:55:14.94
Bogdan Kharchenko
I the

00:55:21.84
Skyler Katz
And if it's a claim check, then fetch it from S3 and just return back the content. Yeah.

00:55:29.90
Chris Morrell
Mm-hmm.

00:55:30.37
Skyler Katz
Like, in the custom relationship, when it's doing its matching.

00:55:35.63
Chris Morrell
Well, you could do that. Or you could also potentially do something where, in your custom relationship, it tries to load all the data, and then if it's not there, it actually inserts the data back into the database. Yeah.

00:55:56.65
Skyler Katz
Yeah, I mean, I think that is also something you could do, because then it's been accessed again.

00:55:56.99
Bogdan Kharchenko
And

00:56:01.76
Skyler Katz
So it would insert it back in, and it would go back into the hot path.

00:56:02.01
Chris Morrell
Right, so then it goes back into the hot path and eventually gets moved off again. Yeah.

00:56:06.98
Skyler Katz
And it all just happens transparently.
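The "insert it back into the hot path" idea is a read-through cache in miniature: on a miss, fetch from object storage, put the content back into the row, and reset the access time so the archiver leaves it alone for another idle window. A sketch under the same assumptions as before (hypothetical `ClaimRow` fields, a placeholder `fetch_object` for the S3 read):

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, Optional


@dataclass
class ClaimRow:
    content: Optional[str]
    s3_path: Optional[str]
    last_accessed: datetime


def read_through(
    key: str,
    rows: Dict[str, ClaimRow],
    fetch_object: Callable[[str], str],
    now: datetime,
) -> str:
    """Return the content, pulling archived rows back into the warm table."""
    row = rows[key]
    if row.content is None:
        row.content = fetch_object(row.s3_path)  # cold: fetch from object storage
        # s3_path is left in place; the archived copy is still valid.
    row.last_accessed = now  # resets the idle clock the archiver checks
    return row.content
```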

00:56:08.94
Chris Morrell
Yeah, that's interesting.

00:56:12.72
Skyler Katz
I feel like all these file system calls have to get wrapped in retries and all sorts of stuff that...

00:56:19.66
Chris Morrell
Yes.

00:56:20.34
Skyler Katz
like

00:56:21.33
Chris Morrell
Yeah.

00:56:22.37
Skyler Katz
S3 is notoriously... well, all file systems are notoriously just flaky in...

00:56:27.89
Chris Morrell
Right.

00:56:29.39
Bogdan Kharchenko
This is the other unfortunate downside with some of this.

00:56:29.71
Chris Morrell
Right.

00:56:32.23
Bogdan Kharchenko
It's like, when you insert into a database, you almost have a guarantee it's there, right? A very high chance, especially if you use a transaction. But when you're pushing stuff into S3, now it's going over the wire, and even if it's located in the same data center, things happen.

00:56:42.62
Chris Morrell
Yeah.

00:56:51.49
Bogdan Kharchenko
So that's certainly something to be aware of.
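Since Skyler and Bogdan both point out that object-storage calls fail more often than local database writes, every S3 read and write in a system like this would likely sit behind a retry wrapper. Laravel ships a `retry()` helper for this in PHP; the sketch below shows the same idea (bounded attempts, exponential backoff, retrying only errors that look transient) with hypothetical defaults. The injectable `sleep` is there so the backoff can be observed without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.2,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying transient failures with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 0.2s, 0.4s, 0.8s, ... by default
    raise RuntimeError("unreachable")
```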

00:56:55.100
Chris Morrell
Yeah. And I do think that's another argument for making this an open source package, because I'm certain there are other people out there who have the same problem.

00:57:07.65
Chris Morrell
Like we could collectively improve the resilience of that.

00:57:09.37
Bogdan Kharchenko
Thank you.

00:57:12.59
Chris Morrell
To make failures less likely. But it is true. Fundamentally, whether you're writing to a database or writing to S3, there's a chance for either of those to go wrong.

00:57:24.83
Chris Morrell
But yeah, with transactions and

00:57:25.60
Skyler Katz
them.

00:57:28.97
Chris Morrell
all of the stuff that's been built around interacting with the database in Laravel, you have certain guarantees, and we would have to make sure we were getting the same assurances with another solution.

00:57:46.26
Skyler Katz
yeah

00:57:47.68
Chris Morrell
Is there any way to do checksums or fingerprinting on S3, or some sort of data verification?

00:57:47.72
Bogdan Kharchenko
I like it.

00:57:56.28
Skyler Katz
Well, you can pass tags; you can pass metadata in.

00:58:03.49
Chris Morrell
h

00:58:03.98
Skyler Katz
And so you can then get that metadata back out when you're fetching an object. I don't know, with Laravel's filesystem adapter, if you're going to get all of that information

00:58:17.26
Chris Morrell
Mm-hmm. Mm-hmm.

00:58:17.84
Skyler Katz
in one go. Well, I actually am not sure that you can get it in one go. I'm pretty sure it's a get-metadata call, even with the SDK itself.

00:58:29.51
Bogdan Kharchenko
Yeah, but if it's a background job, there's not that much load, right? If it's just offloading, storing, and making sure that the thing was stored and the checksum matches... I guess on retrieval, that could be an additional call.

00:58:44.72
Chris Morrell
I mean, if we wanted to go crazy, we could always write it to S3 and then read it back from S3 before we delete the data from the database. It's running in a background process.

00:58:58.92
Bogdan Kharchenko
Right.

00:58:59.02
Chris Morrell
So we could choose to be inefficient there for data integrity considerations.
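The "write it, read it back, then delete" idea trades an extra read for integrity. S3 can also validate checksums itself on upload (for example via the `Content-MD5` header or its additional checksum algorithms such as SHA-256), but the sketch below does the deliberately inefficient round trip Chris describes, with a dict standing in for the bucket and hypothetical names throughout.

```python
import hashlib
from typing import Dict, Optional, Tuple


def offload_verified(
    key: str,
    table: Dict[str, Tuple[Optional[bytes], Optional[str]]],
    bucket: Dict[str, bytes],
) -> bool:
    """Upload, read back, and compare SHA-256 digests before clearing the
    database copy. Returns False (and keeps the row intact) on any mismatch."""
    content, _ = table[key]
    if content is None:
        return False  # already archived
    expected = hashlib.sha256(content).hexdigest()
    path = f"claim-checks/{key}"
    bucket[path] = content
    round_trip = bucket.get(path)  # deliberately re-read instead of trusting the write
    if round_trip is None or hashlib.sha256(round_trip).hexdigest() != expected:
        return False
    table[key] = (None, path)  # only now is it safe to drop the database copy
    return True
```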

00:58:59.68
Bogdan Kharchenko
Thank you.

00:59:06.82
Chris Morrell
And there's also building in encryption at rest. That would be relatively straightforward to do in a system like this. So that would be really nice.

00:59:20.51
Chris Morrell
Yeah. I don't know. I like that. I like that as an approach.

00:59:25.19
Skyler Katz
Yeah, it seems interesting. And in our case of dumping out the database, having the claim check table hold a handful of days of big content, with the rest of the records all just paths, keeps the whole database size smaller.

00:59:50.93
Chris Morrell
Yeah.

00:59:52.34
Chris Morrell
Well, and ironically, we've now solved the problem enough for ourselves that we don't necessarily need to go and implement this solution now.

00:59:59.77
Bogdan Kharchenko
Thank you.

01:00:01.51
Chris Morrell
But I like that we came to a good place with it. It feels like a good solution.

01:00:09.80
Skyler Katz
Chris, do you have any train trips coming up? Are you going to do some train-induced development of a new package?

01:00:17.58
Chris Morrell
Yeah, no, no long train trips on the horizon, unfortunately.

01:00:22.76
Bogdan Kharchenko
Me and Skyler are going to sponsor you and send you off to, I don't know, Alaska on the train or something.

01:00:28.35
Chris Morrell
Oh, God.

01:00:28.62
Skyler Katz
Yeah.

01:00:30.02
Chris Morrell
What a nightmare.

01:00:30.98
Skyler Katz
Well, you know, Bogdan's going to be gone at the convention this week, and we're not doing anything. We're just...

01:00:36.50
Chris Morrell
God.

01:00:39.83
Skyler Katz
Bogdan's working hard and then we'll just work on claim check.

01:00:39.85
Chris Morrell
Oh God, we'll just noodle around.

01:00:42.99
Bogdan Kharchenko
Yeah, let's do

01:00:44.40
Chris Morrell
Yeah, there you go.

01:00:44.70
Bogdan Kharchenko
Yeah, I like it.

01:00:46.22
Chris Morrell
Oh, man. All right. Well, yeah, I like it.

01:00:47.49
Bogdan Kharchenko
Awesome, I think this was cool. I think we came to a good conclusion. Aside from coming to a conclusion, I feel like all of us have recognized all the trade-offs, right?

01:00:59.38
Bogdan Kharchenko
Because all of this stuff has trade-offs, whether you're inserting directly into the table, offloading somewhere, and

01:01:00.03
Chris Morrell
Yeah.

01:01:06.44
Bogdan Kharchenko
So on and so forth. So I don't know. I feel pretty good about seeing the reality of what this would potentially look like if we go about it.

01:01:17.33
Chris Morrell
Yeah.

01:01:19.73
Chris Morrell
I mean, and also, just to be clear, Claude, and I quote, says... now where did I put that? It's a "genuine innovation opportunity" right here that we've got.

01:01:35.71
Skyler Katz
Yeah.

01:01:35.83
Bogdan Kharchenko
You're absolutely right, Chris.

01:01:39.99
Bogdan Kharchenko
Yes, do it.

01:01:40.48
Chris Morrell
Oh, God.

01:01:42.77
Skyler Katz
Yeah, I'm sure Claude could just write all this code in like, you know, an afternoon.

01:01:43.06
Chris Morrell
All right.

01:01:46.24
Chris Morrell
Oh, my God.

01:01:47.71
Skyler Katz
It'd be fine. Just look, you know, Verbs 1.0 hasn't been tagged yet.

01:01:49.02
Chris Morrell
Yeah, easy peasy.

01:01:54.94
Skyler Katz
And it's, you know, you have an opportunity here to get claim check in.

01:02:00.60
Chris Morrell
Just sic Claude on it.

01:02:02.21
Skyler Katz
Exactly.

01:02:02.33
Chris Morrell
Let's introduce a poorly written, buggy version of claim check.

01:02:02.76
Bogdan Kharchenko
Yeah.

01:02:09.56
Chris Morrell
All right.

01:02:10.84
Bogdan Kharchenko
V zero. Yeah, so I just had my Upstate PHP meetup in Greenville, South Carolina, on October 9th.

01:02:11.15
Chris Morrell
Well, before we stop, Bogdan, you had your meetup recently. How was that?

01:02:27.72
Bogdan Kharchenko
So we're recording, I believe it's October 14th.

01:02:30.98
Chris Morrell
It's the 14th. Yeah.

01:02:32.11
Bogdan Kharchenko
Yep, and it was awesome, man. We had five speakers scheduled to talk, including me; I was going to do a presentation, and I did. One of the speakers, unfortunately, called out sick, but we still had a pretty jam-packed event, and over 20 people showed up.

01:02:48.53
Bogdan Kharchenko
And I don't know, I love doing these meetups and I like talking to people. Like I've mentioned to you guys before, I'm always surprised how many new faces I see.

01:03:01.09
Bogdan Kharchenko
You know, people are just willing to connect. And I know not everybody is able to come out to all these things regularly. But as far as the new folks go, they just keep showing up, and it's super awesome. And then after some time, they tend to become regulars. So, I don't know. I'm excited that I started this, and I'm excited that people are still coming and sending me text messages the next day saying, "Dude, I had such a great time, keep it going."

01:03:30.39
Bogdan Kharchenko
So it was a really great event.

01:03:33.05
Chris Morrell
And then there's PHPX Atlanta now, right? You went down for that?

01:03:37.65
Bogdan Kharchenko
Yeah, so Jace runs PHPX Atlanta, and I went down there maybe three weeks ago. He's also working hard to organize it and to find a permanent venue. I think that's one of the challenges Jace is having right now: he has to keep switching between different places.

01:03:57.37
Bogdan Kharchenko
But he also gets a pretty good turnout, and people are getting together. There seem to be a bunch of old-timers in the Atlanta PHP meetup, people who have been attending for 15 or 20 years; it has a pretty long history of people showing up. But yeah, I mean, it's super awesome to go down there and hang out with them as well.

01:04:25.55
Bogdan Kharchenko
And it's relatively close to me. I told Jace I'm going to come out to all of his events because I just want to support him, make sure that, you know,

01:04:36.77
Bogdan Kharchenko
Because when you show up, that's actual support. You know what I mean? And I feel like a lot of people get hung up on sponsorships and all this other stuff.

01:04:40.81
Chris Morrell
Yeah. Yeah.

01:04:46.71
Bogdan Kharchenko
But if nobody shows up and you have a fully sponsored event, it sucks. You know what I mean?

01:04:54.94
Chris Morrell
Yeah.

01:04:55.16
Bogdan Kharchenko
So if you have people showing up, even if you're losing money on it, I feel like it's a good time. So yeah, I'm excited to visit Atlanta for all the meetups there too.

01:05:07.16
Chris Morrell
And Skyler, do we have another PHPX St. Louis on the horizon, or has that slowed down for a little while?

01:05:13.32
Skyler Katz
It slowed down for the summer, and then Mark Binder, my co-organizer, was just moving, but we've been texting about the next meetup.

01:05:24.48
Skyler Katz
I think we're going to do a November happy-hour meet-and-greet to hang out and regroup, and see where people want to travel.

01:05:29.74
Chris Morrell
Nice.

01:05:37.09
Skyler Katz
The St. Louis region is quite large, and a lot of people come from all over it. So we're trying to find a good central location to meet up.

01:05:47.66
Chris Morrell
Yeah, I came back from Laracon all energized, ready to do another meetup, and then life got in the way, and I still haven't scheduled another PHPX Philly.

01:05:58.51
Chris Morrell
I went up to New York for the last PHPX NYC event, and that was really fun.

01:05:58.80
Bogdan Kharchenko
Well...

01:06:06.53
Chris Morrell
But I'm hoping to get another PHPX Philly started soon.

01:06:11.87
Bogdan Kharchenko
Yeah, I mean, I was going to say, first off, I want to come to the next PHPX Philly event sometime.

01:06:12.18
Chris Morrell
What were you going to say, Bogdan?

01:06:19.46
Bogdan Kharchenko
And I was actually talking to Steven Fox. He came to the meetup I just had, and we were low-key talking about crashing the PHPX NYC meetup one of these days, just showing up unannounced.

01:06:28.100
Chris Morrell
That'd be awesome.

01:06:31.29
Bogdan Kharchenko
So watch out.

01:06:33.09
Bogdan Kharchenko
But yeah, I don't know, Chris, if you're the only one pushing the meetup, you need to find a co-host who can do the scheduling for you, or do some of that legwork, and then you just have to show up. Because I know it's hard to get motivated to schedule things and put something on a calendar.

01:06:53.97
Chris Morrell
Yeah, we've got a couple of people who are involved that I probably should invite to take on a bit more of that piece, since it's such a challenge for me.

01:07:04.82
Chris Morrell
And I'm actually talking with Matt Stauffer. He wants to come up and join us for one of the meetups, maybe do a talk. And Ian Landsman is planning to come down at some point for one of the events too.

01:07:16.01
Bogdan Kharchenko
Thank you.

01:07:17.95
Chris Morrell
So we've got a couple of bigger names in the industry, people folks would want to come see, that I'd love to have do something at one of our next events. So,

01:07:30.25
Chris Morrell
If you're in the Philly area, keep an eye out. And just in general, if you're listening to this an hour-plus in, you will definitely enjoy going to a local meetup.

01:07:48.80
Chris Morrell
So if you're not already a part of one, take a look at phpx.world. There's laravel.com/meetups, I think it is. There are a couple of others I can't think of off the top of my head that have a good list of events. And yeah, I mean, like Bogdan said, if you can show up for one of these events, it means a lot to the organizers, and it's going to be fantastic for you. But, I don't know.

01:08:20.32
Chris Morrell
It's just been so much fun to be involved in these events, and I love doing it. Yeah.

01:08:31.47
Bogdan Kharchenko
Yeah, same. It's a rush, honestly. After the meetup, I couldn't fall asleep for like two hours because I was just pumped. And I know a lot of people had that same energy.

01:08:40.26
Chris Morrell
Yeah.

01:08:44.15
Bogdan Kharchenko
They were just chit-chatting and getting to know each other. But yeah, I highly encourage anybody, even if it's not a PHP meetup, whether it's JavaScript or whatever you're into, to just go do it. I think the in-person energy is worthwhile.

01:09:02.79
Skyler Katz
Yes, they're tons of fun.

01:09:04.83
Chris Morrell
Yeah. All right. Well, with that, this has been fun. I'm glad that once again, we have just solved the problem. Case closed.

01:09:16.04
Chris Morrell
No further action needed. And we could just move on.

01:09:19.100
Skyler Katz
Yeah. Just feed this transcript to Claude. Let's go.

01:09:25.41
Chris Morrell
There you go.

01:09:26.37
Bogdan Kharchenko
All right, Chris. See you later.

01:09:26.70
Chris Morrell
All right. Thanks, guys.

01:09:29.57
Skyler Katz
See ya.
