Postgres FM

Nikolay and Michael discuss copying a database — what the various options are, how to copy large databases, and some pointers on performance.

Show Notes

What did you like or not like? What should we discuss next time? Let us know by tweeting us at @samokhvalov / @michristofides / @PostgresFM, or by commenting on our Google doc.

If you would like to share this episode, here's a good link (and thank you!)


Creators & Guests

Michael Christofides
Founder of pgMustard
Nikolay Samokhvalov
Founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

028 Copying a database

Michael: [00:00:00] Hello, and welcome to Postgres FM, a weekly show about all things PostgreSQL. I'm Michael, founder of pgMustard, and this is my co-host Nikolay, founder of Postgres AI. Hi Nikolay, what are we talking about today?

Nikolay: Hi, Michael. This topic is my choice this week. I chose copying: how to copy a Postgres database. Let's discuss the various options, the pros and cons, how to improve things and work around obstacles, and of course the use cases.

Michael: Yeah, that was gonna be my first question. When do you see people needing to do this most often?

Nikolay: Right. Well, we agreed that we're going to talk about Postgres topics drawing on some DBA and ops experience, but trying to explain them for a wider audience: developers, backend engineers and so on. Let's try to think about use cases from this perspective.

Of course, first of all, you need to copy a Postgres database when you provision a new node in [00:01:00] your infrastructure, and this is a developer-friendly area, right? So you either want a replica, a standby node, logical or physical, or you want to provision another environment, or you want to bring data from production to non-production, and so on. Many cases.

But also, there are cases when we redesign our architecture and need, for example, to split a monolith into multiple services. So we need to think about how we will perform the splitting. And when we do split, we probably need to copy all the data, right? Because it's a great privilege if we can start a new service small, from an empty database.

I mean, usually we need to keep all data, to preserve it, so we need to copy it from the old database, the monolith database. Or some service database becomes too big, and [00:02:00] we see how we can split it as well, so we consider it also a kind of monolith and split it. So these are the basic use cases.

There are also cases when we need to bring data from one place to another as a routine job. And, of course, I think every engineer should understand latency and throughput and how to troubleshoot bottlenecks, how to find bottlenecks in the copy process, right? So these are probably the basic classes of use cases: provisioning, splitting.

And some kind of copying just because our application is designed that way. It's also a kind of provisioning, right? Everything is provisioning. Even copying a database to your laptop: in small projects people sometimes do this, and this approach has its own pros. It's also provisioning.

Just to your laptop, right?

Michael: On that note, if we start there, I [00:03:00] guess we should ignore for now worrying about personally identifiable information and privacy and things like that. What do you see on the smaller side, before we get into the larger databases? On the smaller side, copying databases around,

I've seen people have a lot of success with just simple pg_dump and pg_restore, not even multi-threaded. And I've also seen, because I don't know if we're gonna talk about it again, this being a good use case for even CREATE DATABASE with a template, which can possibly be even faster.

Nikolay: Thick cloning, right? Thick cloning. So if you have a hundred gigabytes and you clone it, you have 200 gigabytes used on disk. Yes. Well, first of all, let's start with pg_dump and pg_restore, the logical approach. We copy data by transforming it to SQL statements: CREATE TABLE, COPY, or individual INSERTs.

First of all, I would like to say that pg_dump [00:04:00] and pg_restore are quite advanced programs, very advanced tools. They have a lot of options, a lot. And also formats: the plain text format, and the custom and so-called directory formats, which support parallelization and filtering.

But this is definitely the logical level. And sometimes people think of the dump as a backup, which I don't agree with. It's a kind of backup, but a logical backup. So you can run it and copy, and of course it's always thick, because if you load it, you consume the same space again. But you lose the bloat: if you had bloat when you dump and restore, you lose it.

So there should be no surprises if the new copy takes less disk space, even if you took everything. Indexes are optimized: they are just created fresh. pg_dump puts [00:05:00] them at the end of the dump, so they are created after data is already loaded by pg_restore. If it's the plain text format, you can even load it using psql.
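A sketch of the dump and restore paths just described (database names are hypothetical, and these commands need a running server):

```shell
# Plain-text dump: restorable with psql
pg_dump -Fp mydb > dump.sql
psql -d newdb -f dump.sql

# Custom and directory formats: restorable with pg_restore
# (index definitions sit at the end, so indexes are built after data loads)
pg_dump -Fc -f dump.custom mydb
pg_dump -Fd -f dump_dir mydb
pg_restore -d newdb dump.custom
```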

Also an option. But since I'm usually dealing with quite large databases, dozens of terabytes, of course pg_dump on large systems is, first of all, quite dangerous. If you run it in production, it keeps a transaction open at the repeatable read level. It opens a transaction at repeatable read level to always have a persistent snapshot while it dumps tables.

That way the correlation in the data is preserved, so foreign keys won't be broken and so on. That's why the repeatable read snapshot is needed. And if it runs for quite a long time, of course autovacuum will be affected. So on a large database, my recommendation would be to do it not only on a standby [00:06:00] node, but on a standby node with hot_standby_feedback off and an increased limit for transactions.

I don't remember the name, the setting which controls how long physical replication can wait until it says, okay, I cannot wait anymore, and kills your transaction. Or better, if you have a really large database and need to run pg_dump for the whole database, I would say have a clone detached from the primary,

a temporary clone, and in the cloud it's easier, right? You create a replica, probably from a cloud snapshot, and then you just promote it. Detached, it doesn't stream changes from the primary anymore; it's independent. So you can run the dump there and then dispose of it, destroying this node. And of course, for throughput, a single thread is not enough.

We have a lot of CPUs these days, usually, so I would speed it up. But for [00:07:00] small databases, I would say less than 10 gigabytes, it's definitely okay to use a single-threaded approach. A hundred gigabytes, a terabyte: it's already time to think about parallelization. And here we come to issues with pg_dump and pg_restore, because you cannot use parallelization

if you want to restore immediately, dump and restore on the fly. It's possible with the single-thread approach: you just pipe, dumping and restoring immediately, so you don't need additional disk space. That's good. But you need parallelization to get good throughput, and you probably want good throughput if you deal with real production nodes as well,

because you don't want autovacuum workers to interfere with your work, so you probably want to finish earlier. So you might want multiple workers. In that case you won't dump and restore on the fly; you first [00:08:00] dump in the directory format and then restore from it.

pg_restore also supports parallelization with -j. So this is a disadvantage of pg_dump and pg_restore: they cannot use parallelization and avoid the creation of this temporary dump. That's solved by third-party tools. There is a tool from Dimitri Fontaine called pgcopydb, quite new, and one of its benefits is exactly this: it can run multiple threads and dump and restore on the fly.

Great. This is for larger databases. But again, if you have a smaller database, it's fine to use a single thread, and you choose whether you want to dump to disk or dump and restore on the fly, right? So, yeah.
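The two larger-database options just discussed might look roughly like this (DSNs and job counts are hypothetical, and a real run needs live source and target servers):

```shell
# Parallel dump to directory format, then parallel restore
# (requires the temporary dump on disk):
pg_dump -Fd -j 8 -f /backups/mydb.dir mydb
pg_restore -j 8 -d mydb_copy /backups/mydb.dir

# pgcopydb: parallel copy without the intermediate dump
pgcopydb clone --source "postgres://user@src/mydb" \
               --target "postgres://user@dst/mydb"
```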

Michael: Before we move on from that, I'd also noticed that both the custom and directory formats are compressed by default, which is [00:09:00] pretty cool. I didn't realize that.

Nikolay: Yep, yep, compression is there. It's good compression, and you probably want it if you have large databases; sometimes the compression is significant. But they also support filtering. The directory format supports it. I always mix up custom and directory and always check the documentation, but one of these formats supports filtering.

In the pg_restore tool we have a pair of options, -l lowercase and -L uppercase. One of them allows you to see the list of the contents of the dump. So you can see: okay, we have table creation for this table here, we have data for this table here, we have indexes, we have materialized views, a REFRESH MATERIALIZED VIEW command, and so on.

And then, using grep, you know, in the command line it's easy using grep, you can choose only what you want or remove what you don't want. For example, if you [00:10:00] have a hierarchy of materialized views, sometimes there's an issue with dependencies. I remember I removed the commands to refresh materialized views and just performed that separately.
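The -l / grep / -L workflow just described can be sketched like this; the list file below is a hand-written imitation of `pg_restore -l` output (a real one comes from `pg_restore -l /path/to/dump`), so only the grep step actually runs here:

```shell
# Imitation of a dump's table of contents, as pg_restore -l would print it
cat > /tmp/dump_list.txt <<'EOF'
215; 1259 16396 TABLE public orders owner
217; 1259 16402 MATERIALIZED VIEW public daily_stats owner
3462; 0 0 REFRESH MATERIALIZED VIEW public daily_stats owner
4117; 0 16396 TABLE DATA public orders owner
EOF

# Drop the refresh commands (to run them separately later), keep the rest
grep -v 'REFRESH MATERIALIZED VIEW' /tmp/dump_list.txt > /tmp/filtered_list.txt
wc -l < /tmp/filtered_list.txt

# Hypothetical restore using only the filtered entries:
#   pg_restore -L /tmp/filtered_list.txt -d target /path/to/dump
```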

Otherwise, you cannot restore. So this is quite handy if you need partial restoration. Also worth mentioning: pg_dump, by default, will report errors but won't stop on an error.

Michael: Oh wow.

Nikolay: Yeah, it's sometimes dangerous; you might not notice it. For pg_restore, I don't remember, maybe it's the same, but as usual, I don't remember.

Right. But in general, you should always check the documentation here: what's the default behavior? And decide whether you need to stop on error, or ignore errors and do a best effort, losing some objects, and then return. Of course, each case is different, but in my experience I usually want to stop.

That's why [00:11:00] the default behavior usually doesn't work for me, and I check: okay, we have an issue here. I would like to know there is an issue and not continue, so I want to stop. There is an option, of course, to change the behavior in both pg_dump and pg_restore.

Michael: And the docs for this are great. I'll link them up in the show notes.

Nikolay: The docs are great. I would say the reference docs; use cases you usually want to find in blog posts, there's more wisdom there in terms of use cases, but the reference is perfect, of course, and --help always works. So, one more thing: of course, you can perform surgery with pg_restore, right?

You can take what you want, you can combine things from different places; a lot can be done. But of course, if you want, for example, the behavior of stopping on error, you should understand that you will probably need to start from the beginning, right? So retry logic: you are responsible for retry [00:12:00] logic.

You need to either ignore errors, or stop on error and then work with the -l / -L pair of options and continue, and so on. So it doesn't come with good retry logic, but it gives you options, tooling to write your scripts as you want. So this is probably it about pg_dump.

One more thing, interesting for developers probably: this repeatable read transaction. It's interesting that you can control it. For example, you can open a transaction yourself, export a snapshot, and then tell pg_dump to use your snapshot. So you keep the transaction open with the snapshot already created, preventing autovacuum from cleaning up freshly dead tuples, as we discussed a couple of times before.

Then, using the function pg_export_snapshot(), you know the snapshot name, and you feed the snapshot name to pg_dump on the command line. [00:13:00] This is quite interesting. And actually, I think this is how parallelization works as well: workers know the exact snapshot name and they work synchronized, right? They deal with the same snapshot. And this allows you to design procedures to move data between places, understanding that your data is correlated, right? And I wish developers of commercial logical replication tools understood it better, because some of them don't.
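A minimal sketch of the export-snapshot trick (the snapshot name shown is illustrative; this needs a live server, and session 1 must stay open until the dump finishes):

```shell
# Session 1, in interactive psql; keep this transaction open:
#   BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
#   SELECT pg_export_snapshot();   -- returns e.g. '00000004-0000006A-1'

# Session 2: pg_dump reuses exactly that snapshot
pg_dump --snapshot='00000004-0000006A-1' -Fc -f dump.custom mydb
```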

For example, the tool called Qlik, formerly Attunity. I tried to explain this for one of our customers; I tried to explain to this company several times: guys, you have issues, because you export data to an analytical database which doesn't have unique keys, and if the initial load is [00:14:00] performed in a way that the further CDC, change data capture, process

doesn't understand these snapshots, the switch is rough and we will have duplicates. So the trick is: you open a logical replication slot not using SQL, but using a replication connection. It's kind of separate, a very small language of commands in the replication connection.

There you can create the same logical slot and learn not only an LSN, but also a snapshot name. And then you can dump using that snapshot name, easy. And then when you switch to CDC, you don't think about duplicates at all. Right?
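The replication-connection trick might be sketched like this (slot name and output plugin are placeholders; it needs a live server and replication privileges, and EXPORT_SNAPSHOT is the older spelling of the option, with newer versions also accepting a parenthesized form):

```shell
# Open a replication connection and create a logical slot;
# the reply includes the slot name, a consistent LSN, and a snapshot name
psql "dbname=mydb replication=database" \
  -c "CREATE_REPLICATION_SLOT cdc_slot LOGICAL pgoutput EXPORT_SNAPSHOT;"

# Dump the initial state using that snapshot, then start CDC from the slot:
# every change after the snapshot arrives via the slot, so no duplicates
pg_dump --snapshot='<name from the previous step>' -Fc -f initial.dump mydb
```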

Yeah, but you cannot achieve this if you create the logical slot using SELECT pg_create_logical_replication_slot(), I don't remember the name, as usual.

I'm very bad at remembering function names and so on. But in this case it's great, and in each particular case, understanding this concept of snapshots, you can synchronize various processes, workers, backends, [00:15:00] telling them: okay, I want to deal with this snapshot.

It's quite powerful, right?

Michael: Yes, very cool. I was reading that blog post you shared, from, is it Michael Paquier? I'll link it up as well.

Nikolay: It's quite old, almost 10 years old, but it explains very well what I just explained: CREATE_REPLICATION_SLOT ... LOGICAL, and you know a snapshot name. And then you can even use pg_dump with that snapshot. And that's it.

Michael: So yeah, I guess when it comes to a point where pg_dump is no longer great for us, what are our other options?

Nikolay: Well, you mentioned CREATE DATABASE with a template. We have this very old, by the way, approach. We have template0 and template1, created by default; they're like reference databases. And if you want all databases to have something, for example a table, in all freshly created databases,

you can put something into template1 and it'll be used by [00:16:00] default. But you can also say directly, explicitly: I want the new database created based on that database. It'll be thick cloning, so it'll copy objects in a regular way, like cp, right? And of course it'll take time. By the way, we didn't discuss throughput and so on with the logical approach.
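The template-based thick clone Michael mentioned is a one-liner (database names hypothetical; it needs a live server, and no other sessions may be connected to the source database while it runs):

```shell
psql -c "CREATE DATABASE mydb_copy TEMPLATE mydb;"
```

Since this copies the files directly, indexes come across as-is rather than being rebuilt, which is a big part of why it can beat dump and restore.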

It's worth understanding that you cannot say, okay, one terabyte per hour. It's very hard to say, because it depends a lot on the structure. First of all, indexes take a lot of time. With a logical copy, if you check CPU utilization and disk utilization, the load is not evenly distributed.

First we copy data. So first, of course, we create tables and then we copy data: very IO-intensive work; for the CPU [00:17:00] there is almost nothing to calculate at all. But then come indexes, and this can take a lot of time. If you don't have indexes, it's easy, but if you need to create a lot of indexes... because, again, logical means we transform to SQL and then back to a database, right?

So there is a change of state here: we had physical state, transformed it to SQL, and then go from SQL to physical state again. So when you need to create a lot of indexes, it's CPU-intensive work. Of course, maintenance_work_mem is worth checking.

So maybe some tuning and so on. And if it's a one-time thing, it's okay if it's not perfect. But if we know we will be performing dumps routinely, we need this procedure to be optimized. The first thing to remember, as I mentioned already, is the stress on production; we covered that. But then, how to improve and how to troubleshoot throughput?

Here, [00:18:00] finding bottlenecks is the usual thing. Like in Linux, we check CPU load. Of course, the average is not enough; we need to be able to understand each core. 50% on average can be 50% on each core, or half of them at a hundred percent while the other half is not doing anything at all. Because if we use -j for pg_restore, we can, for example, use half of our

cores, or we can match the available cores on the destination. It's interesting, right? So we need to check each core, and we also need to check disk load, of course: first of all, throughput in terms of IOPS and MiB per second. We should know our limits and understand how far from them we are.

Michael: In terms of rules of thumb, I saw some advice that if you're [00:19:00] restoring to a machine, and that's all you're doing on that machine, a reasonable starting point is the number of cores as the number of jobs.

Nikolay: Right. Unlike pg_restore, for pg_dump, if we do it from a production node, we should think about it a lot, especially if it's the primary, the most expensive resource we have. There we should probably be very careful with -j and control our CPU and, first of all, disk load. Definitely, because we are going to read a lot. By the way, it's interesting to understand that if, for example, more than half of our database is indexes,

dumping indexes is very easy, unlike restoring them. A dumped index is just a definition, like one line. With data, yes, we read it from disk, but indexes we don't read, and that's good, right? So in terms of restoring, yes, it's CPU-intensive work; you will see many cores a hundred percent busy.

So if you are alone on the [00:20:00] destination node, yeah, match -j to the CPU count. We sometimes even do this: either grep the CPU count from /proc/cpuinfo, or nproc, I don't remember which. We just take the number of cores and put it into -j dynamically. If you have eight cores, okay, it'll be eight workers restoring, and so on.
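The dynamic -j sizing can be done with nproc; the pg_restore invocation itself is hypothetical and commented out, since it needs a dump and a live server:

```shell
# Number of cores on this machine; use it as the worker count when the
# destination node is otherwise idle (halve it if anything else runs there)
JOBS=$(nproc)
echo "$JOBS"

#   pg_restore -j "$JOBS" -d target /backups/mydb.dir
```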

Michael: Going back to something you mentioned earlier: dumping and restoring at the same time, without needing to duplicate the data or have double the space on disk. What are our other options for that?

Nikolay: At the logical level, none. Well,

Michael: Even at physical, I guess? Is that when...

Nikolay: Physical is very different. The physical level is absolutely different. And if we talk about opportunities for backend developers, of course logical looks more flexible, because of many things: filtering, control of throughput and so on.

At the physical level it's impossible; we cannot copy one database, unfortunately, unlike in some other systems, for example SQL Server.

[00:21:00] We copy the whole cluster, all databases inside it, unfortunately. So sometimes it matters. There's no way to take only one database, because of the WALs. We have a single stream of WAL and it mixes all the databases together.

So we cannot take only one of them. Why do we need WALs? When we copy at the physical level, it takes some time. And if you check resource monitoring, you'll see it's quite uniform. There's no: okay, we did IO-intensive work,

now we switched to CPU-intensive work, like in restoring with pg_restore at the logical level. At the physical level it's just taking files and bringing them to a new place. But it takes time. And about the speed: these days I would say one terabyte per hour is already so-so. You know, previously I kept changing my mind, but I decided: okay, it's [00:22:00] 2023,

it's time to say one terabyte per hour is not enough; it's very moderate. Probably you have disks that are not modern, right? Probably you don't have a good network. Maybe; it depends. You also need to troubleshoot it in the normal Linux way, of course. For troubleshooting, I think any engineer who wants to be a full-stack or backend engineer

should read Brendan Gregg's books, Systems Performance and so on, and at least watch, on YouTube, his quite old two-part talk about performance troubleshooting methodologies. It gives a basic understanding. So if, for example, your network is not good, or the disks are slow on either source or destination: one terabyte per hour or even less. Not good.

Three terabytes per hour is good these days.

Three [00:23:00] terabytes, yeah. The math is simple here. A modern NVMe drive should give at least one gigabyte per second for reads, right? Maybe two already; we have good disks now. And in the cloud, if you take an EBS volume on most modern instances, the EBS volume will be based on these disks.

It should give you that. Of course, you need to pay for IOPS in the cloud, right? So it's expensive. But talking about big systems, one terabyte per hour is already so-so for raw physical file copy over the network. So: one gigabyte per second, and we have 3,600 seconds per hour, that's 3.6 terabytes per hour.
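The back-of-envelope arithmetic, in decimal units, for 1 GB/s of sustained read throughput:

```shell
# 1 GB/s * 3600 s/hour / 1000 GB-per-TB = TB/hour
awk 'BEGIN { printf "%.1f\n", 1 * 3600 / 1000 }'   # prints 3.6
```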

That's what we should get, right? Maybe more, even, sometimes, if everything is well tuned and the network is good, right? And of course, if there is no parallel, [00:24:00] competing workload. Okay. So this process takes time. And the key here is that if our PGDATA is one terabyte or 10 terabytes, during the time between when we start and when we finish, it's not consistent, right?

So we have an inconsistent copy of PGDATA, the data directory. And to fix that, Postgres can use a checkpoint and WALs to replay changes and reach a point of consistency. So if you do it manually, old school, we say: SELECT pg_start_backup(). We can do it on a replica as well, which is good.

And then we copy the data directory without overthinking it, with rsync for example, or scp, anything. It can feel dangerous, but it's quite reliable. We copy the WALs as well, in the pg_wal directory. And then we say pg_stop_backup(). Then we start [00:25:00] Postgres in the new place, and wait. We need to wait; it'll take time to reach

the point of consistency. This is how physical, regular thick cloning works. And if you use automation, Postgres has automation for this: pg_basebackup. It'll do basically the same, taking WALs too. I remember, some years ago, by default it didn't bring WALs, and so many people, including myself, made the mistake of using pg_basebackup to create a physical copy

and then trying to start it. But without WALs, you cannot start it; it's inconsistent, as I just explained. So you needed to specify -x or something, some additional option. But modern versions of Postgres take care of it by default: they take the WALs as well, either over an additional connection at the [00:26:00] Postgres level or using ssh.
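The old-school manual sequence Nikolay walks through might look like this (paths and hosts are hypothetical; the exclusive-mode functions shown were removed in Postgres 15, where pg_backup_start and pg_backup_stop must be called from a single session instead):

```shell
psql -c "SELECT pg_start_backup('clone', true);"   # pre-15 exclusive mode

# Copy the data directory while the cluster keeps running
rsync -a --exclude=postmaster.pid \
  /var/lib/postgresql/data/ dst:/var/lib/postgresql/data/

psql -c "SELECT pg_stop_backup();"
# Then start Postgres on dst: it replays WAL from pg_wal until consistency
```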

So pg_basebackup is a very good tool, but it has a very big disadvantage: if something goes wrong, goes south, we need to start again.
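A typical pg_basebackup invocation (hostname and user are hypothetical; it needs a live primary). With -X stream, the default on modern versions, the WAL is shipped alongside the copy, so the result can start on its own:

```shell
pg_basebackup -h primary.example.com -U replicator \
  -D /var/lib/postgresql/data -X stream --progress
```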

Michael: Yep.

Nikolay: And this is a huge disadvantage if you provision your nodes on a regular basis, for example standby nodes. If one node goes down, we need to reprovision it.

It can be automated with Patroni, of course, but first of all, it's stress. The disadvantage is it's stress: we need to read everything, disks are very busy. If you do it from the primary node on production, it's stress on the primary node on production. And if we did 90% of the copy and then something goes wrong, some network issue or something, it'll restart, and that's more hours to spend if you have multi-terabyte data.

This disadvantage can be mitigated if you switch to a different approach for physical [00:27:00] copy. It's called restore from archive. Here we should discuss backups briefly. So, a dump is not a backup. Why? Because it's a single snapshot, and that's it. It doesn't give you more than that; you do it, and that's all.

So it's like once per day or something. But obviously, if a problem occurred later, there's the distance between our dump and the problem. In ops language it's called RPO, recovery point objective. This is the amount of data you lose, right? For backups, it's very bad, right?

Michael: I hear the alternative get called point-in-time recovery quite a lot.

Nikolay: Right. So, point-in-time recovery: any modern backup tool, such as WAL-G, pgBackRest and others, supports it, and they usually do two basic things: a full, inconsistent copy of [00:28:00] PGDATA (why inconsistent? because it takes time to copy) and, second, a continuous stream of WALs. So we archive two things. And then we restore: first of all, we restore PGDATA in an inconsistent state, and we take all the WALs

we need to reach consistency and also to replay to the latest available point, by default, or to the point you specified, if you want point-in-time recovery. This is a backup system. pg_dump is a very, very weak backup system. I mean, pg_dump and pg_restore are flexible, they're good; in many cases much better. For example, if you want to take only one database or some tables from it, of course pg_dump is better, because taking the whole cluster is more expensive.

So, in this case, if you use these tools, usually data is stored in object stores such as AWS S3 or GCS, Google

Michael: Cloud platform.


Nikolay: [00:29:00] Cloud Storage, GCS, or others: Azure Blob Storage and so on. And it also takes care of compression, and encryption as well. And we need compression: Postgres data files are split into one-gigabyte files.

So if you have a one-terabyte table (by the way, you shouldn't; you should have partitioning), you will see a lot of one-gigabyte files in the data directory, in the base subdirectory, and then you need to navigate them using OIDs. So the tool compresses the files, and it also splits them into chunks, probably to be able to retry more politely, for example.

I remember we did some work with the WAL-G developers to improve how it works for Google Cloud, because the retry logic worked differently there compared to S3. You don't want to [00:30:00] retry the whole file when you upload during a backup. Even if compressed it takes only 300 megabytes,

for example, three times compressed, still retrying it as a whole is maybe not good. So you need chunks. So everything is automated, and then the chunks are joined on the cloud side, on the object storage side. It gives you the opportunity to have retries, both for saving to storage and for retrieval.

In this case, if you use pg_basebackup, as I said, if there's some issue, you need to restart. But with WAL-G or pgBackRest, if you restore from archive, first of all, you don't put stress on any production node. And second, you have retries: if it fails at 90%, maybe it's an issue with the network, for example. Okay,

we will wait until our cloud provider fixes the issue, and we retry from exactly that point. [00:31:00] This is more efficient, of course, right? But there is one big disadvantage of using these tools: we have to trust them. They also can have issues, for example, corruption. Well, replica nodes and standby nodes can also have corruption compared to the primary. The primary can have corruption as well. Backups can also have corruption. That's why they try to

Primary can have corruption as well. Backups also can have corruption. That's why they try to. Improve everything. I know PCR has quite strong, um, work with, uh, check arms and so on, like to verify everything and so on. So it's interesting topic, but in general, if you want physical, I like, by default, I wouldn't use Jibe.

If I have proper archives, I would restore from archives. You can also parallelize this work, unlike with pg_basebackup. Object storage like AWS S3 [00:32:00] is quite good in terms of multiple workers: you can fetch data using 16 workers, and you'll see it's good, because you also have SSD and NVMe.

It's also good in terms of parallelization: you have a lot of CPUs, so use 16 workers, for example. And in this case, if you have good disks, like a big system where you pay for provisioned IOPS and so on, I would say try to achieve three terabytes per hour.

That's good. This extraction of the database from archives should be very fast. And then it's only a question of how long it'll take for Postgres to start, to reach the consistency point and to replay additional WALs. That's actually it in terms of physical. It's maybe less interesting than pg_dump and pg_restore...
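Restoring from archive in parallel might look like this with pgBackRest (the stanza name is hypothetical; WAL-G has an analogous backup-fetch command):

```shell
pgbackrest --stanza=main --process-max=16 restore
```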


Michael: Well, it depends, right? I think that's the nice thing about this topic: a lot depends on the use [00:33:00] case, on exactly what you need at any given point.

Nikolay: Yeah. You know what, I'm still wondering, I've been wondering for several years: I think the approach is underappreciated where, instead of copying the whole inconsistent PGDATA, we take a cloud snapshot, which should take only minutes even if you have a terabyte or dozens of terabytes, and then we replay WALs to reach the point we need.

I think, if WAL-G and pgBackRest consider themselves cloud-friendly, they should have this as an option, instead of copying a lot of terabytes. Why not? We can provision this way if we don't need to change the file system. Of course, if we need to change the file system, we cannot use a cloud snapshot. But these cloud snapshots,

they're quite powerful. Sometimes they are really fast; sometimes they cause stress when you create them. I saw, under heavy load, if you create a snapshot of the disk on a production node, [00:34:00] at some point you have an issue, like a couple of minutes of latency spike, and everything slows down. So it's interesting how it's implemented under the hood.

Michael: Is this on, like, a database-as-a-service platform, or...

Nikolay: No, I'm talking about regular snapshots of network-attached volumes, like an EBS volume or a PD, persistent disk, on GCP. It was on GCP, actually, that I think I observed such behavior. And sometimes, I remember, some people reported they could not restore from such a snapshot. But the good thing about these cloud snapshots is that they are incremental.

They are definitely copy-on-write. But of course, if you want to restore, you create another volume, and you pay for this volume, definitely. So for [00:35:00] you it looks like it's not thin; for the cloud it's thin. They save a lot on resources, on hardware, of course. But the last thing probably to touch on, a topic close to my home:

if you want many snapshots and clones on one machine, perfect for non-production development, testing and so on, then thin cloning using LVM2 or ZFS looks very good, because you pay only for one disk and have many clones which, for applications, behave as independent databases.

And Database Lab Engine achieves exactly this: it automates it, providing a CLI, API and UI. So it takes only a second to create a snapshot, and only a second or two to create a new clone based on that snapshot. And you can have a [00:36:00] powerful standby node, right? You run it using Database Lab Engine and

replay WALs, for example, at the physical level. It's interesting that, to deal with thin cloning, we needed to explore thick cloning in great detail. Because to provision our node, we also need to bring data initially, and there are physical and logical options and many, many aspects to it.

How to parallelize on the fly, for example: we cannot, using pg_dump and pg_restore. With pgcopydb, we have plans to integrate with it; not yet, but it would be good as an option. Or we can do it physically, using pgBackRest or WAL-G, from archives and so on. A lot of options to choose from. But then you have constant replaying of

WALs at the physical level, for example. So it's like a replica, but for your environments it looks like dozens of environments, right? So it's quite interesting.

Michael: Yeah. I think for people that haven't seen it before, it's pretty magical when you see a huge database and you can have your own isolated copy of it in seconds. Yeah, it's great. Thin cloning for the win.

Nikolay: Thin cloning, and we already developed branching; we discussed it, we had an episode about branching. So we have Database Lab Engine 4.0, first alpha; we are testing it internally. Soon I will present it to a wider audience to test. And it's quite interesting, because it's indeed similar to git: you can change things, you can commit, you can have a new snapshot,

like an idea with a commit, and you can create new clones from there to share with colleagues and so on. So, yeah. But for the how-to-copy-a-database topic, I think we covered practically everything: logical, physical, cloud clones. Ah, okay, also [00:38:00] Aurora has thin clones, right? For storage, you don't pay for additional storage, but you need to pay for the

instance that works with such a clone. And finally, local thin clones. So we worked through, like, four big classes.


Michael: I think that is everything. Any last things you wanted to mention?

Nikolay: Well, test everything before you decide. Test, optimize; test, optimize. Especially these days. Recently, Google Cloud increased storage prices, and of course the economic situation is not perfect these days. I think more people will try to optimize their costs: how you copy,

how you use resources, in terms of storage first of all, and also compute. Because if cloning of one terabyte takes a day, that's not right. So I think it's good for engineers to improve their skills in this area, right? To understand the various options, how to test, how to troubleshoot, and how to make choices so your company doesn't lose money for nothing, right? Yep.

Michael: Well said.

Nikolay: Yep.

Michael: Wonderful. Well, thanks so much. Thanks everyone, and catch you next week.

Nikolay: See you next week.