Postgres FM

Nikolay and Michael discuss data types in PostgreSQL — including system types, choosing between types, types provided by extensions, and more. 
 
Here are some links to some things they mentioned:

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is brought to you by:
With special thanks to:

Creators & Guests

Host
Michael Christofides
Founder of pgMustard
Host
Nikolay Samokhvalov
Founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

[00:00:00] Michael: Hello and welcome to Postgres FM Week share about all things Postgres qr. I'm Michael, founder of PG Mustard. This is my co-host Nikolai, founder of Postgres ai. Hey Ola. How's it going?

[00:00:09] Nikolay: Hi Michael. All good. Uh, how are you? Um, so what's the topic today?

[00:00:14] Michael: the topic is data types.

[00:00:16] Nikolay: what kind of data types

[00:00:18] Michael: we are gonna try and keep this to half an hour, so we're gonna very much have to do a, uh, 1 0 1 intro to data types. But I think, uh, you've got some interesting, things you wanna go into first. Right?

[00:00:30] Nikolay: Yeah, well, so, so da data types is a huge topic and we probably should just, try to provide overview and some entry points as usual, because, uh, for podcasts it's impossible. If, if you want to, uh, squeeze it into 30 minutes, it's impossible to cover all the topics. POS has huge, uh, number of.

Data types supported and extensions bring even more and more data types, obviously.

[00:00:55] Michael: Yeah, I've heard it described as one of Postgres main strengths. One of the biggest reasons to choose Postgres, which I completely agree with. did you wanna talk about system types to begin with?

[00:01:04] Nikolay: Yeah. Yeah. So before we go, like really wide, we should discuss, this is my idea. It was my idea. Let's discuss data types. Each post always has. Because they're internal. And I encourage, touch some data types, even if you just start, uh, working with Postgres, uh, recent episode about with tips for beginners and, in a related article I posted, it mentioned, C T I D data app.

Which is quite useful to understand that how M V C C, uh, multi version concurrency control works in Postgres and, uh, to start understanding, uh, versions of a row and like to feel how vacuum works basically, and so on. And you just, this is just a hidden column, in any table.

And, uh, it provides you two numbers. So it's like composite type it, provide to numbers, which tell, you location of a So page number, starting from, I guess one. Or I always forget one or zero. So page number and set inside the page. And the interesting, uh, trick, uh, is I, is I usually say you can, uh, ex extract one of, parts.

If you just convert it to point first and then address first or second part of this. Value. It's just a trick which you can use to do some, I don't know, like some statistics to learn how your data is currently organized. And of course, if you perform some updates since your delete so, or to VA vacuum or manual vacuum, , processes.

The table, of course physical legal changes and some roles can migrate to different pages. And this is one of the system types which is worth, Knowing and using from time to time.

[00:02:58] Michael: Yeah, I really liked your tips on using this to get an understanding for how things work, at the disc level. So understanding that. topples are row versions and that they can move and that they get a new C T I D when they've moved. Like if, if they've been updated, even if it's a, like a no up type update, they can move That was really good. The first time I actually came across this was, I did a bunch, like I did a whole glossary for explain. and all the different operations and fields you could see and explain. And I came across a T I D scan and I was like, what is this? And it was so interesting to read up on it.

It is generally only used well. It's, it's like an internal thing. you can, use it, but it's generally a bad idea because tubs can move. So, yeah, that was my first, experience with it.

[00:03:44] Nikolay: Right. Also, there are a few more data types you probably want to at least know they exist and, and I'm sure at some point, , they will be useful for any, even like backend engineer, not just for DBAs. They, of course, all of these things are implementation details and it's like kind of internals, but it's very useful to know about them, for example, L ss n It is a data type called pg l ss n and it's kind of our moment in time.

So it, it always grows, grows, grows, grows. And uh, interesting thing to remember. If you have two SSNs, you can subtract one from each another and you will have a delta measured in bytes. So it's really easy just to use operator minus. , and that's it. of course there is, some function per d for something.

But anyway, like delta of sense, uh, is uh, it's log sequence number, it's position in wall and it's constantly progressing. of course there is another, value exceed transaction id. You can also play with it. Uh, and it's like regular. Integer. , so, , you can work with it as well, but LS n is very interesting so you can understand how much data you produce in terms of wall data.

all rights go to wall, obviously. Also vacuum can write to wall and, , shift l s n as well. and my, my final thing from me is, , a pair of useful data types also internals. O I d and uh, rec class. So each, , database object has O I D, it's like object ID inside database, and, There are some OIGs, which are, , hard coded in Postgres code. For example, when new, , system table, a system view or new data type in core is added, has assigned, O I D. Which is just written source code. And even there is a function, there is a script in helper script, which help, , Postgres developers, Postgres hackers to find unused o ideas that can be used.

So there are some gaps sometimes. So O I D uh, for example, if you create a table, it, it immediately has. O I D assigned and you can select it from system view called PG class. So you select PG class where, real name equals your table name and you can find O I D. And also there is a type called rec class which corresponds to records in PG class and all tables and indexes are there also views, materialized views.

They all are there. And interesting trick is that you can take o i d and just convert it to regular table name or index name, just, uh, saying Colin, red class. This is interesting. This is the shortest way to get, table name from O i d and vice versa, of course, so you can convert it back to o i d.

So it's, it's useful. Sometimes to deal with some catalogs and to find, , for example, if you have a very wide row for some table, you know, there's toast who had episode about toast. When you have toast, you sometimes need to convert table name to o Toast table has o i d of main table inside its stable name.

So you need to play with it and understanding that it's so easy to convert to O I G and back to, Relation. Name So relation, it's kind of overused term, but almost anything is a relation. I mean, tables relation, even index is relation. They all go to PG class, so now you can convert back a forth, just use remembering type names.

And the, the most difficult is to remember rec reg class. I will, I tend to forget it, but I use it quite a lot as well.

[00:07:38] Michael: I came across it recently, , when I was hosting the PGS Seql Friday event and somebody, I think it was Diane Faye, used it in their query for PG stat statements as a really neat way of casting. So actually casting is the important word there, isn't it? The colon, colon you mentioned is used to cast from one day to type to another.

, so it's really cool, really powerful.

[00:08:00] Nikolay: Yeah, I ended post good season because standard wise, say cost. Cast something, something. I never use it because it's too bloated. It's better to just write colon. Colon, because I definitely know I'm not going to migrate from Postgres to anywhere.

[00:08:14] Michael: Nice. In the interest of time, should we move on?

[00:08:18] Nikolay: Yeah. And let's, talk about basic atomic, , data types, like numbers and texts and dates and times and so on.

Where to start, numbers or text.

[00:08:30] Michael: I think text because I thought this was quite like a nail when I first, started learning about Postgres. The advice I read felt like really sensible and I got on board with it. And then the more I learned, the more I realized it's like a little bit more complex. So I feel like this might be one of the more interesting ones.

So the, the reason that I think it's interesting is unlike a lot of relational databases, Postgres has a text type that and not, and, and a lot of the other databases rely on

character. Yeah, exactly. Exactly. and character limits are really common in a lot of, when people are using Oracle SQL Server and, and others.

And I think there's a few, guides out there and a few, like even the Wiki, in fact I was gonna mention this later, but the don't do this, Page on the Wiki is brilliant for data types, so I'll, I'll link that up. But one of their advices is generally to use text and they go into good reasons as to when it may or may not make sense.

but yeah, how do you

feel about that?

[00:09:31] Nikolay: Is it the same Wiki page which says, don't use money?

[00:09:35] Michael: Yep. And don't use timestamp Use times stamp tz.

[00:09:38] Nikolay: Don't use money. Sounds funny actually. I, I, yeah. Well, in inside Postgres, I tend to follow this advice, but, uh, outside, I, I prefer not to. So it's, it's sounds like communist, you know. but uh, it's for text. Uh, first of all, post supports char as well, like a standard, , so standard type type support and you and, but with some additional v can be used without limits as well.

You can say V and it'll behave similar to text and on stack overflow. There is a question where I think. Peter Eiser commented that there is very, very small difference in terms of performance because there is some additional code. Well, , I have mixed opinion here because first of all, of course, it's good, to care about performance, but it's also good to care about, the quality of data.

And when we talk about quality of data, we need the constraints. And Char offers such a, such constraint. And actually, if you define Char a hundred, and then next day you decide, oh, I need 200, 200 characters. It's just an altar. And it'll be quick. It'll

[00:10:47] Michael: it's no locks to, even if you've got a lot of data in there.

[00:10:51] Nikolay: from hundred to 200. It's obvious. We, we don't need to scan table.

Right.

[00:10:57] Michael: Well, in theory, but not always. Things that are in theory possible are true in

[00:11:02] Nikolay: I recently checked, I was surprised it's not scanning table Well, maybe I'm wrong actually. It's, but it's easy to test. You generate a huge table. You start observing logs, measure time, and so on. And you, you see if it scans table or not. I suspect in one way. It won't, if

you increase it.

[00:11:19] Michael: we could spend the rest of this episode on this specific topic. I remember a really good blog post on

[00:11:24] Nikolay: Maybe it's

[00:11:25] Michael: brand. Well, brand Brander did a really good in-depth blog post on this, uh, pardon me,

[00:11:32] Nikolay: How many years ago?

[00:11:34] Michael: only

[00:11:35] Nikolay: Depends on version, probably. I don't know. But, uh, I, I recently, I, I was explaining someone that, uh, it's so. As you say, and I wanted to, as usual, when I say something, I, I, first of all, I need to double check myself. So I do experiment, always experiment. So just check it and see.

And I needed also to demonstrate, with real experiments. So I made experiment and I was surprised that it didn't, uh, From, it was very fast in this way, like from a hundred to 200, increasing this limit. So again, okay, it's better not to memorize these things, but to know how to check these things. It's always good to check and it's always also good to have some, testing system, which, uh, will not let you to deploy.

Bad change, which will block table for long.

[00:12:27] Michael: True.

[00:12:27] Nikolay: Right. So it's, we discussed that with our database experiments, testing and so on also. So it's that area. , well, okay to use text, but, , my question is how are you going to control the le length? Many people tend to control it on application side and, uh, many years ago, I was very, um, I was criticizing them a lot because I was saying, you know, look, only a database can control constraints, , reliably.

Because application, like soon you can have several applications and you need to copy paste this logic in different languages probably, right? It's not good database can control this and so on, but they had reasons, uh, for example, painful, uh, schema changes because I get it. And for example, if you tell me you are going to use.

Even with recent support, I think in possibly 13 or something, it was added. You can alter it to. Saying, I want one more value, Inam, I will say, okay, what if tomorrow you'll decide to remove one value? And you know, your table doesn't have it anymore and it's terrible. So instead of inam, probably it's better to use, just check constraints and know how to cook them.

you do it in two phases. First, you define chicken constraint, uh, with not valid flag. Uh, after each. Some people think it doesn't work if you say not valid, but it works for new values. So all inserter and updates will already will be checked for this check and straight, even if it's marked as not valid, but not valid means, uh, we skip checking existing data and then you now in different transaction you say alter, table alter constraint.

Uh, validate, validate, constraint. And in this case, you don't block anyone. And it's so, it's so good. And in this case, you can, create another constraint with this approach and drop, uh, old constraint. And this is how you migrate to different set of values to be checked. So I prefer this recipe instead of using inam.

And similar here, well, honestly also use text in many cases, but it's not good when you want to store email and then you forget. In, in your application, you forget to check it. And then someone inserts one gigabyte of data because Postgres supports after one gigabyte of, uh, inside one value. It's, it's huge.

Right?

[00:14:52] Michael: So I question, can you check constraint for the length? Then why? Why is it

[00:14:56] Nikolay: Exactly. Yeah. You can, and, uh, altering them probably is easier because like it can be full automated. you just define in two phases. You define new constraint, drop all along. That's it.

Good about this approach is that, you can work with any data type, even with Jss o you can take J ss o and say, I have check constraint. This j ss o always must have. This key, well, you can build now even, uh, Jason schema around it. And I think supervised guys did something like they have extension probably for kind of Jason schema to, put some constraints.

You know, sometimes you want. Again for data quality. You want, if you work with Jason, you still want to have it like semi-structured. You want to, to have some keys existing or some values inside keys or some array objects. I mean, and, and so, and, and, uh, in this case you can define some check constraints that will prohibit inserting data.

You don't want to be inserted and I mean, this recipe can work with any data that this is, the, good side of this recipe.

[00:16:00] Michael: Yeah,

[00:16:01] Nikolay: We know how to deploy without blocking.

[00:16:05] Michael: I think I agree with you on the enums, front. I did want to see a really good, really entertaining talk by Boris maus that I liked. trying to defend the enums, trying to say all the positive things about them, but I think overall I agree with you. Yeah, exactly. It was, it was a valiant effort, I think.

do you want to move on to like numeric types?

[00:16:26] Nikolay: Numeric types. Okay. Yeah, sure. one of, again, one of recent, tips I had, uh, again in beginner's, topic as well. We had it, uh, don't use for primary keys

[00:16:36] Michael: Yep.

[00:16:37] Nikolay: just don't, and, uh, what else? Like numeric, if talk about inte integers, there are like small in, normal, in

[00:16:46] Michael: Well, I think there's some, I think floats are an interesting discussion and I think there's some subtleties here. Like, um,

[00:16:53] Nikolay: Yeah, I wanted just to mention that uh, some people think, uh, big end and small in, I just saw it recently. Uh, it's not standard, well, it's standard small in all that relational databases which want to follow S Q L standard, implement them. So big end, is the choice, but not always. I mean, if you know. Uh, about storage.

We will probably touch the storage topic separately, but, uh, sometimes of course, we want fewer bytes to be spent,

[00:17:23] Michael: So I, I think for keys it makes a lot of sense. and a lot of the advice that's out there, a lot of the good talks that have been given are quite, or not necessarily quite old, but they're pre version 10 of Postgres. So a lot of them mentioned serial and big serial, but now we have identity, so that problem

[00:17:39] Nikolay: I still cannot develop a habit to use it. I still use, uh, b serial, primary, uh, b serial, primary key

[00:17:47] Michael: The only, the only downside of the identity vibe is it's so long, like to, to type it out is like such verbose, uh, syntax. But yeah, I'm trying to get in the habit of using it because.

[00:17:58] Nikolay: it's

[00:17:59] Michael: fewer pain. it's standard, but it also has few like serials, have some like weird complexities. Especially when you, well, yeah, especially if you end up having to migrate, as we discussed in the downtime

[00:18:11] Nikolay: Column has owner, the table has an owner of like, of sequence ownership. This is kind of, yeah. So about float, what I want to hear, I, I recently learned from uh, Peter Gagan on Twitter

who mentioned that. Or they already be published very good, uh, benchmark, , showing how, it's good to live without vacuum.

And they chose Benchmark which relied on Float eight columns, like I think four columns, and, created B three. And Peter Gagan mentioned that, uh, unfortunately B three optimizations he worked on in Postgres 13 and 14. Duplication, right? These optimizations don't work with Floate.

[00:18:58] Michael: Which is a big deal. Right? Because they, because they were looking at the impact on

[00:19:03] Nikolay: who needs flow at all? Well, maybe some people,

[00:19:07] Michael: Well, so the general, the general advice in the Wiki is don't use it, use numeric, uh, with precision.

[00:19:14] Nikolay: don't use money, don't use floate. Similar,

[00:19:18] Michael: Y Yes, and, and they give. The thing I really like about that wiki is they do go into when should you, like what are the exceptions? I think it's a very, very smart way of doing it, but in general, if you've got any doubt, just don't unless you know exactly what you're doing. The, the optimization by Peter reminded me also that recently.

I think it was Postgres 15, may maybe 14, added a bunch of op like optimizations around sorting specific data types. And I think the more common or the more standard data type, the more popular data type you choose, the more likely it is to gain, to benefit from some of these optimizations, whether that's in indexing or sorting or something.

I

[00:19:56] Nikolay: Or vice versa. Because, for example, if you store your Js o or okay, if you store some vectors inside text, column and expect a good performance for some algorithms, uh,

for arbitrary neighbor, uh, nearest neighbors, search a n n. Right. In this case you probably will find out that, , there are extensions which, provide better, , better functionality and performance.

[00:20:26] Michael: Yeah. I think we should talk about advanced types in a moment. Just before we do, do you have anything you wanted to add on? Like we did a whole, you did a whole episode on u UIDs, so I think we can probably skip

[00:20:36] Nikolay: Yeah. Let's skip it. Yeah.

[00:20:39] Michael: anything on booleans or, or dates and times.

[00:20:42] Nikolay: Well, bullion is is not one beat. It's it's one bite.

[00:20:45] Michael: Yeah,

[00:20:45] Nikolay: Unexpectedly. . Bigger than, just one beat.

[00:20:50] Michael: Dates and times. The general advice is

[00:20:52] Nikolay: Timestamp, tz, and pay attention to ranges. Learn ranges. Ranges are great.

Right.

[00:20:57] Michael: shout.

[00:20:59] Nikolay: Ranges are great. They, they have indexes also. So if you have beginning and ending points in time for your, like, data, consider ranges. They will also, take care of data quality. You will never have beginning, which is in future compared to ending point.

[00:21:19] Michael: Yeah.

really awesome feature.

[00:21:22] Nikolay: Yeah. And of course, timestamp is, is that, uh, timestamp of time zone is better and just prefer using it. That's it. So I, I also, I think I recently was, , productive in terms of tips and one of the tip was to block authors and consultants who explain, Some postpositive stuff to other people, and I saw, I asked, and it was, uh, it, it resonated.

I saw many good feedback and I mean, people like this idea, let's stop using inte integer four for primary care. So let's always use integer eight and develop, like, help people develop a habit. To use in the eight and same for timestamp. Let's stop using timestamp. Let's use timestamp T that like this is like good things for those who write blog posts or develop some, I don't know, like educational courses and so on.

[00:22:16] Michael: I agree. And I would add using identity over cereal and big cereal.

[00:22:20] Nikolay: This is, this is probably, yes. Well, and in this case you need to advise, uh, use cast instead of colon. Colon and so on and so on. Like, uh, it depends if you like, I don't care. I prefer processes because they're usually more compact and, uh, it just more convenient for me. So I'm going to continue using sequences and big cereal.

But if you, for example, develop a course for students who you don't know, like many of them will probably go to and use Oracle, for example, and, but you use Postgres when you educate. In this case, you need, uh, probably to stick with SQL standard, uh, language syntax, right? In this

[00:22:58] Michael: Yeah.

[00:22:59] Nikolay: S.

[00:23:00] Michael: Well, I, I was, uh, just teasing 'cause I, yeah. 'cause I know you said you would still use serial.

[00:23:04] Nikolay: Yeah. Well, I, in this case today, I see your face. I, I knew you were teasing, but still worth explaining. Let's move on probably

[00:23:12] Michael: yeah. Right. Advanced types, arrays. J ss o.

[00:23:15] Nikolay: first. Yeah. First of all, uh, we're going to break, , first normal form. The first normal form one and F four and F one, I think, canonical. Naming is one and F right in.

Oh,

[00:23:30] Michael: I actually don't know. Is it like one, probably one of 'em is French or something. That's how these acronyms normally get

[00:23:35] Nikolay: so the first normal form says you, you should, you should not, Put something which can be split of things inside the value in your tables, but, , well, text can be split, split two things. And we know in many languages, text and for example, in C it's considered as array as well. Like you can work with it as a, as a with array.

So there is a first letter, second letter with like index zero, index one, and so on. So it's interesting thing. But anyway, Postgres is very advanced, in queries, we can even collapse whole table, whole relation into inside in one value and then unpack it.

Manipulating, but it's a different story As for, uh, complex data types, first of all, arrays. They can be of. Many, um, the underlying that data types, for example, in integer eight array or integer four array or text array or, and so on. So it's, it's great and, but of course you cannot mix. So you cannot have both inter and text.

, values inside, same array. And, uh, things to remember. The, the indexing starts with one, not zero. This is unexpected to many and leads to interesting mistakes. Sometimes. Well, arrays are awesome. I just can say like, I use 'em a lot. Array, underscore a g g function, which, , helps me when I group something, I can Group many things in inside, inside one value. And then I also use parenthesis. And, I can use, like, I, I say I need only first five elements to see, for example, if you select from PG activity and you, you want some queries.

I use, use left function to to cut them, left query. Comma a hundred, for example.

And then I use array underg to aggregate them, and then I additional append and then, , access indexes from one to five. And you see only first five elements quite convenient. , in one single input for one row, you see five. You can also order by inside array e g.

That's, that's unexpected to many inside the aggregation functions maybe.

Count doesn't count here. Sorry for

[00:25:54] Michael: like you've told me that before, but I still like

[00:25:57] Nikolay: super interesting. Yeah. Yeah,

so, so when you aggregate, for example, J J O N A G G, text and array a, g, G, so you can aggregate, and in this case you can say I want specific order for when I aggregate, and this order can be different compared to order by Klaus in your statement. So it's kind of interesting trick, but, what else? Like, we have a race is, it's only the beginning of the topic, right? It's a huge topic. I mean,

complex data types.

[00:26:29] Michael: yeah. I feel like we should skip Jason because we did a whole episode on it and we can point at at that

[00:26:34] Nikolay: Just don't follow advice. Uh, who, those who say prefer with Jss O B or JS O, they're probably not completely, it's not probably good. Very good advice. In some cases, J SS O is much better in terms of performance. So just check, use both, I would say,

[00:26:51] Michael: Yeah. Yeah. Or read about both and see which suture the, I think I've seen way more use cases, I've seen way more people using j b at than I have Jason. I think for good reasons,

like, yeah. Anyway, we've discussed it at length, haven't

[00:27:06] Nikolay: array. And Jason is a way to escape from, uh, entity at value approach, which, uh, is terrible. When you have a lot of data, these joints will be terrible and you can actually, you can have like, kind of, it's not, it'll not be foreign key, but you can put many ideas. Inside one row in this array value and then try to join using Nests function.

But it's, it's kind of interesting and probably you will have better performance, but again. Okay. Jason and Gen B, let's consider H Store and XML as kind of semi duplicated

types. In our case, I would try to, avoid them despite the fact I, I participated in development of EXIM type and functions, uh, many years ago.

I, these days, I would probably, unless, needed, I would try to avoid them. And H Store is very old. It has a. good features, but most of the time you will probably be good with J ss o and it's what other parts of your architecture probably understand And what else? We have also many others. For example, uh, full text search, TSS Vector and TSS query.

They also can com, like they consist of many things. It's ts vector based on the name. You understand like it's, uh, many words. , combined to, , one vector value and, uh, you have, uh, you can create a gene index and have good performance. new thing, PG Vector for very high dimensional vectors, like up to 2000 I think say, and for not strict neighbor, search, but, arbitrary neighbor search for. This, uh, L L M J G P T area of engineering, which is currently very hot. in this case P Vector. But by the way, P Vector is not the best in terms of it. It implements not the best algorithms. So there is now competition, and I know Neon developed something PG bid, right? And, but, uh, p bid is quite new and it doesn't.

It's like it builds index in memory. I know they, they work on making it persistent, this index. So I mean, if you restart post, you need to build an index. Index again. What? So it's very new. I wouldn't say it's production ready yet. But it's very interesting because it's deployments and different algorithm, which probably is considered, one of the best right now.

[00:29:31] Michael: I think we're gonna need to do a whole episode on pg vex.

[00:29:33] Nikolay: do it. Yeah. About

[00:29:35] Michael: the, I, another one we haven't done an episode on that is, it feels rude. Not to mention in this space is post g i s in terms of

[00:29:42] Nikolay: This is, yeah, exactly. And a lot of stuff there. A lot of course. So many data types, many functions, and uh, it's a whole world.

[00:29:51] Michael: yeah. Incredibly impressive and h hugely popular, we've mentioned performance a few times. Is it like, is there anything you wanted to make sure we talked about? I, I feel like that's, that's where some of the choices between these gets interesting. Is there anything that people should be Definitely be aware of.

[00:30:10] Nikolay: Well, I, I don't think we have a lot of time right now, but let's just probably mention a few tips, first of all, documentation.

Right. Wiki is also good and many articles and book posts are good. Probably we'll attach a few articles which are helpful in terms of, , learning about various data types, but also data storage tip related to alignment paging, , Due to alignment, paging Postgres always, need to fill gaps.

So if you put one byte and then you put eight bytes, Postgres will need to add seven zeros between them. So there is an approach, you, you basically need to either, in the signing or descending order, a range. So starting from 1 1, 1, 2, 2, 2, 4, 4, 4. And so bytes, so they pack or vice versa. You have 16.

16, for example, timestamps when 8, 8, 8. And there are articles about it and also, but it's, there are also var data types like text, virtual.

[00:31:16] Michael: Yeah.

[00:31:17] Nikolay: TSS. Vector. Vector from P Vector. I'm, I'm sure it's var because it, uh, allows very huge, like, huge values up to one gigabyte probably as well, involving toast. These values, you should move to the end of the table.

In this case, you'll have. As few zeros, these gaps as possible. And it's called Column Teris. Basically someone on, on, stake overflow named it maybe, or there, I don't know. There are a few articles about it and sometimes you can save a lot of space, but don't over it. I saw people. Immediately love this topic and try to optimize.

And in many cases it's, it looks like premature optimization because okay, you save like 3% of storage, but you spend a lot of engineering time.

[00:32:03] Michael: I read a Post by Braintree that had huge tables that were very badly optimized, and I think they saved. About 10 to

20% from

[00:32:11] Nikolay: That's great. That's great example.

[00:32:13] Michael: Yeah. Rare. And they, and they're Braintree, you know, they're huge.

[00:32:17] Nikolay: Yeah, my, old ad hoc, tool called Postgres, G b a, has a report, which is a very interesting ct, reive, C T E I wrote one day. I, I'm still not a hundred percent sure it'll work in all cases, but so far it works quite good and it can tell you. About potential Optimization in each table so you, you can use it and, try to understand, if it's worth spending some time to rearrange columns. For some people it's unexpected that Postgres doesn't do it itself. It's interesting, but it's, so Postgres will store columns, as you say, as you define in

[00:32:51] Michael: Yeah.

[00:32:52] Nikolay: definition.

[00:32:54] Michael: And something, something I only learned today when I was looking up, I wondered if, , PG Repack supported changing the order, but it doesn't, based on implementation details, it can't really. So it's very interesting. But, you're gonna have to roll this yourself basically if you want

[00:33:07] Nikolay: The, there is potential to have it, but it's not supported yet, unfortunately. Yeah, you need to rebuild the table and deal with it. It's, it's a huge, , task if you want to do it without downtime. It's a huge task. But, as you remember, we mentioned don't use in in four. Primary key because I, I saw it very often in four primary key, which is doomed to have a, a wall being bumped into when you reach 2 billion rows, 2.1 billion rows, roughly right half of, uh, inte four space.

And then, uh, second column is timestamped t that which is 16, bytes. So we definitely know we lost, four bytes because they're just lemon painting and we have zeros there. If we used into eight from very beginning, we wouldn't have this problem with capacity. And we would use the same amount of storage, which is very unexpected, but it's so,

[00:34:02] Michael: I think the argument stands to use innate, even if you were having to pay double the storage, I think there's so much benefit because if you're not, let's say you don't hit 2 billion mos, you haven't got that much data to worry, you haven't got that much storage footprint anyway.

But if you've, if you're worried about the storage footprint, you probably thinking you're gonna have more than 2 billion rows. So that, that would be my argument.

[00:34:23] Nikolay: Yeah. Okay. Maybe one more advice, uh, on your choice and that's it.

[00:34:28] Michael: Oh,

[00:34:30] Nikolay: What do you think we should

mention

[00:34:31] Michael: I see people I see. A common, argument. It's not just for data types, but people that choose to do things in a standard way or a way that will let them convert between databases over you choosing the power of the current database they're using. Thinking, what if we need to migrate one day, like, and I think it's.

It's an easy trap to fall into. It sounds smart, but I've not like any of us that have been around for a while. You don't move databases that often, like it really, existing projects have a lot of momentum and it costs a lot to migrate. There are arguments for doing it. We see some from some very expensive vendors coming to Postgres, but it's rare and it's, basically so much work that I think you're really hurting yourself by.

missing out on the power, especially a Postgres, especially when we're talking about so many of these things that are so, have so many features, so many functions, so much power that it feels like such a waste that you'd be missing out

[00:35:28] Nikolay: It is. Yeah, it's like choosing c plus plus, but staying with C only syntax, like the very basic types and so on, and not using power of c plus plus or some something like you, chose Java, but to think, oh, maybe one day we will migrate to, I don't know, like to something else, and you, you restrict yourself, using only 5% of functionality. It's, it's nonsense.

[00:35:49] Michael: Yeah, I think so. But I'd see people, I see smart people falling into that trap, so I, that's my, my advice would be try to avoid that or

[00:35:56] Nikolay: Don't consider database as something like a, like, like as globes, it's a heart, not gloves. It's, it's a heart of your

system. You are going to, you are not heart. Yeah. You're not going to replace heart easily. So database is is in the center of everything.

[00:36:12] Michael: Yeah. Heart transplants are possible, but they're rare and they go,

[00:36:16] Nikolay: yes. It's not gloves. You can just

change them every day. Yeah, yeah, yeah. Okay, good. Thank you so much.

Oh, we, we should remind, those who listened until this point. Please help, us grow with likes, with comments, with feedback, it's all, it's both helpful for us to understand We are being heard, and also it's helpful, for example, on YouTube if you like. Of course, , due to al algorithms. It helps, our channel grow and, , make audience bigger and make this project more interesting basically.

[00:36:52] Michael: True and requests as well. Feel free to drop

[00:36:55] Nikolay: Yeah. We need more requests.

[00:36:57] Michael: Yep. Yeah.

[00:36:59] Nikolay: Thank you.

[00:36:59] Michael: Thanks everyone.