A weekly podcast about all things PostgreSQL
Michael: Hello and welcome to Postgres.FM,
a weekly show about
all things PostgreSQL.
I am Michael, founder of pgMustard.
I'm joined as always by Nik, founder of PostgresAI.
Hey, Nik.
Nikolay: Hi, Michael.
How are you?
Michael: Good.
And we're delighted to be joined by Bruce Momjian, who is VP of
Postgres Evangelist at EDB and long-serving Postgres Core Team
member who recently gave a new talk about the missing features
in Postgres.
So welcome Bruce, it's an honour to have you here.
Bruce: Good to be with you.
Michael: Wonderful.
Perhaps we could get started with why this topic, why is it important
or why is it important to you?
Bruce: So I'd love to say that I thought of this topic and I'm
a genius, but no. Actually, Melanie Plageman came to me at PGConf Europe in Riga in the fall.
And she said, we'd really love you to do a new talk at the PGConf.dev conference in Vancouver.
I said, okay.
And she said, how about things that aren't in Postgres?
I thought, wow, that's a great idea.
So then I guess I was showing these slides to Robert Haas once
I finished them.
And he's like, wow, Melanie got a mention.
I'm like, yeah, I said half the job is figuring out what topic
to do for a talk.
So he said, I'll have to think about that now.
I can get a mention just by thinking of a topic.
I don't even have to do the talk.
So yeah, I thought it was a cool idea.
I never would have thought of it.
There are a couple of my talks where you'll see in the introduction where the title concept came from, because again, I have about 67 talks now, and it's hard to think of topics.
I've been really lucky, because I did two new ones in the fall of last year.
One is this talk; another is The Wonderful World of WAL, which is about the write-ahead log; and then I have a third one which is more of an AI talk.
I already have two AI talks; this is another one, called Building an MCP Server Using Postgres.
So I've got three new talks queued up for 2026, which is unusual for me.
I write my talks and basically give them for maybe a year around, and then come up with a new one, because I've been doing it for 30 years.
So on average I'm making two talks a year, because I have 67, but this year I get three.
So I was like, wow, this is great.
And Melanie, I have to give her credit for the topic on this one.
Michael: That's awesome to hear.
Friend of the show Melanie, she's
been a guest before.
That's good to hear.
Nikolay: Yeah, I have a question.
If my memory isn't playing tricks on me, 20 years ago I remember you maintained the to-do list.
Bruce: That's right.
Nikolay: Isn't this a to-do list?
Bruce: Is this a to-do list?
I would say no.
The to-do list is basically driven
by what people have asked
for.
It could be a small thing, it could
be a big thing.
This is more of the, I would say,
the big missing features.
And a lot of them are ones that
we had no intention of doing.
So it's sort of like, let's step
back.
Let's look at what big things were
missing.
Let's look at what big things we're
maybe working on.
And then look at some of the big
things that we'll probably never
do.
Nikolay: So it's strategic.
Bruce: Much more strategic.
You're right.
If I drill down to the to-do list,
we'll be here forever.
But again, I think it is valuable
to step back and look at what
is missing and what isn't.
And I even learned something just by writing the talk, because I stepped back far enough and looked at the missing things.
So, I think I have, what is it, 12 items.
And what's interesting is 11 of
the 12 items relate to performance.
That surprised me.
Michael: If you had to guess beforehand, what would you have thought more of them would be?
Bruce: I thought it would be basically missing operational features or missing security features.
Well, TDE is that one; that's one of the 12, right?
Missing integration, infrastructure, some of these big missing things.
But when I actually wrote them down, I was kind of like, wow.
So 11 of the 12 are basically related to scaling, either on a single host (seven of them) or multiple hosts, which is another area we need to work on.
So really, 11 of the 12 are about performance: seven of them for a single host and four of them for multi-host.
And again, I just never suspected
that it would be a performance
need and not an operational need.
If I had done this 10 years ago,
I know there would have been
a whole bunch of things that...
Nikolay: I can bring some ideas, maybe at the end, which are not performance related and are definitely big missing pieces, and they're not included in your list.
So maybe let's see.
Bruce: I have...
What's interesting, I've given this twice, both in January, once in Prague and once in Brussels, and in those groups nobody really came up with anything that was wrong.
The only feeling I got, I think, was that some of the people were sort of like, well, Oracle RAC has some value in these small use cases.
And I'm like, yeah.
So there was a little bit of, you don't have this Oracle thing, or multi-master; like, well, maybe there was some value there.
So I get it.
Yeah, it's there, but I didn't
get anything that sort of shocked
me.
So I'll be interested to see if
you have anything that was sort
of like,
Nikolay: Sounds good, I will do
it.
Bruce: Yeah, I want to hear this.
Michael: Why don't we do it straight
away?
Bruce: Sure.
Nikolay: Right now, let's discuss
this.
Okay, I'm looking at your list and I don't see, for example, synchronous replication improvements, because right now synchronous replication is not all right.
Maybe you watched Kukushkin's talk last summer about its caveats.
This is a big topic, for example, when new people come to the Postgres ecosystem from MySQL, and one big group is Multigres, the creators of Vitess.
Some words from them, some words from Kukushkin, and I see synchronous replication is not all right, because it's based on logs.
If the primary has already written data and then suddenly restarts, the primary thinks the data is written, but the standby never received it.
This is one thing.
Another thing is that if you have logical replication involved, it's a nightmare to handle: during a failover with issues, you basically lose the logical replica.
And looking at other systems, including MySQL, there is big potential to improve synchronous replication.
Basically, I won't name names, but just watch Kukushkin's talk.
It's broken right now.
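For context, the synchronous replication being discussed is configured on the primary roughly like this (a minimal sketch; the standby names and quorum count are illustrative):

```ini
# postgresql.conf on the primary (illustrative names and quorum)
# Wait for an acknowledgement from at least one of the listed standbys
# before reporting a commit as successful.
synchronous_standby_names = 'ANY 1 (standby1, standby2)'

# What the standby must confirm before the primary acknowledges:
#   remote_write - WAL handed to the standby's operating system
#   on           - WAL flushed to durable storage on the standby
#   remote_apply - WAL also replayed, so reads on the standby see it
synchronous_commit = on
```

The caveat Nikolay describes is inherent to this design rather than a misconfiguration: the primary writes and commits locally first and only then waits for the standby, so a crash and restart in that window can leave the primary ahead of every standby.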
Bruce: So I'm working on the PG 19
release notes.
I do see improvements in both of
those areas in the release notes
that I'm working on.
They're not done yet; they won't be done for about two weeks.
But it'd be interesting if you could come back, take a look, and see what was done.
What happens in that particular example, both examples, is you're now out of the 30,000-foot view and you're getting down into the details.
And there's totally stuff to do.
Nikolay: Sorry, I'm not getting
down.
When people with enterprise and
big cluster experience come,
they say, where is synchronous
replication?
We cannot build proper clusters.
And when I tell them that most people still use asynchronous replication, they cannot believe it.
They cannot believe it.
Really?
Bruce: Yes.
Nikolay: "How can that be?", they say.
Other systems have proper synchronous replication with a proper consensus algorithm.
This is another topic.
There are many attempts to bring a consensus algorithm or something like it inside Postgres, right?
It's similar to the pooler question: there are always many ways.
So this topic is not about performance, it's about HA.
Bruce: No, what I'm saying is that I'm trying to look at this from a very high level, to look at big holes.
What you're actually looking at is: we have the feature, but it doesn't work right, or it needs whatever.
I get that, but again, if I go down to that level, I'll never get done, right?
I'm trying to from the top look
for these big holes and actually
to me not only look at the big
holes, but look at why are they
holes, right?
Because a lot of people who aren't me or you, who aren't sitting watching everything that's happening 24 hours a day, right?
They're like, what's the problem?
What's the holdup?
And then the other thing for me
is, by having that 10,000-foot
view, I see a pattern clearer than
I could see when I was down
in the...
I'm not disputing that these things are not feature complete, I would say.
It's not atypical for us, particularly with partitioning and logical replication and failover, to take years to get to a feature-complete point.
And I think we are not feature complete in a whole bunch of areas, but I tried to stay away from, okay, we're not feature complete here, we're not feature complete there.
I'm like, what things are we not even moving in?
And there are a bunch of them where I would say we're just not engaged at all.
Nikolay: Yeah, I get that.
That's a good description.
And when you're at this highest level, you see a lot of performance
related stuff, right?
Bruce: That's right.
At that level, I see a lot of stuff that we're not even in the
ballpark in.
That we're just scratching the surface.
I'm not saying all the things we've implemented so far are done perfectly or are feature complete; clearly they're not.
But at least I feel we're moving there.
What I see up top is a whole class of things where we're not
even engaging, and we may not need to engage.
It's more of a thought experiment for people to come in and say,
okay, what things are they not even in the ballpark on?
And it's helpful to know, because if you look at the list it's
not a terrible list.
And if you think this is the biggest stuff we don't have, we
must have a lot of stuff, right?
Because I can't think of any other big things except these 12.
There might be 1 or 2 that somebody will think of at some point.
But I think for me it's a great exercise in seeing where we are
at that big level.
Where do we think we need to go 5 to 10 years out?
That's usually the distance I'm looking at, not necessarily what comes in 19 or 20.
I think some of the stuff you're looking for is coming in those releases.
Nikolay: That's good.
Bruce: But what stuff do we really need to at least evaluate and say, is this a direction we need to go in or not?
Michael: Looking through the list it struck me that there's a
bunch of things that you can currently do, but you need extensions
or a separate tool.
You can use Postgres and get those features, but you need an
extension or another service like PgBouncer or something.
But there's also a bunch that you literally can't do with Postgres,
so you would stick with maybe a proprietary database or just
a different solution completely, maybe a fork or something like
that.
So it felt like there's that kind of 2 categories there where
there are things you can do with Postgres, but you need something
outside of core.
And there are things that you can't do with it.
So is there one of those that's more interesting than the other in terms of bringing into core?
Bruce: That's funny you mentioned that.
I really struggled with that question.
And the audience struggled with that question.
I remember people saying, hey, what about this?
What about that?
These slides are regularly updated, like every time I give the talk.
Actually, I'm just looking at one of the capitalizations I did wrong; anyway, I'll fix that after this.
But my point is that you're right.
There are like concentric circles.
If you're willing to go to a proprietary fork of Postgres, it's a big circle, right?
Then, inside that, external tools, then proprietary extensions, then extensions, and then core, right?
And in fact, maybe I should have a slide to illustrate what that span is.
And in a lot of cases, we will say we have it.
It's just great that you asked that question.
Nobody asked it quite the same way you did.
So some of the complaints we have are not that it's not available as either an extension, an external tool, or a proprietary fork of Postgres.
The complaint we have is that it's not in core Postgres.
Or that there's an extension, but the extension, because it's
an extension, doesn't work exactly the same way it could possibly
work if it was in core.
Or if it's a third-party tool, the third-party tool is not as
effective or easy to use as it would be if it was integrated,
which is the connection pooling example.
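The external pooler setup being referred to is typically a small config like this (a minimal PgBouncer sketch; the hosts and paths are illustrative). The authentication awkwardness Bruce mentions comes from the pooler sitting between client and server:

```ini
; pgbouncer.ini (illustrative values)
[databases]
app = host=127.0.0.1 port=5432 dbname=app

[pgbouncer]
listen_addr = 0.0.0.0
listen_port = 6432
; transaction pooling reuses a server connection per transaction, which
; breaks session state (SET, prepared statements, advisory locks)
pool_mode = transaction
; the pooler must authenticate users itself, hence the separate file
auth_type = scram-sha-256
auth_file = /etc/pgbouncer/userlist.txt
```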
Nikolay: Or there's a choice of them.
Like, why is there a choice of them?
Bruce: Right, there's a choice of them.
Although the choice of them I kind of like, particularly in the
pgpool, PgBouncer realm, there is actually a reason for that.
But my point is that some of the missing stuff, even something like columnar storage, right?
You could say we have extensions to do that, but, yeah, I guess.
But I'm not really happy with that, I'm not necessarily wedded to Citus, it has limitations, and because it's an extension, it might not be available.
So you get all these sort of complaints
And it brings you to
a bigger philosophical question
of, does everything need to be
in core, right?
And there's probably not.
I think pgvector is a great example
of something external that
can be developed on its own release
cycle.
And it really doesn't hamper its
ability to be used effectively
with the code.
But when you start to talk columnar storage, when you start to talk connection pooling, you have problems with authentication going through the connection pooler.
We've tried to improve that, but it's still kind of yucky.
You know, logical replication of DDL, that's another one that's available as an external proprietary product but not in the code.
TDE is a classic that's available
in a bunch of external stuff.
It's not available in the code.
So sometimes, when people complain that something is missing, what I tried to focus on in this talk was basically: is it available either in Postgres or as an extension that has no downsides?
That's my litmus test.
Anything that has limitations and
is an extension, limitations
as a third party tool, or obviously
as a proprietary fork, those
are going to be effectively not
available in the community.
And we have to decide, is that
a good place to be, a bad place
to be?
Maybe that's fine.
Michael: I really like the concentric circles thought, and I want to ask just about the last two: is there also a difference in your head between contrib extensions and core?
Like, for example, pg_stat_statements,
a lot of the pg_stat_* views
are on by default when people install
Postgres, but pg_stat_statements
isn't.
So that's interesting; I don't know if that's too semantic a discussion, but it feels interesting to me: is that missing, or not really missing?
Bruce: You're right.
That's another concentric circle.
Nikolay: Yeah.
Bruce: Right.
So yeah, take pgvector: maybe it would be nice for that to be in contrib instead of being external.
But then if we pulled it into contrib, its release cycle becomes tied to the major releases.
Maybe that's not a good idea, I don't know.
And then the cloud vendors will support some contrib extensions and not others, and you get people who pull me aside: hey, why don't you move this into core, get it out of contrib?
And I'm like, what's the matter with contrib?
And they're like, well, my cloud vendor doesn't support it.
I'm like, okay, so you're complaining to me that your cloud vendor doesn't support something we're already shipping.
Do you think that's going to be fruitful?
And they're like, well, why isn't it?
But there's a variety of reasons that stuff is in contrib.
It may be an edge use case.
It may be like pg_stat_statements, which creates its own tables and has its own overhead that we're not sure everyone would want.
So that kind of makes sense out there.
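For reference, this is the enablement step being discussed: pg_stat_statements ships with Postgres but has to be preloaded and created explicitly (standard setup; the example query uses the column names from PostgreSQL 13 and later):

```sql
-- In postgresql.conf: shared_preload_libraries = 'pg_stat_statements'
-- (requires a server restart), then in the target database:
CREATE EXTENSION pg_stat_statements;

-- The view it maintains, e.g. top statements by total execution time:
SELECT query, calls, total_exec_time
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 5;
```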
Some of them are oddballs, like cube, an edge case, or pg_trgm.
I don't know.
It's hard to say.
Is it better as an extension?
Is it going to be better in core?
You can make a case either way, I would say.
Michael: Nice.
From the list, are there any that are particularly your favourites, or ones you would love to see progress on, or do you not like to pick favourites?
Bruce: Yeah, the two that I've really felt called to champion are cluster file encryption, which is called TDE in the industry, and sharding.
Those are the two that have the biggest impact, for me, in terms of adoption and expanding Postgres workloads.
The other ones are interesting, but they're more...
I don't see them as expanding the Postgres adoption universe; that's, I guess, the phrase I would use.
The other ones are good, but they're more operational: it would be nice to have this, I would be more optimized if I had that other thing.
For me, the sharding and the TDE really take Postgres to a new
level.
In both of those, I would say, I've pretty much either failed or executed poorly, because I've been working on, or at least championing, those for at least five years now.
I don't feel I've made a lot of progress.
I guess I've been waiting for a groundswell of support, because both of them are very hard to implement as an individual.
I got close with TDE but got stuck on something.
And sharding, I think, really requires a team.
I don't think I've seen the kind of desire behind that.
I think part of the reason for sharding is that machines keep
getting so big that effectively you're better off just buying
a bigger machine or renting a bigger machine than going down
the sharding route.
Because the sharding route really is very workload specific and
there's an increasing dislike of workload specific things in
an enterprise.
There's much more of a push toward just generic compute, generic solutions, and away from the sort of, oh, I've got special chips on the drive which filter stuff, or I've got this special hardware that does something.
The enterprise focus is a lot less on hardware now, a lot less on infrastructure, and a lot more on solutions.
That might be good; I'm not saying good or bad, but I think sharding is...
As people have needed sharding, systems have increased in size pretty much in lockstep with what they needed.
So the number of people who actually need it is limited, and
the number of people who are willing to put in the effort to
architect it is also limited.
Nikolay: Yeah, I agree with you that the hardware available right now is huge.
Recently, I saw a self-managed cluster which had default autovacuum settings.
Usually RDS and the others tune it, or half-tune it, but this was all-default vacuum settings, only three workers, nothing tuned, and it was 20 terabytes and somehow surviving.
It was insane, absolutely.
So how is it possible?
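The stock settings being described are, in current releases, roughly these (the shipped postgresql.conf defaults; large instances usually raise them substantially):

```ini
# Default autovacuum settings (postgresql.conf)
autovacuum = on
autovacuum_max_workers = 3             # the "three workers" mentioned above
autovacuum_naptime = 1min
autovacuum_vacuum_scale_factor = 0.2   # vacuum after ~20% of a table changes
autovacuum_vacuum_cost_limit = -1      # inherits vacuum_cost_limit (200)
```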
But at the same time, RDS and Cloud SQL and the others have a hard limit.
I know they're working on it.
And Aurora has more than 64 terabytes.
And these days, collecting that amount of data is not rare anymore; even one person surrounded by AIs can collect a lot of data.
This is 1 thing.
And another thing is that there are hard problems.
For example, lightweight lock manager problem, which was solved
in Postgres 18.
There are other problems as well when you grow to some heights
you encounter with Some people would not like to spend time firefighting,
so Sharding is really needed.
But I guess, what's your take?
For me, it's really hard to find consensus on how to achieve it.
Also, the sharding topic usually triggers adjacent topics like an internal connection pooler: if you talk about routing and so on, you also think about auto-failover and connection pooling.
These topics usually come together in my head.
But there are other ways to do sharding.
There are many ways.
What's your take?
Will it ever be implemented in core or no?
Bruce: I've always been of the opinion, and I do have a sharding talk on my website, that sharding has to be developed organically within Postgres, and it really has to be built on, effectively, partitioning, foreign data wrappers, and parallelism.
I don't think we're going to have the appetite to create a whole
new architecture for sharding.
And as every year that goes by,
it becomes clearer that, yeah,
we just can't.
So I think we can get closer than
we are now with little impact.
For read-only sharding, I think we can get there.
If we want to do read-write sharding, then we have to have a global lock manager and a global snapshot manager.
It becomes much more complicated.
Yeah, I keep going there and people
who think they need it effectively
don't end up needing it or end
up getting bigger hardware or
they re-architect what they're
doing and then they don't need
it anymore.
So I think that's 1 of the reasons
it hasn't moved forward.
But I do think that for the read-only, sort of data-warehouse kind of sharding, within two years we could have a pretty good solution in the industry.
But I just haven't seen a lot of activity.
There was one guy at Fujitsu who was working on it, but that was about it.
And frankly, it's been pretty dormant
for the past 2 years, and
I haven't had time to work on it
much either.
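The "partitioning plus foreign data wrappers" direction Bruce describes can be sketched like this (the server name, table, and two-way hash split are all invented for illustration; a user mapping for the remote server is also required):

```sql
CREATE EXTENSION postgres_fdw;

CREATE SERVER shard1 FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'shard1.example.com', dbname 'app');

-- A hash-partitioned table whose partitions live on different nodes.
CREATE TABLE events (
    id      bigint,
    user_id bigint,
    payload jsonb
) PARTITION BY HASH (user_id);

-- Local partition on this node:
CREATE TABLE events_p0 PARTITION OF events
    FOR VALUES WITH (MODULUS 2, REMAINDER 0);

-- Remote partition, reached through the foreign data wrapper:
CREATE FOREIGN TABLE events_p1 PARTITION OF events
    FOR VALUES WITH (MODULUS 2, REMAINDER 1)
    SERVER shard1;
```

How much of a query actually runs on the remote side then depends on partition pruning and the FDW's pushdown of joins and aggregates, which is roughly where the read-only case is furthest along.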
Nikolay: Yeah.
If you stay on a single-primary cluster, there is a big limitation, well known to those who have reached some heights, where you have maybe 200, 300 megabytes per second of writes.
We talked about this a lot with folks who develop sharding systems.
You hit the limits of a single-threaded process: the WAL receiver, basically.
Not the WAL receiver, the replay process, right?
It shows up in top; it basically replays the changes from the primary, and it's a single-threaded process.
And I know somebody from Japan has also been working on it since 2013, I think, on and off; I saw some conference talks and so on.
This problem is not included in your list.
It's not a small problem if you reach those heights.
What do you think about that problem?
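The single replay process Nikolay mentions can be observed from the primary; the standard pg_stat_replication view (PostgreSQL 10 and later) breaks lag down by stage:

```sql
-- Run on the primary: per-standby lag, by stage.
SELECT application_name,
       write_lag,    -- WAL received and written by the standby
       flush_lag,    -- WAL flushed to disk on the standby
       replay_lag    -- WAL actually applied by the single replay process
FROM pg_stat_replication;
```

When replay_lag keeps growing while write_lag stays small, the bottleneck is the standby's replay, not the network.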
Bruce: Yeah, as I remember, that was the one where they were trying to create a dependency graph from the WAL, and therefore identify which parts of the WAL are parallelizable, and send those off to workers.
CPUs do that now with CPU instructions.
They figure out which parts can
be run on different cores or
in parallel cores.
Yeah, you're right.
Nikolay: But we don't have threading, so, yeah.
Bruce: I don't know.
I don't know if threading is...
Threading's on the list that I have.
I'm not sure threading is a requirement for that, because we do have parallelism without threading now.
It seems to work fine.
Because you load them into a shared memory queue and the process
just pulls out of there.
I don't know.
I think you're right.
I think that would be great: getting a parallelizable replay of logical replication.
Physical doesn't seem to matter too much, because it's so fast that...
Nikolay: Hold on 1 second.
Logical, I think, already... there was work, maybe in Postgres 18 or something; I remember something for logical, to parallelize it.
Bruce: To parallelize it, okay.
Nikolay: I think so.
Technically, you can parallelize it with multiple publication-subscription pairs, multiple slots, but it will have problems with foreign keys; it will be only eventually consistent in terms of referential integrity.
But I'm talking purely about physical replication.
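The workaround Nikolay sketches, one apply worker per publication/subscription pair, would look roughly like this (table and connection names are invented; as he says, referential integrity across the two streams is only eventually consistent):

```sql
-- On the publisher: two publications over disjoint sets of tables.
CREATE PUBLICATION pub_orders FOR TABLE orders, order_items;
CREATE PUBLICATION pub_users  FOR TABLE users, profiles;

-- On the subscriber: each subscription gets its own replication slot
-- and its own apply worker, so the two streams replay in parallel.
CREATE SUBSCRIPTION sub_orders
    CONNECTION 'host=primary.example.com dbname=app'
    PUBLICATION pub_orders;
CREATE SUBSCRIPTION sub_users
    CONNECTION 'host=primary.example.com dbname=app'
    PUBLICATION pub_users;
```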
Bruce: Physical replication, okay.
Nikolay: And this is a limit some companies hit, and it's super painful, because there is no escape from it: you need to do either sharding or a vertical split.
You need a lot of writes to reach it, basically.
Bruce: I think the reason we don't hear about it a lot in the
community level, and this might be different from the level that
you're working at, is we're definitely a general purpose database.
And not that we can't go to the heights and we keep pushing the
ceiling up higher and higher, but there is a limit to how much
complexity we're willing to add, and potentially unreliability
to Postgres to get up to the super heights.
It's possible that's why; I know you specialize in that area.
It's possible that's why we don't hear a lot about it.
That didn't get on my list as something that I've heard a lot
about.
I don't see many email threads really addressing that.
Nikolay: This is so.
But I know, for example, Instacart hit it years ago, during
COVID actually, and there are others who hit it.
Sometimes people hit it and don't notice, because, okay, some replicas are lagging a little bit, and so on.
Yeah, but I understand it's not a super common problem.
Yeah.
Bruce: EDB had a customer who hit it on specialized storage hardware, and we identified that it was the storage hardware causing it.
So that was kind of a "get new storage hardware and your problem will go away".
That was, I think, the answer to that one.
Nikolay: Okay,
Bruce: yeah.
That's the only case I've heard of replay complaints and lagging
in my work.
Yeah.
Nikolay: But what's your take on the threading topic?
Because there was a big impulse a couple of years ago, from Heikki originally, I think.
Bruce: Yeah.
I have a blog entry about it from 2018.
I think it might even reference Heikki's email; let me see if it does.
No, it doesn't.
I think we're still on the fence on that one.
We keep chewing away at some of the small things; getting rid of global variables, I think, was one.
And I think the other thing that's going to bail us out here is that it appears we now have compiler support for this.
So instead of having to re-architect all our code, if we can get rid of the global variables, which is pretty simple, it looks like the compiler will make a lot of stuff thread-local for us, without us having to really re-architect a lot of the code.
So I'm hoping in that area.
But my understanding is that the threading prototypes don't seem to gain a whole lot, at least from what we've found, beyond reducing task-switching time.
For example, if you're doing a GUI, you really need threading, because you're updating a whole bunch of stuff on the screen at once, and you want all these background jobs running in your address space and so forth, a whole bunch of very light processes and light statuses.
But when you start to talk about a database, there really isn't a whole lot of lightweight work out there.
Yeah, I guess we could do index lookups with threads.
That would be cool.
Nikolay: Visibility bits maybe?
Bruce: Visibility, yeah.
But the problem is that a lot of the stuff is really heavy.
For example, when you're trying to parallelize a query, those workers are not light, right?
So you might as well just start a process, run it, and have it have its own everything.
So I think that's what has slowed it down.
Another problem obviously is resiliency.
Right now if a session crashes or runs out of memory or whatever,
we keep running just fine, unless
it's in a critical section.
Whereas with threading, it would make us less resilient to sessions that misbehave, and we have to balance those two: the value of threading against the reduced resiliency.
I don't really know the answer.
I've been surprised we haven't
done more in this area, but it
appears to not be a huge problem
if I'm not hearing about it
every couple months.
Nikolay: I certainly am not.
Let's talk about huge problems related to 4-byte transaction IDs, which is included in your list: 8-byte transaction IDs.
And some folks have had them for years already.
Also, I saw some preparation work and so on.
I'm very curious what your forecasts and take on this are.
But also, and this is not included in your list, there's the undo log and redesigning MVCC, topics which were quite popular a few years ago.
There were some efforts, but now, like, silence.
Maybe I just don't see it, but...
Bruce: No, I think you're right
on the MVCC.
I mean, we had the zheap effort by Robert Haas.
That was probably 10 years ago.
And he really poured himself into it, but I think the job just got too large.
And once you start to look at what undo requires, things get very complicated.
So I'm not saying it's not doable, but you end up with a lot of complicated challenges, and Oracle suffered from those challenges for decades until they finally figured out how to deal with them.
So what would we gain by having
an undo?
Certainly it would make updates
easier.
I'm not sure it would help with
something like deletes or aborted
inserts.
Nikolay: I think it's deletes.
Let's just update and see.
Bruce: Yeah, just update and be
done with it, exactly.
Michael: It's all the bloat and
vacuum issues, I think, that
people are most excited about getting
rid of.
Bruce: Yeah, but at the same time, they keep improving vacuum every release; I saw it in the release notes.
You have today's vacuum, the vacuum from two years ago, the vacuum from five years ago.
Which one are we complaining about?
Because, in fact, it has gotten better.
I'd love to just get rid of the whole thing, but even with an undo, I'm not sure I'm ever going to be able to get rid of vacuum.
Nikolay: Because it has different
jobs as well.
Bruce: Right, because of the problem with aborted inserts and
deletes and so forth.
So what is that going to look like?
I don't know.
I don't know.
I'm surprised at how long we've been able to stay with what we got, what, 30, 40 years ago: a much simpler architectural setup.
It's funny, I was in Geneva for the CERN PGDay, and somebody came to me talking about page compression: do we do page compression?
I have a blog entry about it, and I showed it to him; I hoped he would read it from my phone, but he didn't want to.
He said, it requires less disk space.
I said, this is not the 1990s.
What's the actual reason?
I said, I understand why it was needed 30 years ago, but I'm not sure now.
Because once we compress a page, then obviously, if we do an update and the new data can't be compressed as well, we've got a bigger page.
You've got to put that somewhere, and where do we put it, and how do we deal with the index entries?
What are we really gaining once we've done all that computation, all that moving of data around?
What have we really gained with the compression?
Now we can use storage compression, right?
You can, there is storage compression.
I think Postgres will run on that.
But again, that pushes the problem down to the storage layer.
I don't know what performance would be like, because frankly,
databases don't like compression of that size.
But I guess my point is that he was asking for a very specific
thing, but when I asked him why he wanted it, he really couldn't
articulate except, okay, it uses less disk space, but I wasn't
sure what the goal was there.
Nikolay: So for me, compression, as in "let's compress data", is more related to column storage, and how you can compress there.
Bruce: I told him that.
I said, if you're telling me column storage, that's a completely different setup, and we do have solutions for that.
But again, you're assuming a lot of duplicate data in the same column.
And that's a different case from compressing an 8K page on disk.
And he's like, no, no, no, I don't want that; I want the one about the page.
I don't see anybody working on that.
And I'm like, okay.
Nikolay: We don't serve this in
this restaurant.
Bruce: Yeah, we don't serve that.
In terms of 64-bit transaction
IDs, I think it's a little hard
to understand what's going on when
you look at the email threads,
because I think the impetus is
that Postgres Pro has a version
of Postgres that does 64-bit transaction
IDs, and therefore you don't have to
freeze, and all that.
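As background, the freezing Bruce refers to is driven by the 32-bit transaction ID wraparound limit. A standard way to watch how close each database is getting, using real catalog columns:

```sql
-- How many transactions old is each database's oldest unfrozen xid?
-- Autovacuum forces aggressive freezing as xid_age approaches
-- autovacuum_freeze_max_age (default 200 million).
SELECT datname,
       age(datfrozenxid)    AS xid_age,
       mxid_age(datminmxid) AS multixact_age
FROM pg_database
ORDER BY xid_age DESC;
```

With true 64-bit transaction IDs, this ceiling, and much of the freezing work it forces, would go away.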
But as we talked about earlier,
Nikolay, I think the Postgres
Pro customers are really at the
very high end of throughput requirements,
and they're willing to pay for
that.
They're willing to have a more
complex system that does that.
But we're now in the more
generic workload case. The Russians
have been very willing to give
us the patches to do it,
but there's resistance in terms
of exactly how to do it.
In a way, it's going to benefit
the high-end users, no question.
And one of the things I've learned
in the past year is that one of
the reasons Oracle looks the way it does,
and a lot of the proprietary
forks look the way they do, is
because they're really selling
to that top 5% of the market.
Whereas when Postgres is working
on its code, we're trying to
hit that 50% mark, right?
The reason Oracle is so complicated,
the reason a lot of
these databases become complicated
or add these complicated features,
is because of who they're selling to.
I don't say they don't care, but
they're really only focused
on their top 50 customers, and everyone
else is just along for
the ride.
And therefore you get these very
complicated systems with a lot
of weird options, which were added
only because one of the 50 wanted
it.
You have this very high-end group
who's very demanding, who's
calling the shots, and they're
dragging the database into this
high-end territory.
But every time they drag
it up there, they're potentially
making the generic workload
slower, harder, or less efficient.
And they effectively don't care,
because the money is in that
high end.
Nikolay: Let me comment, please.
I can't keep silent here, because
we help startups.
Our primary customer is a startup
that grew to some terabytes
and started hitting problems.
And in the last couple of years we've seen
a lot of AI startups,
and this is crazy, absolutely.
They reach heights much quicker,
with much less effort and
fewer resources.
And they start hitting problems
very quickly; for example,
freezing becomes a big problem and we
need to do manual freezing that
skips indexes.
This is very common for us right
now.
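The manual freezing that skips indexes, as Nikolay describes it, can be done with vacuum options that exist today (INDEX_CLEANUP requires Postgres 12+; the table name below is illustrative):

```sql
-- Freeze tuples proactively without scanning or cleaning the
-- indexes, trading a bit of index bloat for a much faster pass.
VACUUM (FREEZE, INDEX_CLEANUP OFF, VERBOSE) big_events_table;
```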
And the second thing is that we're already losing some of them, because
when we start saying you need things like partitioning, you need
to allocate some resources for these experiments and so on.
Yeah.
They switch to a different database system, or quickly consider
one where there's less headache.
But this concerns me a lot.
Michael: Isn't that kind of Bruce's point?
If they're an Oracle customer and they grew to being a significant
portion of revenue, worth saving, that's different.
The Postgres project doesn't have
to listen to the huge customers just because they're paying.
Maybe Oracle is incentivized to listen to those customers more than
the average customer, more than all the little guys.
So Postgres is a different landscape, where we can listen to people
across the board, not just the ones hitting extreme
scale.
Nikolay: My point is that smaller teams have bigger databases,
and the speed is increasing.
They hit problems that were hit only by big users before, more
and more often.
These guys are more ready to switch faster, because they are surrounded
by AI themselves, because they are AI startups.
Bruce: What do they switch to?
Nikolay: Sometimes proprietary databases, AWS or GCP, whatever they
offer.
Sometimes MongoDB is also on the table, and they run into
different problems.
But I'm just concerned that I'm seeing something I
didn't see in the previous 20 years.
Bruce: So if we look at what's going on with the 64-bit IDs as
a microcosm: they're willing to give us all the patches to
make this happen, right?
But what's happened is that we've only incrementally implemented
those.
We have some of them in PG 19.
For example, the MultiXact sizes are now 64-bit.
The members or the groups, I can't remember; I think it's the
groups.
Nikolay: And sorry, we had a couple of good examples last year,
and some of them came on our podcast, where this space was
exhausted.
Bruce: The MultiXact.
Nikolay: Yeah, yeah.
And also AI startups which have a lot of data very fast.
Bruce: So there are 2 ways we can go at it.
We could just go whole hog: 64-bit everything, increase the header
size by 33% by making them 64-bit, and be done with it.
But we're worried about the impact on the average user.
So what we've been doing is slowly 64-bit-enabling the server;
MultiXact is one, and there are some other areas.
I think the clog we're looking at doing.
I don't know if we got to that.
So we're trying to do it.
We're trying to 64-bit-enable the areas that are basically costless to us,
where we can increase it without really any problems.
And then, once we have 64-bit pretty
much everywhere we can easily do it,
we have to look at, okay, at the places where it's going
to cost us, what do we do?
I think one of the great ideas I saw was to put basically
an epoch in the page header, so that
for mass loads of data,
which a lot of these companies do, effectively all the tuples
are in the same transaction.
They're clearly all in the same epoch, right?
So you just put one epoch on the page, and then all your heap tuples look just
the same as they did before.
Right.
I think that's where we're going: the concept of having an
epoch on the page, and then basically allowing those pages not
to have to be frozen at all.
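The per-page epoch idea rests on how Postgres already extends its 32-bit transaction IDs: a 64-bit "full" xid (the `xid8` type returned by `pg_current_xact_id()`) is just the 32-bit xid plus a 32-bit epoch counting wraparounds. A sketch of the arithmetic, where the helper names are mine, not Postgres internals:

```python
# A 64-bit "full" transaction ID combines a 32-bit wraparound epoch
# with the 32-bit xid that Postgres stores in each tuple header.
XID_BITS = 32
XID_MASK = 2**XID_BITS - 1

def to_full_xid(epoch: int, xid: int) -> int:
    """Combine an epoch and a 32-bit xid into one 64-bit value."""
    return (epoch << XID_BITS) | xid

def split_full_xid(full_xid: int) -> tuple[int, int]:
    """Recover (epoch, xid) from a 64-bit transaction ID."""
    return full_xid >> XID_BITS, full_xid & XID_MASK

# Tuples bulk-loaded in one batch share an epoch, so storing the
# epoch once per page leaves the 4-byte per-tuple xids unchanged.
epoch, xid = 7, 123_456
full = to_full_xid(epoch, xid)
assert split_full_xid(full) == (epoch, xid)
```

This is why a single epoch per page can be enough: every tuple on a bulk-loaded page shares it, so nothing per-tuple has to grow.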
I think if we start to bring the epoch down to individual rows
and increase the header size, which people have already complained
is too big, then we could potentially get complaints.
If I had to take a guess, I think that's where
we're going, but we're going at it very incrementally: again,
hitting all the places that store LSNs outside of the heap and
index pages, because we know those are going to be complicated,
and getting the infrastructure done.
I know that we have some patches in 19 for that.
I've seen them coming through.
They got committed.
Then we have to decide: okay, now that all this stuff is in place,
let's put up a proof of concept.
Let's see what a per-page epoch looks like.
Let's do some loading of data.
Let's see how far this gets us down the road.
We don't have a whole lot of empty space on the pages, so it'll
be a little tricky to figure out where to put it, but I think
we can find a place.
But that's basically where we are: trying to figure
out, once we get all the bottom stuff done, what we do for
the main stuff, the heap and index pages.
Michael: Nice.
I think that makes a ton of sense.
And I really like the incremental approach that Postgres takes.
I've only been in and around the project for the past 10 years
or so.
Bruce: It's a lot!
Michael: I know, it's a lot compared to some, but it pales in
comparison to 30 years.
And it's been really nice seeing the incremental improvements,
and also how they stack up.
It really has come a long way in 10 years, and things that looked
incremental 9, 8, 7 years ago have
really added up. You mentioned vacuum, but it's not just
vacuum, right? All of the changes that have helped
with index bloat, they're kind of attacking from the other direction.
Bruce: That's very true.
Yeah, replication.
Yeah, yeah, partitioning, yeah.
Michael: Yeah, I like that approach, and while it might
sound like it's going to be slow, time will fly; in a few
years I can imagine us having made significant progress on some of
these things.
So yeah, that's really cool.
Bruce, I'm conscious of time.
Is there anything we didn't talk about that you wanted to make
sure we did mention or any last shout outs or pointers you wanted
to give people?
Bruce: No, it was just nice that we got a chance to at least talk
about it. To me, the categories of what's missing were a big takeaway:
understanding why we are here,
why we don't have some of the stuff we're missing, and why there's some
stuff we may never have.
Optimizer hints.
Although it seems like a portion of that is coming in Postgres
19.
It's not called optimizer hints, it's called plan advice, but
it could be used in a similar way.
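Until pg_plan_advice lands, the closest widely used equivalent is the third-party pg_hint_plan extension, which reads hints from a leading comment. A sketch, with illustrative table and index names:

```sql
LOAD 'pg_hint_plan';

-- Force an index scan on t1 and a hash join, overriding the planner.
/*+ IndexScan(t1 t1_pkey) HashJoin(t1 t2) */
SELECT *
FROM t1 JOIN t2 USING (id)
WHERE t1.id < 100;
```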
So again, I think the problem is that we don't have a roadmap,
because we're too dynamic to have one.
So it's almost a surprise to me to see what gets in every release.
And I think everybody's pleasantly surprised by what's in 19,
and obviously by what we're going to be starting for Postgres 20
in July.
Michael: Yeah, absolutely.
And I think I saw the work from Robert.
I think it was in contrib modules, right?
At least the first version.
Bruce: pg_plan_advice, yeah.
Michael: Yes.
Looks very cool.
All right.
Well, thank you so much for joining us.
I'm sorry Nik had to drop off.
It's been an absolute pleasure having
you.
Bruce: Great.
It was nice talking to you.
Michael: Likewise.