Postgres FM

Nikolay and Michael discuss why counting can be slow in Postgres, and what the options are for counting things quickly at scale.
 
Here are some links to things they mentioned:

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

Postgres FM is produced by:

With special thanks to:
  • Jessie Draws for the elephant artwork 

Creators & Guests
  • Host: Michael Christofides, founder of pgMustard
  • Host: Nikolay Samokhvalov, founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

Nikolay: Hello, hello, this is Postgres FM, a podcast about Postgres,

as you can guess.

My name is Nikolay from Postgres.AI, and as usual, my co-host,

not guest, co-host is Michael from pgMustard.

Hi, Michael.

Michael: Hello, Nikolay.

Nikolay: So you chose a topic, and it is?

Michael: Slow counts, or like why is count slow and what can

we do about it?

Nikolay: How to make counts slow?

You just count.

So yeah, let's discuss.

I don't know.

For me it's a super old topic, and I'm not sure I have much

fresh knowledge here.

I have a lot of fresh knowledge about Postgres, and I share it

constantly.

But this is something which is super old.

There are improvements we can discuss, like index-only scan.

We talked about them recently a lot.

But I guess my role will be to use my knowledge from, you know,

like 15 years ago mostly.

And just before we started this recording, I shared with you

how I helped PostgREST a little bit to get rid of count, right?

Because sometimes you don't need it.

So I'm going to stay in that area 10 years ago.

And if we have something new, I will just be putting more to-do

items on my to-do list to explore, and that's it.

Maybe you know some things better than me.

Let's see.

Michael: Yeah, well, the PostgREST link is great because the resource

I still like to point people to on this topic, the best thing

I've ever found on it, is a blog post by the PostgREST author,

Joe Nelson, on the Citus blog.

So I'll link that up for people; I think it explains it in a lot of depth, but it's from 2016.

So quite a long time ago, 8 years at the point of recording.

And I was kind of surprised how many things have changed, but

also how little has changed in a way.

Like there are optimizations.

A lot of things that were kind of little gotchas back then aren't

true anymore.

There are a bunch of ways that it's faster now by default.

But the essence of it is still true, like because of the way

Postgres does MVCC, when we do an aggregation.

Because it's a row store.

Yeah.

Basically, because all the things that make it great as a

transactional database count against it.

Pun not intended.

When it comes to aggregating across
a lot of data and count is,

I think, at least in my experience,
that's the most common aggregation

people want to do in like typical
use cases.

Nikolay: Yeah, that's a good point.

Since Postgres is a row store, let me explain what that means.

Imagine you have a table of users,
for example, as many SaaS

systems or e-commerce systems do,
like customers, users.

Since Postgres is mostly like OLTP,
row store, OLTP-focused database

system, it stores a tuple, a row.

If it's not too wide, it stores
it in 1 place, in 1 page, basically.

As we discussed many times, a page
is 8 kilobytes by default

in most cases, and very roughly,
if a row doesn't exceed 2 kilobytes,

it will be on the same page, and
that's it.

The mechanism called TOAST won't
be involved.

If it exceeds it, it will be chunked
to pieces and stored in an

additional table called the TOAST table.

So imagine we have a users table.

And then users can log in, so we
need to retrieve information

about this user.

And it's good because all information
about a single user is

stored on the same page, right?

If it's narrow, right?

As I said, it means that it's quick
to retrieve.

It's just 1 buffer hit or read.

Michael: Yes, extremely efficient, like if we wanted to show

the profile of that user, if we wanted their name, their age,

you know, loads of information about them: a single page read to

return that information, maybe a couple of reads once you look them

up in the index, then the heap page.

Nikolay: So it's very I/O optimized
basically.

Efficient.

Efficient, exactly.

But if it comes to counting or
calculating some sum or average,

minimum, maximum, everything, aggregates.

It's not good because usually aggregates
deal with only a single

column or a few columns.

And we don't need all the other
columns this table has.

If we have a row store, as Postgres
is, in this case, it means that

we need to do much more I/O, many
more buffer operations, hits

and reads to get information for
like to calculate aggregates

for a single column.

For example, count them, right?

And it means if you want, for example,
a very, very, very common

ask in startups, in social media,
in SaaS, in e-commerce, let's

just show how many new users we
have every day.

We need to count daily, right?

And this means, like, imagine if
we have only a few records in

an 8-kilobyte page, a few user records,
a few user tuples, because

we have many columns.

It means 1 buffer operation will
allow us to count only like

4 users, 6 users, and that's it.

Compared to column store where
each column is stored in separate

files, and if this file is about
ID, it's only ID.

If it's about creation time, it's
only creation time, datetime.

In this case, counting is much
more efficient because 1 page

can bring us many more users, and
we can count them much better.

This is quite straightforward and
very simple, a trivial comparison

of ColumnStore and RowStore and
why in ColumnStore aggregates

and counts specifically are much
faster than in RowStore, just

because of IO, right?

Michael: Yes, and that's before you even start to think about the fact that,

because column stores are used for these aggregation and analytics

use cases so much, they also tend to have more optimizations

around these aggregations than at least the "old school"

transactional processing databases have.

I think lines started getting blurred
a little bit with some

of the extensions, like some of
the newer extensions in Postgres

world that also have some of these
optimizations.

But yeah, I think that's a great
starting point as some of Postgres'

strengths working against it.

I think another 1 that is not just
RowStore versus ColumnStore

is because of how Postgres does
visibility.

I think if you look at some other transactional processing

databases, because they don't do the same marking of rows as

dead, because they do undo logs basically,

they can also cheat a little bit for certain counts.

Like, if you want to count all
of the users, which if you have

some specific use cases, like let's
count all of the records

in this table, they can cheat and do some of the solutions

that we're going to

Nikolay: discuss maybe at the day
to day level.

Well, they're not cheating.

They put the new row version in the same place as the old row

version, which means that if we already have compact storage,

that compactness is not lost.

Compared to Postgres, where the new version goes to a new place,

maybe a different page; in many cases it's a different page.

Maybe it's a place which was taken
by another tuple which became

dead and then vacuumed, deleted
by a vacuum, right?

Or autovacuum.

It means that we might end up having
very scattered storage for

our tuples, especially if we need
to count only specific tuples.

We have the WHERE clause in our
SELECT COUNT.

It means that we need to, or we
might have an index scan for this,

but it's kind of random access,
right?

We need to count many tuples which
are stored in random places,

and we don't use partitioning.

Our table is 1 terabyte in size.

It's a super bad case.

An UPDATE randomizes the storage,
right?

The new version is stored in random
locations.

And it becomes worse and worse
and worse.

It's very scattered, and we lose
compactness.

This is how MVCC in Postgres works,
unfortunately.

We oftentimes say it's more like
rollback-focused than commit-focused.

Because in case of rollback, the new
version becomes basically dead

eventually and cleaned up, and the
old version remains in the same

place, and we don't lose compactness.

If we have a commit, we move rows
all the time.

We move tuples, the physical versions
of rows, all the time, and we

lose compactness eventually.

And this is bad for count, bad
for other aggregates as well.

Michael: Yeah, so I think that's
like a good overall summary

of why count is slow, or at least
why

Nikolay: it scales.

Michael: I think while it's not
bad on tiny datasets, it's fine,

right?

You probably won't notice it too
much, but as data scales, even

in the blog post I was reading,
they only did a dataset of a

million rows and already, even like
the fastest optimizations

at the time could only get an accurate
count in about 100, 200

milliseconds, something like that.

So we're already talking about
10x what you would consider a

fast query.

So even at a million records, which
is not many in many use cases,

we're talking about a potentially
meaningful length of time.

So yeah, I think, should we move
on to what we can do about it?

Nikolay: Let me share the history
of this blog post.

I might have somehow provoked that blog post that Joe Nelson

wrote, a little bit, and I'm not taking full responsibility for

that blog post, definitely.

But I remember he published it
in 2016, right?

And I started using it all the
time.

I shared, like, if you want to
understand count, just read that

blog post.

Of course, right now I would say it's missing BUFFERS in the

EXPLAIN plans, for sure, because, as we just discussed, it's

always reason number 1 for why things are slow, right?

We should not talk about timing only. We should talk about

BUFFERS as well, all the time.

But back 1 year earlier, in 2015,
I started looking at PostgREST,

which right now is 1 of the main
building blocks

Supabase has.

And I was impressed, like, great, we can ditch Django and Rails

applications completely, at least in simple cases,

and do everything on the server side and the front end.

This is actually the same time
React became popular.

I think it was started in 2013,
'14, and so on.

So it skyrocketed in 2015.

And a lot of business logic went
to frontend, of course.

Also React Native.

Great.

So we need to get rid of middleware,
keep data-oriented business

logic in the database, keep UI-oriented
business logic on the

client side, and this is great.

So I found PostgREST, and I quickly
realized I came from already

quite heavily loaded systems.

I remember, for example, the first time I saw an integer primary

key exhausted, it was in my own project in 2008.

So I already understood some data volumes and how they work in

Postgres, and I quickly found that PostgREST already had

some pagination, very good, but on each GET request, you say,

I want only 25 items on my page, but it also shows you how many

overall items you have for this request, with all your filters

and so on.

And I thought, well, that's not
good.

I know Postgres has slow count,
right?

What if they counted every time? It's written in Haskell, so

I started digging inside the source code, and I found it indeed runs

count all the time, so I helped them to get rid of it by default.

So I remember I created an issue: let's get rid of count by default,

and they quickly agreed and removed
it.

So I'm happy I helped a little
bit to realize that count should

not be used by default.

And also at the same time, Joe
Nelson, the original creator

of PostgREST, started to dig into
this topic and found several

alternative ways to make count fast, including an approximate method.

I like this method.

It doesn't work all the time.

It works very well if you have
autovacuum well-tuned, so it

runs auto-analyze frequently, right,
because you need fresh statistics.

And it also works if you don't
play too much with filtering.

So, for example, if you have a users table and you need to display

the overall count of rows in this table, you can use just the

approximate approach: in a few words, you run EXPLAIN without

ANALYZE, you just run EXPLAIN SELECT * FROM users, right?

And the planner will tell you how
many rows it expects.

And this is already an approximate
number you can use.

This is an old trick I first learned maybe in 2004 or 2005, when I started

with Postgres.

We used it.
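
A minimal sketch of that trick (the users table is just a placeholder); the planner's estimate can be read from the EXPLAIN output, or taken straight from the statistic it is based on:

    -- Rough row-count estimate from the planner, no table scan:
    EXPLAIN (FORMAT JSON) SELECT * FROM users;
    -- "Plan Rows" in the JSON output is the estimated row count.

    -- Or read the underlying statistic directly:
    SELECT reltuples::bigint AS estimated_rows
    FROM pg_class
    WHERE relname = 'users';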

Michael: I think, yeah, I think
this trick comes into its own

for things that are harder to find
out from the statistics or

from other methods.

I think, like for example, if you
have dynamic filters that users

can set, or if you, basically,
if you have,

Nikolay: if you

Michael: need to be able to set
arbitrary where clauses or you

want to be able to join relations
and get estimates then, This

is an extremely powerful estimation
tool because you're basically

using Postgres's planner to do
it.

Nikolay: But you need to make sure
it's up to date most of the

time.

Michael: Well, that's not even
enough though because we've had

plenty of episodes where we've
talked about times where the plan's

estimations can be way off.

Nikolay: So yeah, just have 2 columns
and 2 filters and don't

have multi-column statistics, that's
it.

Exactly.

Michael: It's a super easy way.

Yeah, or in 2 different tables,
like, where you can't even.

So yeah, so we've talked about
the limitations of that before,

but in the real world, I've seen
very few, I mean, it's probably

a limitation of my experience rather
than because I've seen how

popular these estimation tools
are.

I've seen very few cases where
people would rather have an estimated

count than no count at all.

In lots of user-facing applications, I've seen people want an

exact count, or they don't want
to show it.

And there seems to be this...

I cannot agree.

Nikolay: I cannot agree.

For example, I...

What do you see?

Michael: Where do you see

Nikolay: it used?

Well, for example, I had this.

We have a social network, for example,
and we just want to display

the total number of registered
users and the total number of,

I don't know, like groups created
or anything created, posts,

comments on the front page.

And it's just overall number.

Definitely this approach will work
there very well.

And we rounded it.

We don't show... well, the next idea: let's round it, but also,

if we know the current number and the number from 1 week ago, we can

understand the velocity, the speed of growth, and we can also

pretend it's growing right now.

It's like fake, but this is what
social media do all the time,

right?

Michael: But this is kind of my
point that you wouldn't use the

actual...

So the number when it's an estimate
could go down.

Like even if it's the number of users, in that case, you

wouldn't want to show a number this week that was lower than

the number last week.

Nikolay: Right.

You're right.

Michael: So there, you still wouldn't use it exactly as is, like...

So to be more specific, there's
some really cool algorithms.

There's a really popular extension called HyperLogLog, the

postgresql-hll extension.

Yeah.

Nikolay: Count distinct.

Michael: Yes, so it's for estimating count
distinct, you're right.

So more specific than just count,
but it's available on pretty

much every, every cloud provider,
which makes me think it must

be pretty popular.

Like it must have been requested
on a bunch of these.

Nikolay: Not necessarily.

Things work in life differently.

It might be just, you know, like,
we don't know how popular it

is overall.

It might be just cool to have it
at some point.

You know, sometimes people have
high expectations.

And also, in the case of managed Postgres services, in any area of

business, any segment of the market, if some big player added it,

others also added it.

Michael: Or even if, let's say
only 1 big customer needs it or

wants it.

If they're big enough, maybe you
add it.

Nikolay: Yeah.

Also might happen.

Yeah.

Michael: So yeah, good point.

But I do think this whole category
of estimated counts is super

interesting.

And that feels like 1 big solution
that you can say, look, your

count is slow, 1 option is estimating.

And I think it's really interesting
intellectually, and I'd love

to hear from people that do use
it in production for real things.

But very often I see people say,
okay, that's cool, but what

are the other options?

What else can I do?

Nikolay: Let's finish with when
to apply this approximate approach.

I think if you need to count rows
in 1 table, it's easy, no filtering,

it's definitely an option.

But you need to make sure autovacuum is configured to trigger,

like autovacuum_analyze_scale_factor is configured properly,

so statistics are up to date most of the time, I would say.

But also, if you have filters,
these filters should be like single

column, no joins.

In this case, it's good.

In some cases, it can be not good
as well.

For example, if you need a daily count, it might be off completely.

Or if your project is already old and you have many years of data,

daily counts might also be challenging, because we know that by

default Postgres keeps statistics for only 100 buckets.

Michael: Oh, interesting.

Nikolay: I

Michael: see where you're coming
from.

Nikolay: But we can raise it.

We can raise it for specific tables,
even specific columns.

And this can help as well in this
case.

If we know we do daily or monthly
counts, we can change this

setting and have more precise statistics,
right?

And still use an approximate approach.
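
A minimal sketch of that per-column change (table and column names are placeholders); the default target is 100, controlled by default_statistics_target:

    -- Keep a finer-grained histogram for the column we count on
    ALTER TABLE users ALTER COLUMN created_at SET STATISTICS 1000;
    ANALYZE users;  -- rebuild statistics so the new target takes effect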

Make sense?

Michael: Yeah.

I think it might even be simpler,
like it might be that most

people don't end up going this
route because they actually find

they can live with the trade-offs
of the other approaches.

Like it feels like a last resort
estimation.

If you've got a slow count, maybe you're thinking, well, can

we make it faster? There are ways of making it faster, but they

come with certain trade-offs.

So maybe we should discuss those first and then come back at

the end to say: if you can't live with those trade-offs, and you

still want to display something or show something, then you estimate.

Nikolay: For me it's vice versa
actually.

Approximate count is super easy
to implement.

I would consider it first.

Would it work well enough in our
case or we exclude it completely?

That's it.

Because it's super easy to implement.

But let's move on.

You already started talking about
HyperLogLog, right?

This is for distinct count and
this extension is indeed available

almost everywhere.

So if you have a count distinct,
why not consider it, right?

Because it might be very good,
fast, and so on.

And Joe Nelson's old article explains it.
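
A minimal sketch with the postgresql-hll extension (table and column names are placeholders, and user_id is assumed to be a bigint): keep one HLL sketch per day and ask it for an approximate distinct count.

    CREATE EXTENSION IF NOT EXISTS hll;

    -- Roll visitor IDs up into one HLL sketch per day
    CREATE TABLE daily_uniques AS
    SELECT date_trunc('day', visited_at) AS day,
           hll_add_agg(hll_hash_bigint(user_id)) AS users_hll
    FROM visits
    GROUP BY 1;

    -- Approximate number of distinct users per day
    SELECT day, hll_cardinality(users_hll) FROM daily_uniques ORDER BY day;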

So what I would do, I would consider
it in this case.

I would do some benchmarks with
current data volumes, but also

10x for example, predicting future
growth with focus on I/O.

I definitely would look at buffers
numbers in the plans and understand

how, like, what about scalability
here?

If we scale 2x, 5x, 10x, will the I/O numbers still be fine and lead

to acceptable latencies or not?

This is the question, right?

Michael: But this is, we're still
in estimation, right?

Nikolay: Yeah, but it's more like
it's smarter already, right?

Michael: Yeah, true.

Smarter than just relying on Postgres
statistics.

Right.

Yeah, true.

Nikolay: And next, like, I know
the article also just, like,

it's interesting that we're discussing an article which is 9 years

old, or how many... 9, right?

Well,

Michael: Actually, as part of my prep for this, I did run it.

Conveniently, they provided code snippets for each step,

and I made a few changes, but it was very, very

easy for me to run this on Postgres 17, actually, just because

I was doing some testing.

Nikolay: What changed?

Michael: Well, so the first thing
I noticed was his plans didn't

even have parallel.

Like, bear in mind, we're talking
about doing a count over a

million rows.

These days you just expect to see
the first query plan have a

gather and...

Nikolay: 3 workers by default,
basically.

Michael: Yeah, exactly.

And I didn't.

And I was like, wait, so his blog
post was pre-Postgres having

parallel workers.

And that was

Nikolay: the first thing.

And it means roughly it should be, by default, 3 times faster,

because the aggregation, the combine step, or however it's called,

is easy in this case, it's just a sum of

numbers. That's it, right?

Michael: Yeah, count should parallelize
really well.

Because I don't know his hardware,
and I didn't go back and try

and get a really old version of
Postgres, I didn't actually.

Actually, I probably could have
just tested by turning off parallelization.

But yeah, probably about 3 times
faster.
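
A minimal sketch of that comparison (the users table is a placeholder); this only changes the current session, nothing permanent:

    -- Parallel count (the default on recent versions)
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM users;

    -- Disable parallelism for this session and compare
    SET max_parallel_workers_per_gather = 0;
    EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM users;
    RESET max_parallel_workers_per_gather;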

Nikolay: Good.

This is a good point.

Yeah.

Michael: Yeah, that was the first
thing I noticed.

And then there were a couple of
other differences in the nitty

gritty, but nothing like fundamental,
nothing like too big.

So most of the points still stood
really well, even though, as

I said, it was like a lot's changed,
but also not that much has

changed.

So yeah, the other one was that in the past, you had to do a workaround:

you needed to use a subquery trick for count distinct

to get an index-only scan.

So instead of being able to do count distinct on a column and get

an index-only scan, you had to do a select within that.

You had to select distinct inside a subquery to get the index-only

scan; you don't need to anymore.

That was quite cool to see.
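
A minimal sketch of what that looked like (table and column names are placeholders):

    -- Old workaround: wrap DISTINCT in a subquery so the planner
    -- would pick an index-only scan
    SELECT count(*) FROM (SELECT DISTINCT customer_id FROM orders) AS s;

    -- Per the discussion here, on recent versions the direct form
    -- can get an index-only scan too
    SELECT count(DISTINCT customer_id) FROM orders;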

Nikolay: Yeah, that's a good point
as well.

Michael: But yeah, like a bunch

So actually index-only scans, we
should probably talk about that.

Nikolay: Yeah, we just had the
episode about it a month ago.

So it's very related to...

The ideal case for Postgres, in its row-store situation, is an

index-only scan with low heap fetches, ideally 0.

Which means, again, that leads
to the need to configure autovacuum

to make it work more frequently
and move on faster.

In this case, if it's an index-only
scan, this is the best we

can do with Postgres.

Parallel index-only scan, maybe,
as you said.

And this is good.

I mean, this is the best thing you can see in plans: an index-only

scan with low heap fetches.
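
A minimal sketch of checking that (index, table, and column names are placeholders):

    -- An index covering the filtered column lets the count run as an index-only scan
    CREATE INDEX CONCURRENTLY ON users (created_at);

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT count(*) FROM users WHERE created_at >= date '2024-01-01';
    -- Aim for "Index Only Scan" with "Heap Fetches: 0"; if heap fetches are high,
    -- autovacuum likely needs to run more often so the visibility map stays fresh.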

The only better situation would
mean you need to start denormalization,

caching, and so on.

I would consider it as a heavy
solution to the problem.

Because this would give you a few
buffer operations to retrieve

your count.

If you have fully denormalized,
maybe, for example, materialized

view.

I would not recommend using materialized views in complex projects

at all.

Default materialized views, I mean.

But consider we have it already,
and we have proper index on

it.

Finding proper count would be just
finding maybe 1 row, that's

it, 1 value.

It's like primary key lookup, right?

Or maybe just index lookup to find
1 record.

That's it.

It's super easy, but to achieve that, first of all, it requires

effort.

You probably need triggers, or asynchronous triggers somehow.

And second of all, if it's not a materialized view, you probably need

something that supports partial refresh,

because a default materialized view supports only full refresh,

which is not efficient in most cases and can lead to bloat itself.
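
A minimal sketch of the materialized view flavor (names are placeholders); the unique index is what allows the concurrent refresh, but it is still a full refresh:

    CREATE MATERIALIZED VIEW daily_signups AS
    SELECT date_trunc('day', created_at) AS day, count(*) AS signups
    FROM users
    GROUP BY 1;

    CREATE UNIQUE INDEX ON daily_signups (day);

    -- Recomputes the whole view; CONCURRENTLY avoids blocking readers
    REFRESH MATERIALIZED VIEW CONCURRENTLY daily_signups;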

Michael: So I've seen people implement
versions of this and use

off the shelf things that are kind
of in Postgres extensions

for this as well.

So this is actually the approach I've seen most commonly

used for this, because the cost doesn't grow with data volume.

I think it's really attractive
because even an index only scan,

if you double the data size, you're
going to roughly double the

time it takes to do the count.

So you're going to keep bumping
into this problem down the line.

Whereas...

Nikolay: Well, yeah, you're right.

If you double...

Yeah, even with the index-only
scan, we still need to...

If we, for example, if we forget
about structure completely,

talk about only leaf nodes, if
we have 2 times more values to

store, we need 2 times more pages,
right?

2 times more leaf nodes.

And it translates to 2 times more
I/O.

Operations.

And of course...

Michael: And even if it's not 2
times, it's still like, it still

degrades over time.

And these counter-caches...

Nikolay: Roughly, it's not better than 2 times.

It can be even worse, but not better.

Because if you need to keep references
to 2 times more tuples,

you do need more space in leaf
nodes, right?

2 times more leaf nodes, at least,
maybe more actually, if we

have a rebalancing situation and
again, not compact storage of

tuples and so on.

Michael: Yeah.

So on the index-only scan, actually, before we move on from that,

I think it's worth thinking about something, because I think

that goes back to the beginning of the conversation.

You talked about rowstore versus
columnstore.

If we have an index only scan on
only a single, relatively small,

like let's say a primary key, for
example, that we can use to

count, it's ordered as well.

So it can help us with count and
count distinct.

It's almost columnstore-like in
a way, because if we only have

to search the index, we're searching
like a single column effectively.

So we're almost getting that columnstore
benefit without some

of the aggregation tricks that
some of these columnstores

Nikolay: have.

Leaf nodes are doubly linked in
both directions, so we avoid

traversing the whole tree.

We just start at the beginning of
our range, the first value, and then

we just go scan leaves only, and
that's great.

I mean, you're right.

If it's an index-only scan, we don't
care about other columns.

But an interesting point: if we select count(*),

it will probably choose some index, most likely the primary key,

right?

But if you count some column which allows NULLs... a very long

time ago we had an episode about the dangers of NULLs.

And this is one of the dangers, because count, if the value is NULL,

doesn't count it.

Count doesn't count NULLs, right?

This thing might be surprising
in some cases, even for experienced

folks.

I surprise myself with this once
every couple of years at least.

Michael: I see this so often in
people talking from a computer

science perspective.

People asking, like, what's faster,
count(*) or count(1) or

count(ID)?

But I've never in practice seen
any use for anything other than

count(*).

Have you?

Nikolay: Well, in some cases I
specifically count not null values

in some column.

So I put the column there and that's
it.

In many cases I count with additional
filtering.

You know this, it's already not new, it's like 20 years old:

the FILTER clause.

Like you say, count(*), this is our overall count.

And then, instead of doing these very heavy structures in SQL,

like CASE WHEN blah blah, CASE WHEN ELSE, I don't like them.

So instead of that, you just say count(*) or count(column) FILTER,

and then in parentheses WHERE.

And this gives you opportunity
to count many things in 1 go,

right?

You just have

Michael: 1 scan.

Nikolay: Yeah, so it's filtering
on the fly.

If it suits your filter conditions,
then it will increment your

additional counter.

So you have multiple results in
1 select, multiple numbers returned,

right?

That's a great thing.

And in there, sometimes I use a column, understanding that NULLs

won't be counted in this approach.

For example, I want to understand how many people filled this value.

Again, a users table, and we have, for example, a Facebook ID column.

It's NULL if Facebook was not connected. In this

case, count(*) means the overall number of users registered

daily, for example, and then you say count(facebook_id), counting

only those people who connected their Facebook accounts.
Makes sense, right?

Yeah.

In this case, you don't even need
filter.

So it's kind of just maybe syntax
sugar, right?
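
A minimal sketch combining both tricks (table and column names are placeholders):

    SELECT date_trunc('day', created_at) AS day,
           count(*)           AS registered,          -- all rows
           count(facebook_id) AS connected_facebook,  -- NULLs are skipped
           count(*) FILTER (WHERE country = 'US') AS registered_us
    FROM users
    GROUP BY 1
    ORDER BY 1;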

Michael: Yeah, I have seen an argument
that * is misleading

because it doesn't mean what it
means in other contexts.

And maybe we'd have been better
off if it was like count and

then parentheses but nothing in
the parentheses would have been

Nikolay: ...acceptable.

I actually agree with this.

Because if we propagate this logic
about nulls, the * should

mean none of the columns are null.

And this is always so.

Yeah,

Michael: But all of the columns are not null.

Nikolay: So let's talk about this
denormalized approach.

I think this is most powerful but
requires effort.

Michael: And well, it depends.

I think there's like some really
simple approaches that have

some big trade-offs and then there's
some more complicated approaches

that have fewer performance trade-offs but are more

complex to set up.

Like, for example, adding a trigger: let's say you know what you

want to count, and maybe you're doing a bunch of reads but not

many writes to this table, and it's already quite big.

So you're okay to pay a little
bit of overhead on each write

in order to maintain just literally
a counter.
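
A minimal sketch of such a trigger-maintained counter (all names are placeholders):

    CREATE TABLE row_counts (table_name text PRIMARY KEY, cnt bigint NOT NULL);
    INSERT INTO row_counts VALUES ('users', (SELECT count(*) FROM users));

    CREATE FUNCTION users_count_trigger() RETURNS trigger AS $$
    BEGIN
      IF TG_OP = 'INSERT' THEN
        UPDATE row_counts SET cnt = cnt + 1 WHERE table_name = 'users';
      ELSIF TG_OP = 'DELETE' THEN
        UPDATE row_counts SET cnt = cnt - 1 WHERE table_name = 'users';
      END IF;
      RETURN NULL;  -- AFTER trigger, return value is ignored
    END $$ LANGUAGE plpgsql;

    CREATE TRIGGER users_count
    AFTER INSERT OR DELETE ON users
    FOR EACH ROW EXECUTE FUNCTION users_count_trigger();

    -- Note: every writer now updates the same row, which is the
    -- hotspot problem discussed next.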

Nikolay: Yeah, I've done it many
times.

But you know what will happen,
right?

Under load.

Okay.

Especially if you have a foreign key; as I remember, there are

issues because of multixact IDs and contention.

And imagine you have count, like,
you know this popular anti-pattern

many people do in the beginning
of their career, when they try

to develop some accounting part
of their systems.

Michael: Like debit and credit.

Nikolay: Yeah, yeah.

And they have, like, a common, how to say, a common account or something,

like counting all the money flow.

And this requires updates to a single row in some table.

Any financial transaction triggers this update

synchronously.

And this quickly becomes a hotspot, obviously, because if you

have 2 SQL transactions doing some financial transactions that,

via triggers, try to update a single row, they compete.

Like, I mean, this 1 will be blocked
until the other finishes

with commit or rollback, right?

And this doesn't scale at all.

Michael: Well, I've seen solutions
to that.

I've seen somebody, I think it
might have even been Tobias Petry,

it might have been one of his Twitter tips.

All you do is you add more.

If you split-

Nikolay: Depends on the table,
right?

Michael: No, no.

So you have, say, 10, keep it small, counts.

And each write triggers an update to one of those, but you don't know

which one.

It picks one at random, it updates one of them, and then when you

want to count everything,

you have to sum all 10.
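
A minimal sketch of spreading that hotspot (all names are placeholders):

    -- N counter slots instead of one hot row
    CREATE TABLE users_count_slots (slot int PRIMARY KEY, cnt bigint NOT NULL DEFAULT 0);
    INSERT INTO users_count_slots SELECT g, 0 FROM generate_series(0, 9) g;

    -- On each insert (e.g. from the trigger above), bump a random slot
    UPDATE users_count_slots SET cnt = cnt + 1
    WHERE slot = floor(random() * 10)::int;

    -- Reading the total means summing the slots
    SELECT sum(cnt) FROM users_count_slots;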

Nikolay: Right.

Michael: So yeah, it removes the
hot, like it spreads the

Nikolay: The heat.

Yeah, this is one approach.

Second approach, if you, for example,
have some...

Well, it's not easy, I would say.

And batching is already a good
thing, but it's also like, okay,

you basically reduce contention,
but you don't eliminate it.

Full elimination would require an asynchronous trigger.

Michael: All I meant is it's quite
simple at low scale.

It only gets complex at higher
scales.

Nikolay: Yeah, I also did it.

Let's increment by 10 or by 100 at a time, in jumps, but then it

becomes already not real-time; still, it's okay.

But ideally, in heavily loaded
systems, it should be asynchronous.

So normal writes don't trigger
immediately.

I mean, they don't lead to increments
at all.

And they just register an event, something
happened, right?

Or they just are there, and we
will retrieve them by selects

and that's it.

Yeah, asynchronously we will do processing, with batches as well.
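
A minimal sketch of that asynchronous batch processing (all names are placeholders), run periodically by a worker:

    -- Writers only insert events; a background job folds them into the counter
    WITH batch AS (
      DELETE FROM counter_events
      WHERE id IN (SELECT id FROM counter_events
                   ORDER BY id LIMIT 1000
                   FOR UPDATE SKIP LOCKED)
      RETURNING delta
    )
    UPDATE counters
    SET value = value + (SELECT coalesce(sum(delta), 0) FROM batch)
    WHERE name = 'users_total';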

Michael: Do you know my favorite, the favorite thing I've seen for this?

Well, other than maybe some proprietary solutions, it's simple

roll-up tables.

So maintaining a count of older
stuff in bigger batches.

So you might do a yearly count, and
then a monthly count, then

a daily count, and then anything
since that day, you go through

all of the data.

But it means that over time, you
can roll things up into bigger

aggregates.

And I think that, to me, scales
really well.
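
A minimal sketch of such a roll-up (all names are placeholders); closed days live in a summary table and only today's tail is counted live:

    CREATE TABLE signups_daily (day date PRIMARY KEY, cnt bigint NOT NULL);

    -- Nightly job: roll up finished days
    INSERT INTO signups_daily
    SELECT created_at::date, count(*)
    FROM users
    WHERE created_at::date < current_date
    GROUP BY 1
    ON CONFLICT (day) DO UPDATE SET cnt = EXCLUDED.cnt;

    -- Total = rolled-up history + live tail for today
    SELECT (SELECT coalesce(sum(cnt), 0) FROM signups_daily)
         + (SELECT count(*) FROM users WHERE created_at >= current_date) AS total_signups;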

Nikolay: Yes, but the main approach, the naive, simple approach,

let's do synchronous updates, means we will immediately have a hotspot,

and it's bad.

Then okay, the asynchronous approach: there are two, like, sub-approaches.

One is to poll: like, every minute we poll and look, oh, do we have

some new data? If we have it, in one batch we issue an update, every minute.

It's an okay approach, actually.

And a reduction of it is actually a regular materialized view:

you just refresh it fully from time to time.

It's, again, super simple, relatively.

But it lacks partial updates, and this is probably very annoying.

So a materialized view is a very rough approach, a regular materialized

view.

If it's pg_ivm, or however it's called, it should be better.

I haven't tested it myself and I need to be convinced.

I had big plans to test Hydra,
this new thing with DuckDB, right?

pg_duckdb or something.

Yeah.

This pg_ivm.

I have so many things on my plate
to test for maybe already not

months but years.

But unfortunately, I must confess,
customers with managed Postgres,

with their own needs, occupy our
time lately most of the time.

Because we try to do things which don't require extensions that

RDS, CloudSQL, and others don't have.

I wanted to mention PGQ.

PGQ is present in CloudSQL, right?

It's interesting.

So CloudSQL users can use it and
benefit from it because it's

a very old and very well battle-tested solution from Skype.

So imagine if we have some things
we can register in PGQ, it

will take care of scalability because
it has partitioning inside

and so on.

And then we just, it's like push
events, push events, and then

consumers of those events, they
increment count, but also maybe

in batches and so on.

So, like, they issue just 1 update to increment by some number,

right?

And this improves things a lot
and makes the system more resilient

because lags will be lower.

It can be not PGQ, but something
like Kafka or I don't know,

Sidekiq or whatever you have, like Celery, if you're a Python guy,

right?

But this asynchronous nature is really good, and nobody actually

likes delays like 1 hour, because you start with 1 minute,

you say, okay, it's acceptable, but the system grows.

And then you say, okay, we need to reduce the frequency to, like, once

per 5 minutes now.

But then it grows again: once per 10 minutes now.

But if you have asynchronous push,
like through an asynchronous

message queue or something like
an event processing system, in this case, you can build a good
this case, you can build a good
asynchronous kind of trigger,

right?

I wish Postgres had something like this internally, and every managed service

provider supported it, so all our customers already had something.

But right now, unfortunately, it
requires effort, and you need

to build with things you use, sometimes
outside Postgres, like

Sidekiq or Kafka or something.

Michael: Yeah, I think we've mentioned
pretty much everything

from, like, vanilla Postgres without
extensions at the moment.

But there are a bunch of other
interesting projects.

I agree.

I think the DuckDB stuff, especially
that cross-company initiative

seems super interesting, but there's
at least a couple of projects

at the moment pushing some of these
analytical queries to a column-oriented

file.

Nikolay: But we need to keep in
mind that this also brings additional

challenges.

Again, this means you need denormalization
because you have original

table as a source of truth and you
need to bring data to extra

table or something and maintain
this relationship all the time

so it's consistent and not lagging
too much, right?

Michael: Yeah, but I think the
optimizations they add might then

be worth it in a lot of cases.

Like the column store and then
the vectorization, like there

are so many benefits once you pay
that, like if you're willing

to pay the upfront cost.

Nikolay: You can also do more like,
I would say it's already

traditional architecture when you
have analytical system in addition

to Postgres.

It can be, I don't know, like it
can be even ClickHouse.

It can be Vertica if it's an old system, or these days Snowflake.

And we know PeerDB was recently
acquired by ClickHouse.

And it means, imagine we have our
OLTP storage with users table,

and then through logical replication,
it goes to ClickHouse,

for example, and there you can
do any counts with good speed,

right?

Any analytical, you can move analytical
workloads, which are

maybe even user-facing to there,
right?

It's also worth considering, but
it moves us outside of Postgres,

and I don't like it personally.

Michael: To stay kind of within
Postgres, I think the work the

Citus team did was super interesting,
because I think they were

largely used for these large analytical
workloads where they

used the fact they had sharded
to multiple nodes to let you parallelize

some of these queries.

So it gained from some of that extra parallelization.

And then the other company doing
some interesting stuff in the

space, I think is Timescale, who've
added some optimizations,

added some functions.

But I think...

Nikolay: Continuous aggregates
is a great thing.

Michael: Yes.

And continuous aggregates means
that they've done a lot of that

heavy lifting.

If you have access to Timescale,
if you're self-hosted or on

their cloud, then you don't have
to do all the complicated stuff.

Nikolay: And if you have not only
access to Timescale, but to

proper Timescale, because when
some company says we support Timescale,

but it's only, how is it called,
like community edition?

Michael: Apache.

Yeah, I get confused.

I think actually the community edition is the one that you can't

use.

It's the Apache 2 one that you can
use on other clouds.

Nikolay: Yeah, some clouds support
Timescale, but only this reduced

option.

But full-fledged Timescale is a
great thing to have.

Definitely compression plus continuous
aggregates.

We have it in a couple of places,
it's so great.

And this is it, like, it's a flavor
of Postgres, which I do like,

but it's not available if you're
already deep inside RDS or so,

you need to consider migration
and so on, yeah.

Timescale Cloud or self-hosted.

Michael: Exactly.

So, yeah, and then the last thing
to bring us back to the topic

we talked about last week is they
have an implementation of loose

index scan for count distinct.

You can use your own implementation
of loose index scan

to speed up some

of these.

Nikolay: Like in the Wiki, right?

Michael: Yeah, exactly.

Nikolay: With recursive.
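
A minimal sketch of that wiki-style loose index scan emulation (table and column names are placeholders; an index on the column is assumed):

    WITH RECURSIVE t AS (
      (SELECT customer_id FROM orders ORDER BY customer_id LIMIT 1)
      UNION ALL
      SELECT (SELECT o.customer_id FROM orders o
              WHERE o.customer_id > t.customer_id
              ORDER BY o.customer_id LIMIT 1)
      FROM t
      WHERE t.customer_id IS NOT NULL
    )
    SELECT count(customer_id) AS distinct_customers FROM t;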

Michael: Yeah, so that's pretty
much everything I had.

Nikolay: Yeah.

So I think we covered the basics.

If you want to deal with billions of rows and

hundreds of thousands of transactions per second,

it's not an easy topic, actually.

Michael: Thanks so much Nikolay.
Catch you next week.