Postgres FM

Nik and Michael discuss Nik's new project PgQue, a descendant of Skype's PgQ, for running queue-like workloads in Postgres.
 
Here are some links to things they mentioned: 

~~~

What did you like or not like? What should we discuss next time? Let us know via a YouTube comment, on social media, or by commenting on our Google doc!

~~~

With credit to:
  • Jessie Draws for the elephant artwork

Creators and Guests

Host
Michael Christofides
Founder of pgMustard
Host
Nikolay Samokhvalov
Founder of Postgres AI

What is Postgres FM?

A weekly podcast about all things PostgreSQL

Michael: Hello and welcome to PostgresFM,
a weekly show about

all things PostgreSQL.

I am Michael, founder of pgMustard,
and I'm joined as usual

by Nik, founder of PostgresAI.

Hey Nik, how's it going?

Nikolay: Hi Michael, everything's
alright.

How is your business and life?

Michael: Yeah, good.

We're in spring in the UK and it's just getting a bit warmer now. And yeah, business is good, ticking along. I've got some upcoming news soon, actually, that I'll publish in the newsletter.

Nikolay: That's great, yeah, looking forward to it.

Michael: How about you?

Nikolay: Yeah, obviously I'm a guest today, right?

Michael: Yeah! What are we talking about?

Nikolay: We're talking about queues again.

Queues and Postgres.

My favorite topic. Remember, I told you so many times that I like to observe how many of them get created and how many of them have issues. Actually, almost all of them have issues.

Now I've dug into the topic deeper, and actually I had surprises. My understanding in the past was not fully correct, and I'm going to confess today where I was not right. Where there were gaps.

Michael: Yeah, and just to clarify: when you say that they all have issues, do you mean queue implementations inside Postgres, or inside a relational database, an OLTP database?

Nikolay: Yeah. So when we have a new client at PostgresAI, for example... our most popular type of client is a startup without a DBA, on RDS or Cloud SQL or Supabase, whatever. And they have issues because they have growth. This is my favorite type of client. And they bump into some problems.

We check (we have various tooling for health checks), and almost always we recognize one of a few patterns: a log-like, append-only, unpartitioned huge table; or a table, usually also unpartitioned, which receives events to process.

It gets updates and deletes. And if the startup is more mature, we see something like pgmq, for example; if it's smaller, usually nothing. And everything is wrong there, starting from heavyweight lock contention, right? But also bloat, and a lot of complaints.

Michael: Are we talking about like
a self-rolled queue inside

the database?

Nikolay: Yes, like a naive implementation of a queue in Postgres. You can see it naturally just by looking at pg_stat_all_tables and noticing the workload patterns: a lot of updates and deletes, but also a lot of dead tuples, and autovacuum, especially if it's untuned, cannot keep up. But also if they have long-running transactions, or other reasons the xmin horizon gets blocked. We need to discuss that slightly deeper today.
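As an aside, a rough sketch of how that pattern shows up in the standard statistics views (thresholds and exact columns to inspect will vary):

```sql
-- Hot, queue-like tables: heavy update/delete churn plus dead tuples
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del, n_dead_tup, last_autovacuum
FROM pg_stat_all_tables
ORDER BY n_tup_upd + n_tup_del DESC
LIMIT 10;
```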

They will have a lot of bloat accumulated, all latencies suffer, and this piece of the workload, this table, usually just one table, becomes a hotspot. This is a huge reason for them to complain about how Postgres is bad.

And I don't fully disagree. Who hasn't complained about Postgres MVCC and vacuum being a headache all the time? Everyone has. I usually said: all you need is two things. Actually, I didn't only say it; I came across some less naive queue implementations and even helped them.

For example, long ago, there was a project called Delayed Jobs in Ruby. And I added a couple of things: an index, which was easy, just a missing index, but also I said, let's use FOR UPDATE SKIP LOCKED. So you just don't have this heavyweight lock contention when multiple sessions compete to update or delete the same row. It's quite straightforward.
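To illustrate the pattern, a minimal sketch with a hypothetical jobs table (not the Delayed Jobs schema):

```sql
-- One worker claims one pending job; concurrent workers skip rows
-- that are already locked instead of waiting on them.
UPDATE jobs
SET status = 'running', started_at = now()
WHERE id = (
    SELECT id
    FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload;
```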

And to some others I always said: you just need two things, SKIP LOCKED and partitioning. And as I understand, this is where everyone went. So SKIP LOCKED was created roughly 10 or 11 years ago, actually 11: 2015. I think it was 9.5, right? Because that was 2015.

That was a great feature to get
rid of heavyweight lock contention.

But it's not enough.

It doesn't solve the bloat problem.

Actually, in my recent work over the last few weeks, in Hacker News discussions and some other places, I noticed people think that SKIP LOCKED will solve their bloat issues, their vacuum issues. It's not so.

So, partitioning plus SKIP LOCKED is good enough, and this is where everyone went. I think pgmq, and actually all modern queue systems on Postgres, they love SKIP LOCKED. They are all about SKIP LOCKED.

To some extent with my AI bots, we did a lot of research, a lot of experimenting, benchmarks. And at some point I noticed they started to call all these guys "SKIP LOCKED architecture" somehow. I don't like it; I prefer "update-delete architecture". So, update-delete queue systems, not SKIP LOCKED systems.

Because SKIP LOCKED shifts too much attention to itself. But SKIP LOCKED is a simple thing: just get rid of heavyweight lock contention, that's it. The other problems, which are actually bigger and harder to solve, are not eliminated. They can only be mitigated with partitioning and rotation.

And I saw, for example, pgmq; it's quite popular. I think it's a good legacy from Tembo. It's supported on Supabase, I think, and quite popular there.

And they actually also went and got rid of the need for CREATE EXTENSION. They re-implemented it fully in PL/pgSQL, and then in the form of a trusted language extension, you know, pg_tle. Since it's purely PL/pgSQL, you don't need to ask your provider to support it. If the provider supports pg_tle, you can have it, and not only load it as a plain SQL file: you can have it as a tracked extension without provider support, because it's just pure PL/pgSQL.
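For illustration, loading a pure PL/pgSQL extension through pg_tle looks roughly like this (a sketch with a toy function, not pgmq's actual packaging):

```sql
-- Register the extension body with pg_tle, then install it as usual.
SELECT pgtle.install_extension(
  'hello_tle', '0.1', 'demo of a pure PL/pgSQL extension',
$_tle_$
  CREATE FUNCTION hello() RETURNS text
  LANGUAGE plpgsql AS $$ BEGIN RETURN 'hello from pg_tle'; END $$;
$_tle_$
);
CREATE EXTENSION hello_tle;
```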

So they also focus on SKIP LOCKED, and they have support for partitioning, but it requires pg_partman and additional effort. Right?

Michael: And just a question on
their partitioning.

Is it like time-based partitions
and you detach and drop them

over time?

Or is it a rotational thing like you talked about?

Nikolay: I don't remember, but it doesn't matter, actually.

So what matters here: you cannot rotate partitions every minute. It's not practical. Sure, sure, sure. And it will actually lead to some other issues. So partition rotation should happen less often.

And by the way, for those who use partitioning to mitigate bloat: it's a great idea, but instead of detaching and attaching partitions, which I see in some queue systems, it's better to have several static partitions. I mean, stop creating them; constantly creating partitions leads to catalog bloat eventually, right? And detaching has its own issues under heavy load. It's better just to truncate and have rotation: a round-robin of partitions, where you just truncate them, and that's it. It's much better in many senses. And this is what PgQ does.
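A minimal sketch of the rotation idea, with hypothetical table names (PgQ's real schema differs):

```sql
-- Three static partitions used round-robin: one holds the past,
-- one is current, one is next. Rotation recycles the oldest one
-- instead of creating and attaching new partitions.
TRUNCATE TABLE my_queue_part_0;  -- instantly resets heap and indexes
-- New inserts are then routed to my_queue_part_0 when its turn comes,
-- while part_1 and part_2 take over the 'past' and 'current' roles.
```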

So I always said these two things are enough, but I realized that they are not. That's the...

Michael: Can I just? When you say PgQ, are we talking about your new tool, PgQue, or PgQ, the Skype-based original? Because the letters are just PgQ.

Nikolay: Yeah, let me explain how it all started. So on our podcast I was saying: that's it, just SKIP LOCKED and partitions, rotation, or something.

SKIP LOCKED felt natural because
this is how we solve heavyweight

lock contention when you have multiple
backends competing to

update or delete the same row.

Michael: Yes, yes.

Nikolay: Inserts cannot compete; inserts don't need it. They are independent, right?

But updates and deletes, they can
compete.

And when I said partitioning, I always said: look at PgQ, which Skype created 20 years ago. And that's it. I thought it was enough.

I thought Skype didn't have it... I mean, Skype didn't have SKIP LOCKED at that time. So I was thinking they did it differently, because SKIP LOCKED didn't exist in 2006 and 2007.

PgQ was created exactly 20 years ago, in 2006, and it was open-sourced in 2007. Now I'm talking about SkyTools PgQ.

Right.

Michael: Yes.

But, okay, it didn't do SKIP LOCKED, but it was also not doing updates or deletes, right? Like it was...

Nikolay: Exactly, we will come to that. But I was thinking... I had a false impression that we should use SKIP LOCKED because it's the modern path, and that we should also mitigate bloat issues, vacuum problems, and so on, just using partitioning and rotation.

And coincidentally, this is how all the current guys are doing it: River, pgmq, Que, others. There are many now. Some of them are quite language-agnostic; some, like River, are Go-oriented, focused on Go frameworks. And that's it.

And then, what happened? So that was my understanding 3 weeks ago. Then what happened? I was at a campground in Zion Canyon, and they have a good connection, actually. I was in my tent, it was Friday evening I think, and I saw the PlanetScale blog post. And I started to read, not the blog post itself, because I quickly realized what it was about; I started reading the discussions of it on Twitter, on X.

So that post was dancing around Brandur Leach, who is actually one of the creators of River queue. Brandur had a post in 2015 about, basically, how challenging it is to have a queue in Postgres because of MVCC, if you have a long-running transaction with an XID assigned, or repeatable...

I think Brandur used a repeatable read transaction lasting 1 hour or half an hour, I don't remember. And I think in his post it was below 1,000 events per second inserted, maybe 800. And quickly something like 60,000 events accumulated, unprocessed by consumers, because everything started to lag and so on.

It was 2015. I think that was actually the year when SKIP LOCKED was added to Postgres. Interesting, right? 2015. Yeah, good timing. Yeah.

So PlanetScale discussed how bad it is. Not that queues in Postgres are bad, but that it's bad to have long-running transactions, or anything that blocks the xmin horizon, right? And they promoted their new feature to mitigate it. But mitigate how? Just cancel them, right? So they have some smart approach: which traffic is more important, which is less important.

In my opinion, what we created with Andrey, transaction_timeout, is good enough for everyone as a default solution against long-running transactions. Although there might be other problems, like an unused or lagging logical replication slot, right? Yeah, that can happen. Or maybe someone is using 2PC, and prepared transactions can also be a problem.
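For example (transaction_timeout exists since Postgres 17; the value is just an illustration):

```sql
-- Cap any transaction, idle or active, at 30 seconds:
ALTER SYSTEM SET transaction_timeout = '30s';
SELECT pg_reload_conf();
-- or per session / per role:
SET transaction_timeout = '30s';
```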

Michael: So, anything that holds the xmin horizon and prevents the cleanup of dead, or like, old row versions.

Nikolay: By the way, we just released our monitoring, which is like a front end: when we say that, we mean monitoring with Grafana and VictoriaMetrics, with Postgres inside, everything. We just released it with our new dashboard for xmin horizon analysis. And there are 5 possible reasons.

And Laurenz Albe very timely posted a blog post about how he monitors autovacuum. I stole a couple of thoughts from it and implemented them in the dashboard, and it's already released; it's free to use, Apache license. But it works much better if you go and become our customer, because we have great new health metrics. I will blog about it separately.

Anyway, this is connected, because of the xmin horizon; we also talked about it before. Everyone monitors long-running transactions, but that's off; it's the wrong thing to monitor in this context. You need to understand when the xmin horizon is blocked, and by whom, to unblock it promptly, because this is how you can put your River or Prisma queue or something down, actually. Not down, but basically lagging and having very poor performance.
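A rough sketch of checking the usual xmin horizon holders, using standard catalogs:

```sql
-- Long-running transactions holding back the horizon:
SELECT pid, state, xact_start, backend_xmin
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC;

-- Replication slots and prepared transactions can hold it too:
SELECT slot_name, active, xmin, catalog_xmin FROM pg_replication_slots;
SELECT gid, prepared, owner FROM pg_prepared_transactions;
```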

Michael: Accumulating a lot of bloat that is never recovered, if it's not using, like... I think that's the other thing people don't realize: there's no recovering from that, because once it's bloated... unless you're using something like the partition rotation.

Nikolay: There are several things here, several things. First: a lot of dead tuples are accumulated. Because the xmin horizon is blocked, dead tuples are created every time you produce a successful delete, a successful update, or an unsuccessful insert. It means, by the way, that we can also produce dead tuples if some inserts are failing. But this is very subtle, a nuance; we can omit it, right? So, with the regular approach, we always produce dead tuples.

A tuple is a row version. We agreed in our first episode that I say tuple, you say tuple, or vice versa. I don't remember. Yeah. Anyway, a tuple is a row version. So the old version becomes dead, but it's still hanging around everywhere, actually, including in shared buffers. It's polluting everything.

So garbage collection, called vacuum, is needed to delete it. It's a two-phase process: first the tuple is only marked dead, and then it's deleted. And the first problem: a lot of dead tuples accumulate, and they cannot be deleted by the garbage collection called autovacuum.

And the first bad effect is that consumption latencies degrade very fast, actually. Because to find the next thing, you need to skim through all the dead tuples with your index scan, and it becomes less and less performant.

The next bad thing is that we accumulate a big set of unprocessed events. Sometimes, not always. Sometimes we have degradation. And when I say degradation, I mean: it was, say, 1 millisecond to fetch the next event, or a bunch of events, and then it degrades to a second or a few seconds. Within 1 hour, I saw, I think, 5 seconds for some queues. Five seconds just to fetch 1 event, can you imagine?

At some point, in some systems, if you keep inserting a lot of events and consuming them, it might start timing out on statement_timeout, if you have a strict statement_timeout, as you should for all OLTP systems. We always recommend having strict timeouts.
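For example (values are illustrative; a batch or maintenance session can override them explicitly):

```sql
-- Strict default for OLTP traffic:
ALTER ROLE app_user SET statement_timeout = '5s';
-- A reporting or migration session can relax it:
SET statement_timeout = '10min';
```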

So, a lot of dead tuples: degradation of consumer performance, of the consumer query. The second effect is that a lot of events accumulate simply because consumer throughput capacity is not enough.

You have, for example, 10 consumers working in parallel, with SKIP LOCKED helping them not to fight with each other, and you have the capacity to consume, for example, 2,000 events per second. But suddenly latency went from 1 millisecond to 1 second: a thousand times worse. That can happen within 10 minutes or so.

And 10 minutes, just for example: you have a multi-terabyte database and you decide to dump it, or to create a logical replica in the traditional way. This is it; this is how you can reach second-level consumer query latency. And this leads to accumulation of unprocessed events.

And for some queues I noticed that even when we stop the long-running transaction and unblock the xmin horizon, so autovacuum comes, cleans up dead tuples, and latency for the consumer query immediately drops, for some of them it didn't recover to the previous level. It didn't recover to the 1 millisecond level; it stayed around 50 milliseconds or so.

I think it was Que, which is like Q-U-E. Very bad naming. Mine too.

Michael: I call it "keh". I call it "keh", in Spanish.

Nikolay: If you ask, I'm not speaking Spanish, but ask someone who is, how my tool is pronounced; it's crazy, right? Something like PepeK or something, I don't know.

I have an issue open, and actually I already have a pull request to rename it, but I didn't like any of the brainstormed names so far. pg_belt, this was the best I had, and I didn't like it.

And we will come to that, why pg_belt, right? And why "queue" is actually misleading, as I learned. I learned two big things during this journey, over the last 2 or 3 weeks.

Michael: So just quickly, before we move on from side effects: eventually, even in a non-partitioned system, the heap bloat would get reused. Once the tuples are marked dead and vacuum comes along and marks the space reusable, new jobs or new events can go into that space in the table.

But people don't... and I know we've talked about this thousands of times, but it's the indexes that can end up bloated in a way that isn't recoverable without a reindex. So I wonder if that could be the source of the never recovering.
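A hedged example, with a hypothetical index name:

```sql
-- Rebuild a bloated index without blocking writes (Postgres 12+):
REINDEX INDEX CONCURRENTLY jobs_status_idx;
```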

Nikolay: Maybe, yes. Yes, you're right. And the bloat might still be there; this is what can be solved post-factum, actually.

It can be solved with partitions and truncate, in rotation, because truncate means you get a fresh, empty index that starts growing from scratch. It's great. But again, and this is an interesting nuance as well: at first I was thinking, oh, partitioning won't help in the middle of a long-running transaction, in the middle of the xmin horizon being blocked... Actually, it will. It will help. You switch to a new partition; the old partition keeps the old stuff with dead tuples and degraded indexes, and that's it. And this actually leads us to an interesting optimization in my tool.

But what won't be possible, or rather, I think it's possible but I wouldn't recommend it, is aggressive, every-minute partition rotation and truncation. First of all, you may need a lot of partitions. Maybe not, actually; maybe you can find a way to just jump between them. But it's not practical, again, because of catalog bloat and stress: when you switch to a new partition there is some stress, you need to clean up, and everything has to be processed.

And if degraded latency leads to unprocessed event accumulation, you won't be able to recycle the old partition, right? Because it still has useful data. You could move it to the new one... it becomes a nightmare, actually. So anyway, it feels architecturally wrong to me to switch partitions very often. Right, this is...

And back to your question, time-based or size-based: well, time-based is fine. I actually don't remember what I have in the settings. I need to check.

But to your other question: are we talking about the new PgQue or the old SkyTools PgQ? So what I did: I took it as is, right? The first thing I did, I took it as is and just started to build around it. So I'm not touching the core engine. That's the key.

Michael: So the original SkyTools PgQ core engine remains.

Nikolay: Great.

Michael: I mean, I knew that, but not actually by reading your... I saw... Did you see... I thought it was a really good blog post by Christophe Pettus, covering the 0.1 announcement.

Yeah, he's been on a bit of a blogging
spree lately, but he's

done a blog post about PgQ, or
PgQue, your tool, already.

Nikolay: I missed it.

It's cool.

Nice.

Well, what I wanted to say: guys, there is an alternative, forgotten kung fu, I'd say, and I know how trustworthy it is. 20 years. I used it myself in 2 of my 3 social network startups, before we switched to RabbitMQ, which I think was a mistake; I regret it now.

We used it heavily. We used it the way Skype originally did. Skype built it not only for event processing, but also to have logical replication: instead of Slony, they built Londiste, and it worked on top of PgQ. Native logical replication didn't exist at the time. So it was serving many purposes. And when they built it, Kafka didn't exist; Kafka was created in 2011, I looked it up.

So it's called a queue, but by nature, and this is the second big thing I learned, it's not a queue system. It's an immutable log, similar to Kafka, though not distributed like Kafka or Redpanda, also a modern thing, right? As I learned. It just guarantees that something is inserted, that there is order, and a consumer is just a shifting pointer. But it can be used for queue-like workloads, maybe not all of them, and we can discuss that in a bit.

But what I wanted: I saw this blog post from PlanetScale, and again these discussions, like, just use SKIP LOCKED. And the first thing I posted got big feedback, people like, yeah, that's it. I said: first of all, SKIP LOCKED doesn't solve vacuum problems. That idea is just very wrong. SKIP LOCKED solves heavyweight lock contention. Right.

And also, about SKIP LOCKED, there is another post from Laurenz: SELECT FOR UPDATE considered harmful, right? So there are issues with SKIP LOCKED; SKIP LOCKED is part of SELECT FOR UPDATE, you cannot use it without SELECT FOR UPDATE, and there are issues with that approach as well. Additional danger is there.

But I wanted to show that there is PgQ from SkyTools, and we all know another tool called PgBouncer. And I'm wondering... I know PgQ is still used in very large companies as an important building block: very reliable, very performant.

Skype originally built its architecture, as I learned from their talks, for 1 billion users. They reached hundreds of millions. I don't know what happened after the acquisition by Microsoft; by the way, Skype was shut down last year. Sad case. So that's an interesting legacy here.

And I'm thinking: why did PgBouncer become quite popular, but PgQ didn't? My hypothesis is that it's because it's an extension which requires an additional daemon. And maybe I'm wrong, but maybe providers just didn't want to bring in an additional daemon which is not a regular background worker, but something that needs to be managed separately. And since you, like...

Michael: Whereas PgBouncer is completely independent and...

Nikolay: Well, yeah, yeah. But that's a good point: it needs to be managed. There are some providers who offer it as a managed service, but PgBouncer is an old tool solving that problem. Here, it's inside Postgres, and we need an additional daemon. I don't know, I didn't participate in any of those decisions, but I just don't see any providers supporting PgQ.

Michael: Maybe on Cloud SQL, actually, is it?

Nikolay: They support PL... This is maybe something I told you, and I was wrong: I mixed it up with another tool from SkyTools called PL/Proxy. Cloud SQL supports PL/Proxy. Yeah.

And yeah, hello, Hannu Krosing, who liked every single post of mine on LinkedIn about PgQue, I think, because he was from Skype as well. So it's great. It's great.

And yeah, then an interesting thing: I'm thinking, okay, you know my pg_ash anti-extension concept, right?

Michael: Yes, yeah, we talked about it.

Nikolay: Yeah, and there are others; I have a couple more coming. But then I think: okay, I just need to repackage it, right? And being in my tent, I have a new tool for creating thorough specifications, a spec creation tool, which I use for many things now. It's a CLI. You start from an idea, then explore questions, do research, then build a comprehensive spec, and then iterate with multiple LLMs, the most powerful you can reach. And then you have, say, version 7, which is ready for implementation. So I wrote a spec for how to repackage Skype's PgQ in this anti-extension manner.

So no CREATE EXTENSION is needed. And since, like we discussed, we have pg_cron... like with pg_ash, same thing.

Michael: Almost everywhere.

Nikolay: Yeah.

Yeah.

You need a ticker.

So that's why that daemon was needed.

You need ticker.

So PgQ: it's a log. There are inserts, single or batched, but never updates, never deletes. And there is basically its own horizon; it's based on snapshots, right? So every consumer knows its position. And to shift the position, you need to tick: you need to announce, okay, we shifted, because something new arrived. By default, PgQ from SkyTools ticks every second. So it shifts, and the consumer sees new data and fetches a whole batch of events. So batch processing is the default.
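The classic SkyTools PgQ flow looks roughly like this (a sketch from memory of the SkyTools docs; details vary by version):

```sql
SELECT pgq.create_queue('my_queue');
SELECT pgq.register_consumer('my_queue', 'worker_1');
SELECT pgq.insert_event('my_queue', 'user_created', '{"id": 42}');

-- Consumer loop: take the next batch (NULL until the ticker advances),
-- process its events, then close the batch.
DO $$
DECLARE
  batch bigint;
  ev record;
BEGIN
  batch := pgq.next_batch('my_queue', 'worker_1');
  IF batch IS NOT NULL THEN
    FOR ev IN SELECT * FROM pgq.get_batch_events(batch) LOOP
      RAISE NOTICE 'event %: %', ev.ev_id, ev.ev_data;
    END LOOP;
    PERFORM pgq.finish_batch(batch);
  END IF;
END $$;
```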

Michael: The thing I didn't understand
is that you have a second

table for keeping track of those.

Nikolay: There are meta tables, yeah, for ticking and for subscriptions, but the queue itself is 3 partitions, old-school inheritance partitioning, because it was created before native partitioning existed. So, 3 partitions: 1 is in work, 1 is in the past, 1 is in the future. And there is rotation using truncate. There is also a separate delayed table for events which cannot be processed right now: when they need a retry, we put them there.

Michael: Yeah, so you don't end up truncating jobs that haven't been done.

Nikolay: Yeah. It was a single table. I think I implemented the same pattern there too, because sometimes we might have a lot of jobs, events, to be retried for processing. So I created 3 partitions there as well. There is also a dead-letter queue, with a concept of maximum retries, so that we put an event into the dead-letter queue rather than retrying forever. The AI just suggested that this is how many modern systems work, and I said: oh, good idea, let's adopt it here on top of what already exists.

So I created a spec. And I was thinking: guys, I understand the PlanetScale position. Let's just have a shotgun and fire at all those long-running transactions. Great. transaction_timeout is also that kind of shotgun: maybe less smart, but still working and reliable.

But let's just compare performance. I said, okay, let's compare performance. And I said it in the same Claude Code session where we had just created the spec. Then we started benchmarking: it provisioned, I think, 7 VMs in the cloud for the alternatives and for PgQ. And I noticed that it provisioned 2 machines for PgQ: 1 was original PgQ, and 1 it called "PL mode".

I was thinking: okay, what is PL mode? We were supposed to create it according to the spec, but we hadn't implemented it yet. It said: you created it in 2019. I said, okay, first of all, it was not me, it was Marko Kreen, the author of PgQ. But second, how come it already exists?

Apparently, and this is also an interesting part of the story, Alexander Kukushkin, maintainer of Patroni, presented a talk about PgQ in January, I think in Prague, I might be mistaken. Because PgQ is an excellent thing: let's revive it, right? Great, our goals align here. I also want to revive it.

And I told him, being slightly provocative: but unfortunately, you cannot install it on RDS, right? He said: no, my slides have a recipe. How come, how did you learn that? He said: everything is written in the commit messages.

The problem with PgQ has always been a lack of documentation. Everyone will tell you that; I've been hearing it for 15 years and can say it myself.

So yes, indeed, there is a commit from around 2019 to make it work on RDS and others: support without CREATE EXTENSION, just a single file, or maybe multiple files, doesn't matter. A PL-only, PL/pgSQL-only mode. It's called PL mode internally in PgQ.

So apparently it was already there. But not quite: meanwhile, I talked to my Claude Code. I said, okay, but I cannot find that file. Where did you get it? I don't see it in the repository. It said: okay, you just need to run make.

Michael: What?

Nikolay: You need to run make to get the SQL file you can load into your RDS. We live in slightly different worlds here. I told Kukushkin: I think Gen Z won't understand us with this make.

I don't know, it's interesting. For me it's mind-blowing: everything exists, but it's buried behind these walls. For some people it's not a wall: you git clone, cd, make, psql, import the file, and everything works. But imagine, for some people it's a big barrier, right?

Michael: And it would be weird
not seeing it in the repo.

Yeah.

Nikolay: Well, you need to read the commit messages. And it's great that Alexander has a talk promoting that this is possible. But it inspired me even more: let's do it, and add more and more documentation.

So my tool basically even has a submodule, and then it compiles and presents it as plain SQL. But of course, I started adding more things around it. So this is it.

This is the idea of PgQue, which is a universal edition, so it can be used anywhere. And this week I'm releasing the second version with a lot of stuff, actually, a lot of stuff. Some requests came in, and so on.

First of all, I realized I need client libraries. So we now have TypeScript, Go, and Python libraries. And 1 person promised to bring a Ruby library as well.

Michael: Oh, nice.

Nikolay: Yeah.

And I also already have 2 external contributors. So the project has some life. I reached a thousand stars in 4 days or so. It was good; I mean, it felt great. But I also learned it's not a queue. It's like a log, because it's more like Kafka than RabbitMQ or ActiveMQ. And I agree, after a thorough understanding.

That's why in version 2 I'm bringing in another concept, originally from SkyTools. It's called cooperative consumers, or sub-consumers. Logically it's a single consumer, but there is a group of consumers which distribute the load between them.

This is needed, for example, when... imagine you have a queue of jobs like video processing. Some videos are super small, some videos are super large. In the case of PgQ, you just shift your position and read. Some people, by the way, looking at PgQ think: okay, if I add more consumers, I'm increasing capacity. No, throughput won't increase, because every consumer in PgQ reads everything. Everyone reads everything. You need different queues to distribute load; it's like topics in Kafka. Because it's actually not a queue, it's a log.

Michael: Yeah, do you support multiple queues? Or I guess it just involves creating multiple tables.

Nikolay: Yeah, you can create as many queues as you want. Each of them will be partitioned, and all the mechanics will work.

But the concept of sub-consumers is not my idea; it's an original SkyTools-era idea. I couldn't import it, though, because it lives in a separate repo, PgQ Co-op, and it doesn't have a license. There are only 2 issues in that repo, asking what the license is, because people want to package it as a Debian package or something. So I couldn't take it, but I stole the idea, of course, and re-implemented it with my own code. The idea is the same. The feature is experimental right now; I need to play with it. We already started benchmarking it and so on, and it looks good.

So this, I think, should be natively supported. There is a lot of stuff, but the key idea is that now it's like a single file; you can just load it. I also made it a pg_tle extension for those who want to properly track it as an extension.

And you just inject it, and you configure pg_cron. I actually thought about this, and Hannu Krosing, who is now at GCP and is ex-Skype, we discussed it on LinkedIn and I implemented it. So: pg_cron cannot tick more often than once per second.

Michael: I was going to ask you about this. Yeah. But the README says the default is 100 milliseconds.

Nikolay: This is new for version 2, yes. I just made it yesterday. So I was thinking... first of all, people think about latencies. I recognize 3 latencies. First is producer query latency: how long it takes to insert 1 event or a batch of events, like 100 or 1,000.

Second is consumer query latency: how fast it is to take the next batch, to fetch it, right? We measure both, and we see how badly the second one degrades for all the alternative, modern tools. I cannot call my tool modern; it's very old, this engine is 20 years old. So they all degrade. Here, we almost don't degrade: we go from 100 microseconds to slightly above 1 millisecond, while they go from 1 millisecond to 1 second, sometimes 5, I saw.

And the degradation we can discuss separately; ours is also solvable, but not solved in version 2 yet. When I say version 2, it's 0.2, because it's early. But it's a super solid engine, we know.

And there is a third latency: end-to-end event delivery latency. If you tick, if you shift the visibility horizon, only every second, it can be up to a second, at least. Also, the consumer itself might not wake up immediately; you need LISTEN/NOTIFY or something. It's partially supported right now.
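The usual trick to cut that wake-up delay is LISTEN/NOTIFY (channel name hypothetical):

```sql
-- Consumer side: sleep on a channel instead of tight polling.
LISTEN pgque_events;

-- Producer side, after inserting events:
SELECT pg_notify('pgque_events', 'my_queue');
```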

But you need polling or something, so you might lose some milliseconds there as well. And I was thinking: this decision to tick once per second was made 10 to 20 years ago. We have better hardware now. So let's have 10 ticks per second by default.

And how?

Okay, pg_cron, which we rely on. By the way, pg_cron is optional; you can use regular cron or something. You just need to call the PgQ ticker with a SELECT. That's it.

Ah, okay.

Tick, tick, yeah.

The every-second default was original, from SkyTools; it was like that in the first version. In the second version, I made it 10 times per second. And it's simple.

With pg_cron, there is a stored procedure that has a loop with COMMIT, because we actually need separate transactions to shift the snapshot. So it's not exactly every... this is the same misconception as with backslash-watch in psql: it's not exactly every hundred milliseconds. If the wait time is a hundred milliseconds, the operation itself has non-zero duration, right? So roughly it should be fine. And it ticks 10 times per second, not exactly, slightly drifting. But updating 1 row is super fast.
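A sketch of that idea (names are hypothetical; pg_cron's seconds-granularity scheduling exists since pg_cron 1.5):

```sql
-- Ten ticks per invocation, each in its own transaction,
-- so the snapshot (and consumers' visibility) actually advances.
CREATE PROCEDURE tick_10x() LANGUAGE plpgsql AS $$
BEGIN
  FOR i IN 1..10 LOOP
    PERFORM pgq.ticker();  -- assumes the SkyTools-style SQL ticker
    COMMIT;                -- separate transaction per tick
    PERFORM pg_sleep(0.1);
  END LOOP;
END $$;

-- Invoked once per second, the finest granularity pg_cron allows:
SELECT cron.schedule('pgque-ticker', '1 seconds', 'CALL tick_10x()');
```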

Yeah.

And it will be even better when I implement bloat mitigation for the system tables, because that's why we go from 100 microseconds to 1 millisecond or so under a blocked xmin horizon: we accumulate dead tuples in these meta tables. The ticker and subscription ones.

Michael: Are there any other downsides to increasing the ticker speed, to shortening the tick interval?

Nikolay: So I did preliminary benchmarks yesterday, and the important thing to understand: when it's time to tick, if there is nothing to do, nothing new to read, it doesn't tick. Which means it's great when load is low: it won't produce new writes.

But imagine every hundred milliseconds you have new events. In that case, every tick updates this one row, which holds the metadata. And we estimated, for ticking every second, it's 24 megabytes per second of WAL, very roughly, because it doesn't take full-page writes into account. So it's very rough, just the overhead from ticking.

Michael: 24 megabytes per second, from a single tick per second?

Nikolay: Oh, per day.

Michael: Per day, that makes more sense.

Nikolay: Yeah, that makes more sense. 24 megabytes per day, sorry, per day. And with the current version 2 default of 10 ticks per second it's roughly 240 megabytes per day, which is acceptable. I mean, that means you have load already, right? The database is already loaded if ticking happens every hundred milliseconds.

Michael: And if it doesn't happen...

Nikolay: If it doesn't happen, again: no writes.

Michael: So it sounds like it scales fairly linearly then, like 10 times more.

Nikolay: Yes, 10 times more. This is the overhead from updating this meta table, the row that tracks where we are. That's it.

So, of course, if you inject a lot of data, there are interesting mechanics that might kick in. And again, if you have long-running transactions, with the xmin horizon blocked, dead tuples will unfortunately accumulate in the system tables, in the metadata tables, which I'm also going to solve with partitioning and truncate.

Michael: So something I don't quite understand is: when doesn't this make sense? You mentioned it's not really a queue, that it can be used for queue-like workloads, but maybe sometimes it doesn't make sense. Should we go into that a little bit?

Nikolay: This is a great question. I'm deep in databases, so I would like to hear from backend engineers and people who build systems what's lacking here. One thing I can see is the lack of, for example, priorities for events. Right. Because this is very linear. With cooperative consumers, I think the problem of a big task blocking small tasks will be basically resolved. But priorities... I don't know, that's definitely not a pattern here.

Also, if you need almost immediate delivery, pgmq or River might be better, because they deliver faster, right? I mean end-to-end latency for job processing. What we have here is worse end-to-end latency, controlled by this ticking frequency.

Obviously... I actually wrote a document about frequency tuning, with some considerations. There is a docs directory, and there is a dedicated document there right now.

So the problem will be... So if you want almost immediate delivery, for, I don't know, chat or something, maybe you should choose pgmq or River. But then you need to fight xmin horizon blockers very actively, right? And install our monitoring, connect it to our platform, check the health, and so on, and fight those blockers actively.

What I can say is that update-delete, SKIP LOCKED systems have better end-to-end delivery, but they degrade badly under a blocked xmin horizon. We are worse initially, but predictable and reliable, right?

And in the case of background processing... for example, we discussed how to convert an integer 4 primary key to an integer 8 primary key. You have, say, a billion rows, and you need to change them. And you chose, for example, the approach I call the new column approach: you create a new column with integer 8, then you install a trigger for future rows, and then you need to process your big backlog of a billion rows. You do it in batches. How do you schedule this processing? This is exactly the kind of background processing where 50 milliseconds is fine. It's good at working in batches.
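A compressed sketch of that new-column approach (hypothetical table; in practice you would pace the batches and commit between them):

```sql
ALTER TABLE items ADD COLUMN id_new bigint;

-- Trigger keeps new and updated rows in sync:
CREATE FUNCTION items_sync_id() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
  NEW.id_new := NEW.id;
  RETURN NEW;
END $$;
CREATE TRIGGER items_sync BEFORE INSERT OR UPDATE ON items
FOR EACH ROW EXECUTE FUNCTION items_sync_id();

-- Backfill the backlog in batches; one batch shown:
UPDATE items
SET id_new = id
WHERE id IN (SELECT id FROM items WHERE id_new IS NULL LIMIT 10000);
```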

If you need 1 millisecond, okay, choose the newer tools, but fight xmin horizon blockers.

Michael: Yeah.

I mean, my understanding of when you use queues is that it's for asynchronous stuff. So I'm struggling to imagine something that can't cope with 50 milliseconds of overhead on something asynchronous. Even a password reset, if it comes through a second later, is fine.

Nikolay: I agree.

And in that case, maybe you should consider doing it outside of Postgres, with different systems like Redpanda or something. I can imagine some systems where you need very responsive behavior, but then you need to learn how MVCC works and what dangers await you if you build it like that. You will have good latency in the beginning, but then suddenly something blocks you, and it degrades quickly.

I wanted to mention that a queue in the database is great because it's ACID. Nothing will be lost, right? It's replicated, it goes into backups, nothing is lost, and there's isolation. All 4 properties are followed, right?

If you go and use a different system, you need to think about consistency, right? You need to think about the case where you already wrote something to the database here but didn't deliver it there, or it can become inconsistent.

These days GitHub works very poorly. And since I posted this project on GitHub, I've been working there; usually I'm on GitLab. And I know GitLab's issues as well, because they have been our client for many years, but they are great. On GitHub lately, it's like, wow, interesting: you already merged a pull request, but it takes some seconds for the counter to propagate, and for the pull request to disappear from the list. So they have big lags, asynchronous processing, right? But lags are fine, eventual consistency, right? Data loss is not fine.

So I would say: if you need predictable performance, a reliable approach, and good throughput that doesn't suffer from degradation when the xmin horizon is blocked, PgQ is great. When you need much faster delivery and you want to stay inside the database, with ACID and so on, choose a different system on Postgres, but fight xmin horizon blockers. And if you want even better throughput, go with Redpanda, Kafka, or anything else, if you can afford supporting it or paying for a managed version.

But in that case, do look at the transactional outbox pattern. Yeah. It comes from microservices theory, so to speak: a pattern for organizing data delivery from the database to the queue properly, with all the statuses. This is how you should do it, because otherwise data loss is eventually inevitable.
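A minimal sketch of the pattern (table and names hypothetical):

```sql
CREATE TABLE outbox (
  id         bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  topic      text NOT NULL,
  payload    jsonb NOT NULL,
  created_at timestamptz NOT NULL DEFAULT now()
);

-- The business change and the outgoing message commit atomically:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;
INSERT INTO outbox (topic, payload)
VALUES ('payments', '{"account": 1, "amount": 10}');
COMMIT;
-- A relay process reads outbox rows, publishes them to Kafka/Redpanda,
-- and deletes (or marks) them only after the broker acknowledges.
```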

Yeah, this is how to navigate the solutions; that's my advice.

Yeah.

Michael: Anything else you wanted to make sure we covered before we wrap up?

Nikolay: Well, I'm just excited that, among others, Christophe Pettus, as you said, and Kukushkin, as I said, we teamed up a little bit, somehow in distributed fashion, to shed new light on PgQ, because it's a great piece of software. It solved problems before people encountered them, but somehow the knowledge got lost. I hope more people will at least keep in mind what's possible and consider it when building systems. And tell their AI to consider it, because maybe the AI will look at it, do some benchmarks and research, and make a decision. Right? That's it.

Michael: Maybe, yeah.

Alright, last one.

Well, thanks so much, Nikolay,
and catch you soon.

Nikolay: Thank you for listening.

See you soon, bye.