On Mon, Nov 23, 2015 at 2:41 AM, Chris Withers <chris@xxxxxxxxxxxxxxxx> wrote:

> Hi All,
>
> I wondered if any of you could recommend best practices for using a
> postgres table as a queue. Roughly speaking, 100-200 workers will vomit
> rows at rates of a few hundred per second into the table, leaving the
> status as new, and then as many workers as needed to keep up with the
> load will plough through the queue changing the status to something
> other than new.

Is that several hundred per second per worker, or just several hundred per
second in total?

What happens if a worker finds the database has crashed when it goes to
insert the records?  That would determine what kind of transactional system
you need.

On the consumer side, what would happen if a record were processed twice?

> My naive implementation would be something along the lines of:
>
> CREATE TABLE event (
>     ts      timestamp,
>     event   char(40),
>     status  char(10),
>     CONSTRAINT pkey PRIMARY KEY(ts, event)
> );

How long are you going to keep these records around once they have been
processed?  Unless you delete them immediately, you will probably want a
partial index on (ts) where status='new' (sketch at the end of this mail).

> ...with writers doing INSERT or COPY to get data into the table and
> readers doing something like:
>
> SELECT * FROM event WHERE status='new' LIMIT 1000 FOR UPDATE;
>
> ...so, grabbing batches of 1,000, working on them and then setting their
> status.

Why 1000 at a time?  Also, you probably need an ORDER BY.

> But, am I correct in thinking that SELECT FOR UPDATE will not prevent
> multiple workers selecting the same rows?

You are correct, but the way it prevents multiple workers from selecting
them at the same time is that the next worker will block until the first
one commits.  To avoid that, you would either need to use SKIP LOCKED in
the 9.5 release, or pg_try_advisory_xact_lock on earlier versions (rough
sketches of both at the end of this mail).

And how do you unlock the rows?  There are two general approaches.  One is
to lock the row using PostgreSQL's FOR UPDATE type locks and hold the
transaction open while processing, then update the row to mark it done and
commit to release the lock.  This cleans up after itself if a worker
crashes, but there is no visibility into what is going on.  The other is to
claim the row for the worker by updating a status field (for example, to
hold a hostname and pid) and committing that, then doing the processing,
then updating the row again to mark it done and committing that.  An
abnormally terminated worker will need to have someone or something clean
up after it, but this gives you much better visibility into what is
happening.

> Anyway, is this approach reasonable? If so, what tweaks/optimisations
> should I be looking to make?

The best optimization would be to not do it at all.  Why can't the 200
inserting workers just do the work themselves immediately (either client
side or server side), instead of queuing it?  If there is any possibility
of making the processing fast enough to do it that way, I'd surely spend my
time optimizing the actual work rather than optimizing a queuing system.
And depending on what failure modes you can tolerate, consider a
best-effort dedicated queuing system rather than a perfectly ACID one built
on PostgreSQL.

Cheers,

Jeff
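
P.S.  A few rough sketches, in case they are useful.  First, the partial
index mentioned above, against the table exactly as you proposed it (the
index name is just a placeholder):

    CREATE INDEX event_new_idx ON event (ts) WHERE status = 'new';

That lets the consumers find the 'new' rows without having to scan the
ever-growing pile of already-processed ones.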
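
Second, the "claim it, commit, process it, mark it done" style on 9.5, with
SKIP LOCKED doing the claiming and the status update in one statement.
This is only a sketch: 'worker-7' stands in for whatever hostname/pid
identifier you settle on (a real hostname+pid won't fit in char(10), so
you'd widen that column), and the batch size of 100 is arbitrary.

    -- claim a batch of new rows for this worker, skipping rows that
    -- other workers have already locked
    WITH batch AS (
        SELECT ts, event
          FROM event
         WHERE status = 'new'
         ORDER BY ts
         LIMIT 100
           FOR UPDATE SKIP LOCKED
    )
    UPDATE event e
       SET status = 'worker-7'
      FROM batch b
     WHERE e.ts = b.ts AND e.event = b.event
    RETURNING e.ts, e.event;

    -- ... do the actual processing ...

    -- then mark the claimed rows as done
    UPDATE event SET status = 'done' WHERE status = 'worker-7';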
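
And on versions before 9.5, the pg_try_advisory_xact_lock route looks
roughly like this, paired with the hold-the-transaction-open style of
processing.  Again, just the general shape: the advisory-lock key here is a
made-up hash of the primary key, and mixing LIMIT with a volatile locking
function in the WHERE clause has well-known planner gotchas, so test it
under concurrency before trusting it.

    -- grab up to 100 new rows whose advisory lock no other open
    -- transaction currently holds; the lock is released at commit
    SELECT ts, event
      FROM event
     WHERE status = 'new'
       AND pg_try_advisory_xact_lock(hashtext(ts::text || event))
     ORDER BY ts
     LIMIT 100;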