Processing a work queue

Steve Crawford <scrawford@xxxxxxxxxxxxxxxxxxxx> · Thu, 26 Apr 2007 15:36:35 -0700

Anyone have any ideas on how to handle a work queue? I've been thinking
about optimizing this process for quite a while.

Basically, my queue table consists of a few-hundred-thousand records
describing "things to do". To pare things to the minimum, a queue record
can be considered to have a status (available, assigned, completed), a
priority, and a description-of-work.

A process will grab an item from the queue, mark it as in-progress,
process it, and, depending on success, update the item as completed or
as available with an updated priority. There may be upwards of a
thousand "worker" processes and the work of each process may be
completed in anywhere from a few seconds to nearly an hour. I expect the
system as a whole to be handling a few-dozen queue items per second.

My original plan to fetch work was:

begin;

select item-id, item-info
   from the-queue
   where available
   order by priority
   limit 1
   for update;

update the-queue
  set status = 'assigned'
  where item-id = previously-selected-item-id;

commit;

This does not produce desirable results. In the case where requests for
work overlap, the first query will complete. The second query will block
until the first completes and then apparently re-evaluate the condition
and toss the record thus returning zero-rows.

Plan 1a:

Check for tuples returned and re-run query if zero. This will go into an
infinite loop whenever there is nothing in the queue and cause
undesirable thrashing if there is too much contention.

Plan 2:

Lock the table, run the query/update, unlock the table. Functions fine
but work halts when any operation interferes with obtaining the
table-level lock;

Plan 3:

Same as plan 1 but use a higher limit, say 100, then just choose and
update the first tuple. The second query will block till the first
completes, and then return 99 records. If limit is set to the number of
workers, every request should return some work to be done, if any is
available. It's a kludge, but does anyone see any significant drawbacks?

Plan 4:

Add an intermediary "dispatcher" with which the workers will communicate
via SOAP/XML-RPC/? But if dispatcher is allowed to run multiple
processes we are back to needing to resolving database query issues.

Plan 5:

I could, potentially, reverse everything and have the workers announce
availability and wait for the dispatcher to send work. Fixes the
database issue but creates some others.

So from the standpoint of the database query part, anyone have any
ideas/suggestions on how to handle a work queue?

Cheers,
Steve