I have a performance question about a database work queue I am working on. I have two tables: the first contains information about files, and the second is a work-queue table of 'jobs', intended for a worker process to transfer the files to another machine:

    create table log.file (
        id          uuid,
        created     timestamp not null default NOW(),
        filetype_id smallint,
        process_id  smallint,
        filepath    text,
        UNIQUE (id, filetype_id)
    );

    create table fileworkqueue.job (
        id          uuid,
        filetype_id smallint,
        filepath    text,
        attempt     smallint not null default 0
    );

    create index fwq_id_filetype_id_idx
        on fileworkqueue.job (id, filetype_id);

The log.file table is populated by an application that writes files to the filesystem. The fileworkqueue.job table is populated by a trigger on log.file:

    CREATE OR REPLACE FUNCTION log.fileworkqueue_add() RETURNS TRIGGER AS $$
    BEGIN
        INSERT INTO fileworkqueue.job (id, filetype_id, filepath)
        VALUES (NEW.id, NEW.filetype_id, NEW.filepath);

        IF NEW.filetype_id IN (1, 2, 3, 4) THEN
            NOTIFY worker1;
        ELSIF NEW.filetype_id IN (5, 6, 7, 8) THEN
            NOTIFY worker2;
        ELSE
            RAISE EXCEPTION 'Filetype ID not known: %', NEW.filetype_id;
        END IF;

        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

My worker processes LISTEN for the appropriate NOTIFY, select rows from fileworkqueue.job according to filetype_id, and transfer the files. After processing, each worker deletes its row from the work queue.

When we run without the workers (i.e. we just insert into log.file, with fileworkqueue.job being filled up by the trigger), we see a rate of about 3 milliseconds per insert. When the workers are also removing rows from fileworkqueue.job, this slows to just under 50 ms per insert. I have tried various options, such as removing the index on the work-queue table, and updating the work-queue row with a 'complete' flag instead of deleting it, but I am unable to get better performance. Does anyone have any thoughts on how to do this more optimally?
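For clarity, the worker side is roughly the following (a sketch only; the filetype range matches worker1 in the trigger above, the file transfer itself happens outside the database, and the delete runs once per transferred row):

```sql
-- Wake on notifications from the trigger.
LISTEN worker1;

-- On each notification, fetch this worker's pending jobs.
SELECT id, filetype_id, filepath
  FROM fileworkqueue.job
 WHERE filetype_id IN (1, 2, 3, 4);

-- ... transfer each file to the other machine ...

-- Then remove the completed job ($1/$2 are the row's id and filetype_id).
DELETE FROM fileworkqueue.job
 WHERE id = $1
   AND filetype_id = $2;
```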
I don't want the work queue to impact the process adding the data too much. I am thinking of creating a third table containing just the 'processed' info (id and filetype_id): the workers would write to this table when a job is complete, and would select which data to process through a view showing only the entries still outstanding in the work-queue table. But then I will have to deal with cleaning all these entries up at some point, which could get complicated in itself.

Thanks for any ideas.

--
Jason Armstrong

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
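P.S. To make the third-table idea concrete, I mean something along these lines (table and view names are only placeholders, and this is untested):

```sql
-- Completed jobs are recorded here instead of deleting from the queue.
create table fileworkqueue.processed (
    id          uuid,
    filetype_id smallint
);

-- Jobs still to do: queue entries with no matching 'processed' row.
create view fileworkqueue.pending as
select j.id, j.filetype_id, j.filepath
  from fileworkqueue.job j
 where not exists (
       select 1
         from fileworkqueue.processed p
        where p.id = j.id
          and p.filetype_id = j.filetype_id);
```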