Search Postgresql Archives

Select for update with offset interferes with concurrent transactions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello all,

I am in the process of migrating a system from Postgresql 8.3 to 9.0, and have run into a problem with the task queue systems I am using.

The task queue controls the allocation of tasks between about 1000 processes working in parallel, and is essentially a table of

   record_id (unique)
   project_id
   task_description_id
   state (idle, started, finished)

Each project currently have about 2 million entries. My plan is to increase that significantly the next few months.

To avoid having the processes trample each other's queries (the first attempt was to select the first matching entries of the table, which caused one to block all other transactions), one of the steps I took was to select a set of idle rows at a random offset into the table from the project, mark them for update, then update each record's state as started.

SELECT record_id FROM queue WHERE project_id = my_project AND state = idle LIMIT n OFFSET i FOR UPDATE

At present "n" is 100-150, "i" is a random value in the range 0-10000.

There is, intentionally, no ordering specified, since that would just slow down the query, and is not necessary.

For reference, the above query is sent through Django's cursor.execute() call in a manual transaction block.



What I've discovered when using Postgres 9.0 is that the processes are now blocking every other query into this table, apparently reducing the task processing speed by at least a factor of 10, and increasing the load on the server by a similar factor, compared to when Postgres 8.3 was used. The problem is apparent just after starting, with only 50-100 processes active (startup is staggered).

Reducing "n" (and looping), or increasing the "i" range did not work.


The reason seems to be this new part of http://www.postgresql.org/docs/9.0/static/sql-select.html (towards the end of the FOR UPDATE section):

If a LIMIT is used, locking stops once enough rows have been returned to satisfy the limit (but note that rows skipped over by OFFSET will get locked). Similarly, if FOR UPDATE or FOR SHARE is used in a cursor's query, only rows actually fetched or stepped past by the
   cursor will be locked.

I can't find similar text in the 8.3 or 8.4 documentation.

AFAICT, and assuming I have not misunderstood this part of the documentation this means that if one of my processing nodes selects a block of 100 entries at offset 8000 in the resulting table, then every other node will be blocked while the block is being processed, not just the nodes that would have selected the rows in the range 0 to 7999, but also >=8100, because they cannot gain access to the rows.

Also, using FOR SHARE does not seem to solve the problem.

IMO, as a database non-expert, locking rows that were not returned as a result of the query is a bug. As an example, if a query selects the X last items in the matching rows, that is equivalent to locking the table, or the relevant part of it, even if the requester have no intention to modify those other rows.


Is there any way to avoid this problem? Or do I have to add a random batch_id field to the queue table in order to separate the processes' queries so that they do not block each other (as frequently)?

Is it possible to disable the source code causing this (that is, reverting the patch that introduced the problem, or changing a configuration switch)?


--
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer		     Email: yngve@xxxxxxxxx
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux