Re: Select for update with offset interferes with concurrent transactions

On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy@xxxxxxxxxxxxxxx> wrote:

On 2/1/2011 10:59 AM, Yngve Nysaeter Pettersen wrote:
Hi,

Thanks for the quick answer, Andy.

On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <andy@xxxxxxxxxxxxxxx> wrote:

<snip>
So, if I understand correctly, you:

q = SELECT record_id FROM queue
WHERE project_id = my_project AND state = idle
LIMIT n OFFSET i FOR UPDATE
while not q.eof
update queue set state = started where record_id = x;
process record_id
update queue set state = finished where record_id = x;
q.next;

Almost: the update to "started" is done for all selected elements first,
releasing the lock; then the items are processed one at a time, marking
each "finished" as it completes. (Each processing step can take
minutes, so holding a lock the whole time is not an option.)
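
In SQL terms the flow is roughly the following (a minimal sketch; the
'idle'/'started'/'finished' values and the :my_project, :n and :i
parameters stand in for whatever the real schema and job runner use):

    BEGIN;
    SELECT record_id FROM queue
     WHERE project_id = :my_project AND state = 'idle'
     LIMIT :n OFFSET :i
       FOR UPDATE;
    -- mark every selected row as taken, then release the row locks
    UPDATE queue SET state = 'started'
     WHERE record_id IN (/* the ids just selected */);
    COMMIT;

    -- later, one row at a time, outside any long-lived transaction:
    UPDATE queue SET state = 'finished' WHERE record_id = :x;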

Might I suggest an alternative:

q = update queue set state = started
WHERE project_id = my_project AND state = idle
LIMIT n OFFSET i
RETURNING record_id;
idlist = @q;
commit;

foreach x in idlist
process record_id
begin
update queue set state = finished where record_id = x;
commit;

Forgive the part-Perl, part-Python pseudocode. Oh, and I've never done
this, no idea if it actually works. :-)
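
(For reference: PostgreSQL's UPDATE has no LIMIT/OFFSET clause, so in
real SQL the idea above would have to be spelled with a subselect.
An untested sketch along these lines:

    UPDATE queue SET state = 'started'
     WHERE record_id IN (
           SELECT record_id FROM queue
            WHERE project_id = :my_project AND state = 'idle'
            LIMIT :n OFFSET :i)
    RETURNING record_id;

Note that the subselect itself takes no row locks, so two processes
with overlapping offsets can still pick the same candidate rows; the
UPDATE then has to sort that out, which is exactly the concurrency
question discussed below.)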

Thanks for that suggestion, I'll take a look at it.

While I hadn't caught on to the "RETURNING" part, I had been wondering
whether a single-step UPDATE might be a solution. One concern I have is
how concurrent updates will affect the returned list: will the contested
rows simply be skipped (as a SELECT would skip them in normal
transaction mode, if I understood correctly), or might the statement
return with an error? I know that the normal update return value is the
number of updated rows; I am just not sure whether that applies with
"RETURNING".
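
(For what it is worth: UPDATE ... RETURNING produces a result set of
the updated rows, just like a SELECT, in addition to the usual row
count in the command tag. And under the default READ COMMITTED level, a
row locked by a concurrent transaction makes the second UPDATE wait;
once the first commits, the row is re-checked against the WHERE clause
and skipped if it no longer matches. A sketch of the two sessions, with
made-up ids:

    -- session 1 (not yet committed):
    UPDATE queue SET state = 'started'
     WHERE record_id IN (1, 2, 3) AND state = 'idle'
    RETURNING record_id;

    -- session 2, same statement: blocks on the locked rows; after
    -- session 1 commits it sees state = 'started', so those rows no
    -- longer match and are skipped rather than raising an error.

So the returned list would contain only the rows this process actually
claimed.)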

Although, I will note that if this works, it more or less makes FOR
UPDATE redundant for this kind of queue. And if it doesn't, the current
locking policy might cause issues for concurrent updates in exactly the
use cases where FOR UPDATE is relevant.


Yeah, I'd wondered the same thing. It could be that two updates hitting the same row will deadlock, or maybe not; I'm not sure. But I think it's the same as with the select: if you happen to have two limits that hit the same range, you're in trouble.

I think the random limit thing is a race condition in itself. Whenever you have multiple processes hitting the same rows, you're going to run into problems. Have you thought of using a sequence instead of a random limit? Each process could get the next 100 record_ids via a sequence, and there would be much less chance of deadlock.

How would that work? I would appreciate an example, if you have the time.

I am not really familiar with sequences, as I have only seen them used for the "id" field in Django-generated tables.

In case it is relevant: the processes do not (currently, at least) have unique IDs, though each has a local sequence number on the machine it is running on.
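
(To illustrate what the sequence approach might look like: a
hypothetical sketch, where queue_dispatch_seq and the block size of 100
are made up, and which assumes record_id values are assigned densely
from a serial column:

    -- done once:
    CREATE SEQUENCE queue_dispatch_seq INCREMENT BY 100;

    -- each process grabs a disjoint block of 100 ids; nextval() is
    -- atomic and never hands the same value to two callers
    SELECT nextval('queue_dispatch_seq') AS block_start;

    -- then it claims whatever idle rows fall inside its block
    UPDATE queue SET state = 'started'
     WHERE project_id = :my_project AND state = 'idle'
       AND record_id BETWEEN :block_start AND :block_start + 99
    RETURNING record_id;

Since no two processes ever receive the same block, they never compete
for the same rows; a block that comes back empty, e.g. because of gaps
in record_id, just means asking for the next one.)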


--
Sincerely,
Yngve N. Pettersen
********************************************************************
Senior Developer		     Email: yngve@xxxxxxxxx
Opera Software ASA                   http://www.opera.com/
Phone:  +47 23 69 32 60              Fax:    +47 23 69 24 01
********************************************************************

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

