On 2/1/2011 10:59 AM, Yngve Nysaeter Pettersen wrote:
Hi,
Thanks for the quick answer, Andy.
On Tue, 01 Feb 2011 16:19:17 +0100, Andy Colson <andy@xxxxxxxxxxxxxxx>
wrote:
<snip>
So, if I understand correctly, you:
q = SELECT record_id FROM queue
WHERE project_id = my_project AND state = idle
LIMIT n OFFSET i FOR UPDATE
while not q.eof
update queue set state = started where record_id = x;
process record_id
update queue set state = finsihed where record_id = x;
q.next;
Almost, the update to "started" is done for all selected elements first,
releasing the lock, then the items are processed one at a time, marking
each "finished" as they complete. (each processing step can take
minutes, so keeping a lock the whole time is not an option)
Might I suggest and alternative:
q = update queue set state = started
WHERE project_id = my_project AND state = idle
LIMIT n OFFSET i
RETURNING project_id;
idlist = @q;
commit;
foreach x in idlist
process record_id
begin
update queue set state = finsihed where record_id = x;
commit;
Forgive the part perl part python sudocode. Oh, and I've never done
this, no idea if it actually works. :-)
Thanks for that suggestion, I'll take a look at it.
While I hadn't caught on to the "RETURNING" part, I had been wondering
if using a single step UPDATE might be a solution. One concern I have is
how concurrent updates will affect the returned list (or if they will
just be skipped, as SELECT would in normal transaction mode, if I
understood correctly), or whether it might return with an error code (I
know that the normal update return value is the number of updated items,
just not sure if that applies for "RETURNING").
Although, I will note that this process (if it works) will, sort of,
make FOR UPDATE redundant. Or, if it doesn't, the current lock-policy
might cause issues for concurrent updates for the use-cases where FOR
UPDATE is relevant.
Yeah, I'd wondered the same thing. It could be two updates hitting the
same row will deadlock, or maybe not, I'm not sure. But I think its the
same as with the select, if you happen to have two limits that hit the
same range, you're in trouble.
I think the random limit thing is a race condition itself. Whenever you
have multiple processes hitting the same rows you're going to run into
problems. Have you thought of using a sequence instead of a random
limit? Each process could get the next 100 record_id'd via a sequence,
then there would be much less chance of deadlock.
-Andy
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general