
Re: Select for update with offset interferes with concurrent transactions


 



On 2/1/2011 12:51 PM, Yngve Nysaeter Pettersen wrote:

Thanks Andy,

On Tue, 01 Feb 2011 19:29:08 +0100, Andy Colson <andy@xxxxxxxxxxxxxxx>
wrote:

On 2/1/2011 12:10 PM, Yngve Nysaeter Pettersen wrote:
On Tue, 01 Feb 2011 18:11:23 +0100, Andy Colson <andy@xxxxxxxxxxxxxxx> wrote:
I think the random limit thing is a race condition itself. Whenever
you have multiple processes hitting the same rows you're going to run
into problems. Have you thought of using a sequence instead of a
random limit? Each process could get the next 100 record_ids via a
sequence; then there would be much less chance of deadlock.

How would that work, in case you would like to provide an example?

I am not really familiar with sequences, as I have only seen them used
for the "id" field in Django generated tables.

In case it is relevant, the processes do not (currently, at least)
have a unique ID, though they have a local sequence number for the
machine they are running on.



I have a really simple q table I use.

create table q (id integer not null, msg integer, primary key(id));
create sequence q_add;
create sequence q_read;

I insert via q_add:

andy=# insert into q(id, msg) values(nextval('q_add'), 20);
INSERT 0 1
andy=# insert into q(id, msg) values(nextval('q_add'), 4);
INSERT 0 1
andy=# select * from q;
 id | msg
----+-----
  1 |  20
  2 |   4
(2 rows)


Then I run multiple batch proc's which get their next job like:

andy=# select msg from q where id = (select nextval('q_read'));
 msg
-----
  20
(1 row)

andy=# select msg from q where id = (select nextval('q_read'));
 msg
-----
   4
(1 row)

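(To hand each worker a block of 100 ids per call, as suggested above, the reader sequence's increment can be raised. A minimal sketch, not from the original mail — the sequence name q_read_block is hypothetical, and the WITH clause ensures nextval() is evaluated only once rather than per row:)

```sql
-- Sketch: each nextval() reserves a block of 100 ids.
CREATE SEQUENCE q_read_block INCREMENT BY 100 START 100;

-- Each batch process claims its next block with a single call:
WITH claim AS (SELECT nextval('q_read_block') AS hi)
SELECT q.msg
FROM q, claim
WHERE q.id >  claim.hi - 100
  AND q.id <= claim.hi;
```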

It works for me because I can empty the q table, reset the q_add and
q_read sequences, and start over clean. Not sure if it would work for
your setup.
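(The "empty and reset" step could look like this:)

```sql
-- Drain the queue and rewind both sequences to start clean.
TRUNCATE q;
ALTER SEQUENCE q_add  RESTART WITH 1;
ALTER SEQUENCE q_read RESTART WITH 1;
```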

I see how that would work (it is essentially how Django assigns row ids).

My current setup can have multiple runs configured at a time (several dozen
have been queued at once, in one case), each with its own priority, and runs
might, at least theoretically, be set up in parallel (even the individual
runs are set up in parallel). That means the ids would not be sequential per
run (a single sequence feeds the id field for every row in the table),
unless ids could somehow be allocated per individual run/project (multiple
sequence objects, one for each run, might be an option, but I don't like
that possibility). And as I mentioned elsewhere in the thread, I might make
the queuing a bit more complex, which would make this system even more
complicated.

So, AFAICT I am afraid it would not work in the general case for my
project :( .
However, it might be useful in somebody else's project :) .
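(For anyone who does want the per-run-sequence variant dismissed above, a rough sketch — it assumes the queue table carries a run_id column and per-run-sequential ids, and the names q_read_run_42 and run_id are hypothetical:)

```sql
-- Sketch: one reader sequence per run, created when the run is queued.
CREATE SEQUENCE q_read_run_42;

-- A worker assigned to run 42 pulls that run's next job:
SELECT msg FROM q
WHERE run_id = 42
  AND id = (SELECT nextval('q_read_run_42'));
```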


No, I didn't think it would work for you; yours looks much more complicated than mine. Just out of curiosity, have you looked at PgQ?

-Andy

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

