Hi all, especially Jim Jarvie: I only now saw your email to me on my
related issue. My issue remains this one:
"Well this got interesting - the already moved error showed up:"
and I have already gone through all that index pruning and all that
good stuff.
I remain with my original question from the 30th of June; to me it feels
like a bug of some sort:
"tuple to be locked was already moved to another partition due to
concurrent update"
This would not exactly look like a bug, because the message says "to
be locked", so at least it is not allowing two workers to lock the same
tuple. But it seems that the skip-locked mode should not turn this into
an error; it should treat it as if the tuple were already locked. Why
would it want to lock the tuple (representing the job) if another worker
has already finished its UPDATE of the job to mark it as "done" (which
is what makes the tuple move to the "completed" partition)?
Either the SELECT for jobs to do returned a wrong tuple, one that was
already updated, or there is some lapse in the locking. Either way, it
seems a waste of time to throw all these errors when the tuple should
not even have been selected for update and locking.
I wonder if anybody knows anything about this issue. Of course you'll
want to see the DDL and SQL queries, etc., but you can't really try it
out unless you do some massively parallel magic.
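If somebody does want to try that parallel magic, one way might be to run
the claiming query from many sessions at once, e.g. with pgbench; the
script file name and client counts below are just placeholders, and the
query itself is copied from the session quoted further down:

    -- claim_jobs.sql (hypothetical pgbench script)
    BEGIN;
    SELECT queueid, txobject, objectid, state
      FROM mq.queue
     WHERE (state = 'tx_active' OR state = 'tx_fail_retryable')
       AND txobject = 'ticket'
     LIMIT 250
       FOR UPDATE SKIP LOCKED;
    COMMIT;

run as "pgbench -n -c 50 -j 50 -T 60 -f claim_jobs.sql queuedb", while other
sessions keep updating claimed jobs to the finished state so that rows
actually move between partitions.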
I still think that it should simply not happen. Don't waste time on old
tuples trying to fetch and lock something that's no longer there. It's a
waste of resources.
regards,
-Gunther
On 8/20/2020 6:39 PM, Jim Jarvie wrote:
On 20-Aug.-2020 17:42, Michael Lewis wrote:
Can you share an explain analyze for the query that does the select for
update? I wouldn't assume that partition pruning is possible at all with
hash, and it would be interesting to see how it is finding those rows.
Well this got interesting - the already moved error showed up. Note:
the actual process partitions are regular table partitions; these are
not hashed. Only the incoming and completed partitions are hashed, due
to the row counts at either end of the processing; in-flight (where the
issue shows up) is quite small:
[queuedb] # explain analyze select queueid,txobject,objectid,state
from mq.queue where (state = 'tx_active' or state='tx_fail_retryable')
and txobject = 'ticket' limit 250 for update skip locked;
ERROR: 40001: tuple to be locked was already moved to another
partition due to concurrent update
LOCATION: heapam_tuple_lock, heapam_handler.c:405
Time: 579.131 ms
[queuedb] # explain analyze select queueid,txobject,objectid,state
from mq.queue where (state = 'tx_active' or state='tx_fail_retryable')
and txobject = 'ticket' limit 250 for update skip locked;
ERROR: 40001: tuple to be locked was already moved to another
partition due to concurrent update
LOCATION: heapam_tuple_lock, heapam_handler.c:405
Time: 568.008 ms
[queuedb] # explain analyze select queueid,txobject,objectid,state
from mq.queue where (state = 'tx_active' or state='tx_fail_retryable')
and txobject = 'ticket' limit 250 for update skip locked;
                                                                       QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..25.71 rows=250 width=34) (actual time=1306.041..1306.338 rows=250 loops=1)
   ->  LockRows  (cost=0.00..7934.38 rows=77150 width=34) (actual time=1306.040..1306.315 rows=250 loops=1)
         ->  Append  (cost=0.00..7162.88 rows=77150 width=34) (actual time=520.685..1148.347 rows=31500 loops=1)
               ->  Seq Scan on queue_tx_active  (cost=0.00..6764.50 rows=77000 width=34) (actual time=520.683..1145.258 rows=31500 loops=1)
                     Filter: ((txobject = 'ticket'::mq.queue_object) AND ((state = 'tx_active'::mq.tx_state) OR (state = 'tx_fail_retryable'::mq.tx_state)))
               ->  Seq Scan on queue_tx_fail_retryable  (cost=0.00..12.62 rows=150 width=34) (never executed)
                     Filter: ((txobject = 'ticket'::mq.queue_object) AND ((state = 'tx_active'::mq.tx_state) OR (state = 'tx_fail_retryable'::mq.tx_state)))
 Planning Time: 0.274 ms
 Execution Time: 1306.380 ms
(9 rows)
Time: 1317.150 ms (00:01.317)
[queuedb] #
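For illustration only, the layout described above (plain in-flight
partitions, hashed incoming/completed ends) might look roughly like the
following; every name, column type, enum value, and modulus here is a
guess, not the real DDL:

    -- hypothetical schema: list-partitioned by state, hash sub-partitions
    -- only on the high-volume ends
    CREATE SCHEMA mq;
    CREATE TYPE mq.queue_object AS ENUM ('ticket');          -- values assumed
    CREATE TYPE mq.tx_state AS ENUM
        ('incoming', 'tx_active', 'tx_fail_retryable', 'tx_done');

    CREATE TABLE mq.queue (
        queueid   bigint          NOT NULL,
        txobject  mq.queue_object NOT NULL,
        objectid  bigint,
        state     mq.tx_state     NOT NULL
    ) PARTITION BY LIST (state);

    -- small in-flight partitions, not hashed (the two seen in the plan above)
    CREATE TABLE mq.queue_tx_active PARTITION OF mq.queue
        FOR VALUES IN ('tx_active');
    CREATE TABLE mq.queue_tx_fail_retryable PARTITION OF mq.queue
        FOR VALUES IN ('tx_fail_retryable');

    -- high-volume end, hash-partitioned; shown for the completed side only,
    -- the incoming side would look the same
    CREATE TABLE mq.queue_tx_done PARTITION OF mq.queue
        FOR VALUES IN ('tx_done') PARTITION BY HASH (queueid);
    CREATE TABLE mq.queue_tx_done_0 PARTITION OF mq.queue_tx_done
        FOR VALUES WITH (MODULUS 2, REMAINDER 0);
    CREATE TABLE mq.queue_tx_done_1 PARTITION OF mq.queue_tx_done
        FOR VALUES WITH (MODULUS 2, REMAINDER 1);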