Re: CPU hogged by concurrent SELECT..FOR UPDATE SKIP LOCKED

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi all, especially Jim Jarvie, I saw your email to me only now on my related issue. My issue remains this one:

Well this got interesting  - the already moved error showed up:
and I have already gone through all those index pruning and all that good stuff.

I remain with my original question from 30th of June, to me it feels like a bug of some sort:

"tuple to be locked was already moved to another partition due to concurrent update"

This would not exactly look like a bug, because the message says "to be locked", so at least it's not allowing two workers to lock the same tuple. But it seems that the skip-locked mode should not make an error out of this, but treat it as the tuple was already locked. Why would it want to lock the tuple (representing the job) if another worker has already finished his UPDATE of the job to mark it as "done" (which is what makes the tuple move to the "completed" partition.)

Either the SELECT for jobs to do returned a wrong tuple, which was already updated, or there is some lapse in the locking.

Either way it would seem to be a waste of time throwing all these errors when the tuple should not even have been selected for update and locking.

I wonder if anybody knows anything about that issue? Of course you'll want to see the DDL and SQL queries, etc. but you can't really try it out unless you do some massively parallel magic.

I still think that it should simply not happen. Don't waste time on old tuples trying to fetch and lock something that's no longer there. It's a waste of resources.

regards,
-Gunther

On 8/20/2020 6:39 PM, Jim Jarvie wrote:


On 20-Aug.-2020 17:42, Michael Lewis wrote:
Can you share an explain analyze for the query that does the select for
update? I wouldn't assume that partition pruning is possible at all with
hash, and it would be interesting to see how it is finding those rows.

Well this got interesting  - the already moved error showed up:  Note, the actual process partitions are regular table partitions, these are not hashed.  Only the incoming and completed are hashed due to row counts at either end of the processing; in flight (where the issue shows up) is quite small:

[queuedb] # explain analyze select queueid,txobject,objectid,state from mq.queue where (state = 'tx_active' or state='tx_fail_retryable') and txobject = 'ticket' limit 250 for update skip locked; ERROR:  40001: tuple to be locked was already moved to another partition due to concurrent update
LOCATION:  heapam_tuple_lock, heapam_handler.c:405
Time: 579.131 ms
[queuedb] # explain analyze select queueid,txobject,objectid,state from mq.queue where (state = 'tx_active' or state='tx_fail_retryable') and txobject = 'ticket' limit 250 for update skip locked; ERROR:  40001: tuple to be locked was already moved to another partition due to concurrent update
LOCATION:  heapam_tuple_lock, heapam_handler.c:405
Time: 568.008 ms
[queuedb] # explain analyze select queueid,txobject,objectid,state from mq.queue where (state = 'tx_active' or state='tx_fail_retryable') and txobject = 'ticket' limit 250 for update skip locked;
        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..25.71 rows=250 width=34) (actual time=1306.041..1306.338 rows=250 loops=1)    ->  LockRows  (cost=0.00..7934.38 rows=77150 width=34) (actual time=1306.040..1306.315 rows=250 loops=1)          ->  Append  (cost=0.00..7162.88 rows=77150 width=34) (actual time=520.685..1148.347 rows=31500 loops=1)                ->  Seq Scan on queue_tx_active  (cost=0.00..6764.50 rows=77000 width=34) (actual time=520.683..1145.258 rows=31500 loops=1)                      Filter: ((txobject = 'ticket'::mq.queue_object) AND ((state = 'tx_active'::mq.tx_state) OR (state = 'tx_fail_retryable'::mq.tx_state)))                ->  Seq Scan on queue_tx_fail_retryable  (cost=0.00..12.62 rows=150 width=34) (never executed)                      Filter: ((txobject = 'ticket'::mq.queue_object) AND ((state = 'tx_active'::mq.tx_state) OR (state = 'tx_fail_retryable'::mq.tx_state)))
 Planning Time: 0.274 ms
 Execution Time: 1306.380 ms
(9 rows)

Time: 1317.150 ms (00:01.317)
[queuedb] #






[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux