>> I managed to bypass the need for the sort by relying on the active
>> feed only sending the oldest items in for processing (it was always
>> doing that). Based on some of the earlier e-mails in this thread,
>> though, it prompted the revelation that my ORDER BY when processing
>> was really pretty pointless, because I need more-or-less ordered
>> rather than strictly ordered, and that was already happening due to
>> how the process list was being fed.
>
> Great to hear that some of the issues are now mitigated. Though perhaps
> you actually do require that ORDER BY if items are expected to sit in
> the queue for quite some time, because you have incoming queue items in
> a burst pattern and have to play catch-up sometimes. If so, I highly
> suspect the index on q_id is becoming very bloated and a REINDEX
> CONCURRENTLY would help.
I still need to look at that, but since I had made some progress, I got pretty excited and have not got round to it yet.
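When I do get to it, my understanding is that the rebuild itself should be a one-liner along these lines (the index name here is invented, not our real one, and since PostgreSQL 14 this should also recurse over a partitioned index's children):

    -- Rebuild the presumably bloated q_id index without blocking the
    -- queue writers (hypothetical index name).
    REINDEX INDEX CONCURRENTLY queue_q_id_idx;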
> I don't know if you might have missed my last message, and the
> suggestion from Laurenz to check pgstattuple.
>
> * https://www.2ndquadrant.com/en/blog/what-is-select-skip-locked-for-in-postgresql-9-5/
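For the record, my reading of the pgstattuple suggestion is a check along these lines (made-up names again, and on a partitioned table I assume it has to be run against each partition, since the parent has no storage of its own):

    -- One-time setup.
    CREATE EXTENSION IF NOT EXISTS pgstattuple;

    -- Heap-level stats: dead_tuple_percent and free_percent show how
    -- much of the queue table is churn left behind by processed items.
    SELECT * FROM pgstattuple('queue_part_1');

    -- Index-level stats: a low avg_leaf_density on the q_id index
    -- would support the bloat theory.
    SELECT * FROM pgstatindex('queue_part_1_q_id_idx');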
That blog post does warn about the overhead, but I've also upgraded pg_top on my system today and noticed a useful additional data point that it displays: the number of locks held by a process.
What I see happening is that when the SELECT statements collide, each is holding about 10-12 locks, and then they begin to acquire more locks very slowly, every few seconds. One process will grow quicker than the others, reach the target (250) and start processing. Then another takes the lead, and so on, until a critical mass is reached and the remaining processes all acquire their locks within a few seconds.
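For anyone following along without pg_top, I believe the same per-backend figures can be pulled straight from pg_locks, e.g.:

    -- Locks currently granted to each backend, busiest first. Run it
    -- repeatedly (e.g. with psql's \watch 1) while the competing
    -- SELECTs are executing.
    SELECT pid, count(*) AS locks_held
    FROM pg_locks
    WHERE granted
    GROUP BY pid
    ORDER BY locks_held DESC;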
I still keep thinking there is some scaling-type issue here in the
locking, possibly due to it being a partitioned table (given that
"tuple to be locked was already moved to another partition" error).
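For context, and simplifying heavily with invented names (the status filter and partitioning scheme are placeholders, not our real schema), each worker's claim query is now essentially the stock SKIP LOCKED pattern from that blog post, minus the ORDER BY:

    -- Each worker tries to claim a batch of 250 items. SKIP LOCKED
    -- makes colliding workers pass over rows another worker has
    -- already locked instead of blocking on them; on a partitioned
    -- queue table both the scan and the row locking fan out across
    -- the partitions.
    SELECT q_id
    FROM queue
    WHERE status = 'pending'
    LIMIT 250
    FOR UPDATE SKIP LOCKED;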