Thank you for the quick response.
No adjustments to fill factor. I hadn't thought of that; I'll
investigate and try some options to see if I can measure an
effect.
There is some ordering on the select [ORDER BY q_id], so each
block of 250 is a sequential-ish run of queue items. I just need
them more or less in the order they were queued, i.e. as near
FIFO as possible without being strict about absolute sequential
order.
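For concreteness, this is a minimal sketch of the kind of claim query
being described. Only q_id, the ORDER BY, and the LIMIT of 250 come
from the thread; the table and payload names, and the use of
FOR UPDATE SKIP LOCKED to keep concurrent workers off each other's
rows, are assumptions on my part:

    -- Hypothetical claim query: each worker grabs the next ~250 items
    -- in roughly FIFO order. work_queue/payload are made-up names;
    -- SKIP LOCKED is an assumption about how workers avoid contention.
    SELECT q_id, payload
    FROM   work_queue
    ORDER  BY q_id
    LIMIT  250
    FOR UPDATE SKIP LOCKED;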
The table has around 192K rows. As a row is processed it is deleted
as part of the transaction, with a commit at the end once all 250 are
processed [partitioned table; the state change migrates the row to a
different partition]. When the queue drops to 64K rows it is topped
up with 128K rows at a time.
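To make the partition-migration point concrete, here is a minimal
sketch of a state-partitioned layout (all names are hypothetical; the
actual schema wasn't posted), where updating the state moves the row
out of the 'queued' partition:

    CREATE TABLE work_queue (
        q_id    bigint NOT NULL,
        state   text   NOT NULL,
        payload jsonb
    ) PARTITION BY LIST (state);

    CREATE TABLE work_queue_queued
        PARTITION OF work_queue FOR VALUES IN ('queued');
    CREATE TABLE work_queue_processed
        PARTITION OF work_queue FOR VALUES IN ('processed');

    -- Updating the partition key deletes the row from the 'queued'
    -- partition and re-inserts it into 'processed'.
    UPDATE work_queue SET state = 'processed' WHERE q_id = 42;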
I've tuned the LIMIT value both up and down. As I move the limit
up, the problem becomes substantially worse: at 300 the system is
swamped and the selects take over an hour to complete; at 600
everything locks up and processing stops. I did try 1,000, but
that basically resulted in nothing being processed.
Message queue... Are rows deleted? Are they updated once or many
times? Have you adjusted fillfactor on the table or indexes? How many
rows are in the table currently, or on average? Is there any ordering
to which rows you update?

It seems likely that one of the experts/code contributors will chime
in and explain how locking that many rows across that many concurrent
connections means that some resource is overrun, so you end up
escalating to a table lock instead of truly locking only the 250 rows
you wanted.

On the other hand, you say 80 cores, and you are trying to increase
the number of concurrent processes well beyond that without (much)
disk I/O being involved. I wouldn't expect that to perform well. Is
there a chance to modify the code so that each process locks 1000
rows at a time and be content with 64 concurrent processes?
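For reference, a minimal sketch of the larger-batch variant suggested
above, again with hypothetical table/column names and assuming a
SKIP LOCKED style claim. Whether consuming the rows in a single
DELETE ... RETURNING fits the processing model is an open question,
since the design described in the thread holds the claimed rows until
the whole batch has been processed and only then commits:

    -- Hypothetical: claim and consume 1000 items in one statement,
    -- so each of ~64 workers works on a larger batch instead of 250.
    WITH claimed AS (
        SELECT q_id
        FROM   work_queue
        ORDER  BY q_id
        LIMIT  1000
        FOR UPDATE SKIP LOCKED
    )
    DELETE FROM work_queue w
    USING  claimed c
    WHERE  w.q_id = c.q_id
    RETURNING w.*;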