Thank you for the quick response.
No adjustments to fill factor. I hadn't thought of that; I'll
investigate and try some options to see if I can measure an
effect.
There is some ordering on the select [ORDER BY q_id], so each
block of 250 is a sequential-ish run of queue items. I just need
them more or less in the order they were queued, i.e. as near
FIFO as possible without being strict about absolute sequential
order.
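For concreteness, this is a minimal sketch of the kind of claim query
being described. Only q_id, the ORDER BY, and the LIMIT of 250 come
from the thread; the table and payload names, and the use of
FOR UPDATE SKIP LOCKED to keep concurrent workers off each other's
rows, are assumptions on my part:

    -- Hypothetical claim query: each worker grabs the next ~250 items
    -- in roughly FIFO order. work_queue/payload are made-up names;
    -- SKIP LOCKED is an assumption about how workers avoid contention.
    SELECT q_id, payload
    FROM   work_queue
    ORDER  BY q_id
    LIMIT  250
    FOR UPDATE SKIP LOCKED;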
The table has around 192K rows. As a row is processed it is deleted
as part of the transaction, with a commit at the end once all 250 are
processed [partitioned table; the state change migrates the row to a
different partition]. When the queue drops to 64K rows it is topped
up with 128K rows at a time.
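To make the partition-migration point concrete, here is a minimal
sketch of a state-partitioned layout (all names are hypothetical; the
actual schema wasn't posted), where updating the state moves the row
out of the 'queued' partition:

    CREATE TABLE work_queue (
        q_id    bigint NOT NULL,
        state   text   NOT NULL,
        payload jsonb
    ) PARTITION BY LIST (state);

    CREATE TABLE work_queue_queued
        PARTITION OF work_queue FOR VALUES IN ('queued');
    CREATE TABLE work_queue_processed
        PARTITION OF work_queue FOR VALUES IN ('processed');

    -- Updating the partition key deletes the row from the 'queued'
    -- partition and re-inserts it into 'processed'.
    UPDATE work_queue SET state = 'processed' WHERE q_id = 42;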
I've tuned the LIMIT value both up and down. As I move the limit
up, the problem becomes substantially worse: at 300 the system is
swamped and the selects take over an hour to complete; at 600
everything locks up and processing stops. I did try 1,000, but
that basically resulted in nothing being processed.
Message queue... Are rows deleted? Are they updated once or many
times? Have you adjusted fillfactor on the table or indexes? How many
rows are in the table currently, or on average? Is there any ordering
to which rows you update?

It seems likely that one of the experts/code contributors will chime
in and explain how locking that many rows across that many concurrent
connections means that some resource is overrun, so you end up
escalating to a table lock instead of truly locking only the 250 rows
you wanted.

On the other hand, you say 80 cores, and you are trying to increase
the number of concurrent processes well beyond that without (much)
disk I/O being involved. I wouldn't expect that to perform well. Is
there a chance to modify the code so that each process locks 1000
rows at a time and be content with 64 concurrent processes?
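For reference, a minimal sketch of the larger-batch variant suggested
above, again with hypothetical table/column names and assuming a
SKIP LOCKED style claim. Whether consuming the rows in a single
DELETE ... RETURNING fits the processing model is an open question,
since the design described in the thread holds the claimed rows until
the whole batch has been processed and only then commits:

    -- Hypothetical: claim and consume 1000 items in one statement,
    -- so each of ~64 workers works on a larger batch instead of 250.
    WITH claimed AS (
        SELECT q_id
        FROM   work_queue
        ORDER  BY q_id
        LIMIT  1000
        FOR UPDATE SKIP LOCKED
    )
    DELETE FROM work_queue w
    USING  claimed c
    WHERE  w.q_id = c.q_id
    RETURNING w.*;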