On Fri, Aug 21, 2020 at 9:58 AM Jim Jarvie <jim@xxxxxxxxxxxxxx> wrote:
> However, as I scale up the number of concurrent connections, I see a
> spike in CPU (to 100% across 80 cores) when the SELECT FOR UPDATE
> SKIP LOCKED executes and the select processes wait for multiple
> minutes (10-20 minutes) before completing.  My use case requires
> around 256 concurrent processors for the queue but I've been unable
> to scale beyond 128 without everything grinding to a halt.

Maybe it's just getting algorithmically ugly.  To claim some job rows,
you have to skip all the dead/non-matching tuples left behind so far at
the start of the table by all the other sessions, and then also all
currently locked tuples, and you have to do update-chain walks on some
of them too.  It all gets a bit explosive once you have such high
numbers of workers.

I think I'd experiment with splitting the job table up into N tables
and feeding jobs into all of them about evenly (by hashing, at random,
whatever), and then I'd assign each consumer a "preferred" table where
it looks for jobs first (perhaps my_worker_id % ntables), before trying
the others in round robin order.  Then they won't trample on each
other's toes so much.

In the past I've wondered about a hypothetical circular_seqscan option,
which would cause table scans to start where they left off last time in
each backend, so that SELECT * FROM t LIMIT 1 repeated would show you a
different row each time until you get all the way around to the start
again (as we're entirely within our rights to do for a query with no
ORDER BY).  That'd give the system a chance to vacuum and start
refilling the start of the table before you get around to it again,
instead of repeatedly having to step over the same useless pages every
time you need a new job.  Combined with the N tables thing, you'd be
approaching a sweet spot for contention and dead tuple avoidance.  The
synchronized_seqscans setting is related to this idea, but more
expensive, different, and probably not useful.

Hmm.  I guess another way to avoid colliding with others' work would be
to try to use SELECT * FROM t TABLESAMPLE SYSTEM (10) WHERE ... FOR
UPDATE SKIP LOCKED LIMIT ....  It's less cache-friendly, and less
order-preserving, but way more contention-friendly.  That has another
complication though; how do you pick 10?  And if it doesn't return any
or enough rows, that doesn't mean there aren't enough, so you may need
to be ready to fall back to the plain approach if having 250 rows is
really important to you and TABLESAMPLE doesn't give you enough.  Or
something.

By the way, when working with around 64 consumer processes I was also
annoyed by the thundering herd problem when using NOTIFY.  I found
various workaround solutions to that, but ultimately I think we need
more precise wakeups for that sort of thing, which I hope to revisit
one day.
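
To make the split-table idea above a bit more concrete, here's a rough,
untested sketch.  The table and column names (job_0..job_3, status,
payload) and the 4-way split are invented for illustration:

-- four queue tables with identical structure; producers spread new
-- jobs across them roughly evenly (by hash, round robin, whatever)
CREATE TABLE job_0 (
  id bigserial PRIMARY KEY,
  status text NOT NULL DEFAULT 'pending',
  payload jsonb
);
-- job_1..job_3 copy the structure (and share job_0's id sequence,
-- which conveniently keeps ids unique across the whole split)
CREATE TABLE job_1 (LIKE job_0 INCLUDING ALL);
CREATE TABLE job_2 (LIKE job_0 INCLUDING ALL);
CREATE TABLE job_3 (LIKE job_0 INCLUDING ALL);

-- a consumer with my_worker_id = 6 prefers table 6 % 4 = 2, so it
-- claims a batch from job_2 first, and only runs the same query
-- against job_3, job_0, job_1 (round robin) if it comes up short
SELECT id, payload
FROM job_2
WHERE status = 'pending'
FOR UPDATE SKIP LOCKED
LIMIT 250;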
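
And the TABLESAMPLE variant might look something like this; the 10% is
a wild guess you'd have to tune, and again the names are invented:

-- sample roughly 10% of the table's pages and claim up to 250 unlocked
-- pending rows from whatever the sample happens to contain
SELECT id, payload
FROM job TABLESAMPLE SYSTEM (10)
WHERE status = 'pending'
FOR UPDATE SKIP LOCKED
LIMIT 250;

-- getting fewer than 250 rows back doesn't prove the queue is nearly
-- empty (the sample may simply have missed them), so a consumer that
-- really needs a full batch would retry without TABLESAMPLE:
SELECT id, payload
FROM job
WHERE status = 'pending'
FOR UPDATE SKIP LOCKED
LIMIT 250;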