On Fri, Aug 21, 2020 at 9:58 AM Jim Jarvie <jim@xxxxxxxxxxxxxx> wrote:
> However, as I scale up the number of concurrent connections, I see a
> spike in CPU (to 100% across 80 cores) when the SELECT FOR UPDATE
> SKIP LOCKED executes and the select processes wait for multiple
> minutes (10-20 minutes) before completing.  My use case requires
> around 256 concurrent processors for the queue but I've been unable
> to scale beyond 128 without everything grinding to a halt.

Maybe it's just getting algorithmically ugly.  To claim some job rows,
you have to skip all the dead/non-matching tuples left behind so far at
the start of the table by all the other sessions, and then also all
currently locked tuples, and you have to do update-chain walks on some
of them too.  It all gets a bit explosive once you have such high
numbers of workers.

I think I'd experiment with splitting the job table up into N tables
and feeding jobs into all of them about evenly (by hashing, at random,
whatever), and then I'd assign each consumer a "preferred" table where
it looks for jobs first (perhaps my_worker_id % ntables), before trying
the others in round robin order.  Then they won't trample on each
other's toes so much.

In the past I've wondered about a hypothetical circular_seqscan option,
which would cause table scans to start where they left off last time in
each backend, so that SELECT * FROM t LIMIT 1 repeated would show you a
different row each time until you get all the way around to the start
again (as we're entirely within our rights to do for a query with no
ORDER BY).  That'd give the system a chance to vacuum and start
refilling the start of the table before you get around to it again,
instead of repeatedly having to step over the same useless pages every
time you need a new job.  Combined with the N tables thing, you'd be
approaching a sweet spot for contention and dead tuple avoidance.  The
synchronized_seqscans setting is related to this idea, but more
expensive, different, and probably not useful.

Hmm.  I guess another way to avoid colliding with others' work would be
to try to use SELECT * FROM t TABLESAMPLE SYSTEM (10) WHERE ... FOR
UPDATE SKIP LOCKED LIMIT ....  It's less cache-friendly, and less
order-preserving, but way more contention-friendly.  That has another
complication though; how do you pick 10?  And if it doesn't return any
or enough rows, that doesn't mean there aren't enough, so you may need
to be ready to fall back to the plain approach if having 250 rows is
really important to you and TABLESAMPLE doesn't give you enough.  Or
something.

By the way, when working with around 64 consumer processes I was also
annoyed by the thundering herd problem when using NOTIFY.  I found
various workaround solutions to that, but ultimately I think we need
more precise wakeups for that sort of thing, which I hope to revisit
one day.
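
To make the split-table idea above a bit more concrete, here's a rough,
untested sketch.  The table and column names (job_0..job_3, status,
payload) and the 4-way split are invented for illustration:

-- four queue tables with identical structure; producers spread new
-- jobs across them roughly evenly (by hash, round robin, whatever)
CREATE TABLE job_0 (
  id bigserial PRIMARY KEY,
  status text NOT NULL DEFAULT 'pending',
  payload jsonb
);
-- job_1..job_3 copy the structure (and share job_0's id sequence,
-- which conveniently keeps ids unique across the whole split)
CREATE TABLE job_1 (LIKE job_0 INCLUDING ALL);
CREATE TABLE job_2 (LIKE job_0 INCLUDING ALL);
CREATE TABLE job_3 (LIKE job_0 INCLUDING ALL);

-- a consumer with my_worker_id = 6 prefers table 6 % 4 = 2, so it
-- claims a batch from job_2 first, and only runs the same query
-- against job_3, job_0, job_1 (round robin) if it comes up short
SELECT id, payload
FROM job_2
WHERE status = 'pending'
FOR UPDATE SKIP LOCKED
LIMIT 250;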
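
And the TABLESAMPLE variant might look something like this; the 10% is
a wild guess you'd have to tune, and again the names are invented:

-- sample roughly 10% of the table's pages and claim up to 250 unlocked
-- pending rows from whatever the sample happens to contain
SELECT id, payload
FROM job TABLESAMPLE SYSTEM (10)
WHERE status = 'pending'
FOR UPDATE SKIP LOCKED
LIMIT 250;

-- getting fewer than 250 rows back doesn't prove the queue is nearly
-- empty (the sample may simply have missed them), so a consumer that
-- really needs a full batch would retry without TABLESAMPLE:
SELECT id, payload
FROM job
WHERE status = 'pending'
FOR UPDATE SKIP LOCKED
LIMIT 250;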