Tom writes: > =?utf-8?Q?Adam_Sj=C3=B8gren?= <asjo@xxxxxxxxxxxx> writes: >> Tom writes: >>> ... which implies that the problem is unexpectedly high contention for the >>> ProcArrayLock. > >> One thing I should have mentioned, but forgot, is that the database is >> configured to do logical replication to another machine - could that >> have an effect on the lock contention? >> A colleague pointed out that in the pg_locks output, the replication >> processes are waiting on the ProcArray lock: >> · https://koldfront.dk/misc/postgresql/pg_stat_activity_pg_locks-BAD.txt.gz > > Yeah, that is *mighty* interesting. For the archives' sake, I'll point > out that this shows 16 walsender processes, of which 6 are in > WalSenderWaitForWAL waits (ie, basically idle) and all of the other 10 > are waiting for the ProcArrayLock. Sorry for not returning on this before now. I have seen the slow login problem - ProcArrayLock contention - happen with replication turned off, so I think that can be ruled out as the cause. I just observed something interesting right now, however. Just to recap the scene: the basic setup is that we have a table with a queue of jobs to be processed, and we have a number of programs (say 10) on 6 other machines grabbing those jobs, doing calculations and reporting back. The number of jobs at any given time can fluctuate from none to millions. Typically millions of jobs take some days to a week to get through. Now, I happened to have a psql prompt open when the problem of new logins being slow appeared - and I ran a query that counted the number of jobs by job-type. Before the slow login problem, that query took around 70-80 ms, but when the slow login appeared, it took 20+ seconds to run the query. EXPLAIN ANALYZE showed that it was running a parallel query. So I tried turning parallel queries off - with SET max_parallel_workers_per_gather = 0; - and now the query took around 150 ms consistenly, despite logins still being (variably) slow. So that confirms the ProcArrayLock contention, I think. My next thought is: I have 6*10 programs making connections and doing parallel queries over the job queue, to get new jobs. If the jobs happen to be very "easy", those programs will be doing a lot of queries for jobs. And when the queue is large enough (I'm guessing), those queries will be parallel, so they will be creating new processes that need to be added and removed from the ProcArray. So am I shooting myself in the foot here, by creating lock contention when trying to get jobs quickly from a largeish table? I think my next step will be to modify the programs that grab the jobs to avoid parallel queries when looking up the next job to run. You anticipated this, it just took me until now to get a little closer: > It's also evident that there's some parallel query activity going on, > as some sessions are in BgWorkerShutdown waits; so parallel worker > entry/exit might be contributing to the overall load on ProcArrayLock. Best regards, Adam -- "That's one of the remarkable things about life. Adam Sjøgren It's never so bad that it can't get worse." asjo@xxxxxxxxxxxx