I've run an EXPLAIN ANALYZE on one of the queries that appeared in pg_locks (although, like you say, that might be a red herring) both during normal response times (2) and after the locks backlog materialized (1). The output is below; I've just blanked out some columns. The IO timings do seem an order of magnitude slower, but not excessive unless I'm reading it wrong.

"Limit  (cost=2364.19..2365.44 rows=500 width=177) (actual time=6501.103..6507.196 rows=500 loops=1)"
"  Output:"
"  Buffers: shared hit=7163 read=137"
"  I/O Timings: read=107.771"
...
"Limit (cost=2366.57..2367.82 rows=500 width=177) (actual time=73.284..76.296 rows=500 loops=1)"" Output: various columns"" Buffers: shared hit=6738 read=562"" I/O Timings: read=19.212"
You are correct that the difference in IO timing for reads is not nearly enough to explain the difference in runtime, but the ratio is still large enough to perhaps be suggestive. It could be that all the extra time is spent in IO writes (not reported here). If you turn on track_io_timing system-wide, you can check the write times in pg_stat_database.
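To put rough numbers on the read timings quoted above: per buffer actually read from disk, the slow run paid roughly 23 times more, yet the total read time is still a tiny fraction of the runtime:

    slow run:  107.771 ms / 137 reads  ~ 0.79 ms per read
    fast run:   19.212 ms / 562 reads  ~ 0.034 ms per read

    107.771 ms read time / 6507 ms total runtime  ~ 1.7%

So reads got much slower per buffer, but they can only account for a couple of percent of the slow run.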
(Write time has an attribution problem. I need to make room for my data, so I write out someone else's. Is the time spent attributed to the one doing the writing, or the one who owns the data written?)
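Concretely, the check might look something like this (a sketch; ALTER SYSTEM needs 9.4 or later, otherwise set it in postgresql.conf and reload; blk_write_time is reported in milliseconds and stays at zero until track_io_timing is on):

    -- Enable I/O timing collection cluster-wide (superuser; no restart needed).
    ALTER SYSTEM SET track_io_timing = on;
    SELECT pg_reload_conf();

    -- Later, compare cumulative read vs. write time per database (in ms).
    SELECT datname, blk_read_time, blk_write_time
      FROM pg_stat_database
     ORDER BY blk_write_time DESC;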
But it is perhaps looking like it might not be IO at all, but rather some kind of internal kernel problem, such as the "zone reclaim", "huge pages", and memory-interleaving issues that have been discussed elsewhere on this list for machines with many CPUs and lots of RAM. I would summarize it for you, but I don't understand it, and I don't have ready access to machines with 64 CPUs and 128 GB of RAM to explore it myself.
But if that is the case, then using a connection pooler to restrict the number of simultaneously active connections might actually be a big win (despite what I said previously).
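If you want a quick read on whether backends are piling up in an active state (which is where capping concurrency tends to pay off), something like this would do (assuming 9.2 or later, where pg_stat_activity has a state column):

    -- Count backends by state; many 'active' rows at once suggests
    -- a pooler limiting concurrency could help.
    SELECT state, count(*)
      FROM pg_stat_activity
     GROUP BY state
     ORDER BY count(*) DESC;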
Cheers,
Jeff