> One of my clients has an odd problem. Every so often a backend will suddenly
> become very slow. The odd thing is that once this has happened it remains
> slowed down, for all subsequent queries. Zone reclaim is off. There is no IO
> or CPU spike, no checkpoint issues or stats timeouts, no other symptom that
> we can see.
By "no spike", do you mean that the system as a whole is not using an unusual amount of IO or CPU, or that this specific slow back-end is not using an unusual amount?
Could you strace it and see what it is doing?
> The problem was a lot worse than it is now, but two steps have
> alleviated it mostly, but not completely: much less aggressive autovacuuming
> and reducing the maximum lifetime of backends in the connection pooler to 30
> minutes.
Do you have a huge number of tables? Maybe over the course of a long-lived connection, it touches enough tables to bloat the relcache / syscache. I don't know how the autovac would be involved in that, though.
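One rough way to test the cache-bloat idea would be to watch the memory of a single backend over the life of its connection. The sketch below is only an illustration, assuming a Linux host where /proc/<pid>/status is readable; the function names are made up for this example, and the PID would come from something like pg_stat_activity. Note that VmRSS also counts the shared_buffers pages the backend has touched, so steady growth in VmData (or RssAnon on newer kernels) is probably the more telling number.

```python
import sys


def parse_proc_status(text):
    """Extract memory fields (in kB) from the text of /proc/<pid>/status.

    Hypothetical helper for this example: keeps only the Vm* and Rss* lines
    whose value is an integer number of kB.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.startswith(("Vm", "Rss")):
            continue
        parts = value.split()
        if parts and parts[0].isdigit():
            fields[key] = int(parts[0])
    return fields


def backend_memory(pid):
    """Read and parse /proc/<pid>/status for one backend (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_proc_status(f.read())


if __name__ == "__main__":
    # Usage: python backend_mem.py <pid> [<pid> ...]
    for arg in sys.argv[1:]:
        mem = backend_memory(int(arg))
        print(arg, "VmRSS kB:", mem.get("VmRSS"), "VmData kB:", mem.get("VmData"))
```

Running this every few minutes against a backend that later turns slow, and comparing it with one that stays fast, should show whether private memory keeps climbing as the connection ages.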
Cheers,
Jeff