On 12/26/2012 11:03 PM, Jeff Janes wrote:
On Fri, Dec 14, 2012 at 10:40 AM, Andrew Dunstan
<andrew.dunstan@xxxxxxxxxxxxx> wrote:
> One of my clients has an odd problem. Every so often a backend will suddenly
> become very slow. The odd thing is that once this has happened it remains
> slowed down, for all subsequent queries. Zone reclaim is off. There is no IO
> or CPU spike, no checkpoint issues or stats timeouts, no other symptom that
> we can see.
By "no spike", do you mean that the system as a whole is not using an
unusual amount of IO or CPU, or that this specific slow back-end is
not using an unusual amount?
both, really.
Could you strace it and see what it is doing?
Not very easily, because it's a pool connection and we've lowered the
pool session lifetime as part of the amelioration :-) So it's not
happening very much any more.
> The problem was a lot worse than it is now, but two steps have mostly,
> though not completely, alleviated it: much less aggressive autovacuuming
> and reducing the maximum lifetime of backends in the connection pooler to
> 30 minutes.
Do you have a huge number of tables? Maybe over the course of a
long-lived connection, it touches enough tables to bloat the relcache
/ syscache. I don't know how the autovac would be involved in that,
though.
Yes, we do indeed have a huge number of tables. This seems a plausible
thesis.
cheers
andrew
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)