Hi,
I'm running PostgreSQL 9.1.6 with 22GB of shared buffers and 32GB of total RAM on a 16-core Opteron 6276 box. We limit connections to roughly 120, but our webapp is configured to allocate a thread-local connection, so more than half the time those connections are sitting idle.
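For reference, the connection handling looks roughly like this (a minimal sketch; the class name and connection details are hypothetical, not our actual code):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    // Sketch of the thread-local connection pattern described above.
    public class ThreadLocalConnections {
        private static final ThreadLocal<Connection> CONN =
            ThreadLocal.withInitial(() -> {
                try {
                    return DriverManager.getConnection(
                        "jdbc:postgresql://localhost/appdb", "app", "secret");
                } catch (SQLException e) {
                    throw new RuntimeException(e);
                }
            });

        // Each worker thread gets its own connection, which then sits
        // idle whenever the thread is not servicing a request.
        public static Connection get() {
            return CONN.get();
        }
    }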
We had been running smoothly for over a year on this configuration, but recently started having huge CPU spikes that bring the system to its knees. Given that it is a multiuser system, it has been quite hard to pinpoint the exact cause, but I think we've narrowed it down to two data import jobs that were running in semi-long transactions (clusters of row inserts).
The tables affected by these inserts are used in common queries.
The imports bring in perhaps 10k rows on average, spread across 4 tables.
The insert transactions are at isolation level read committed (the default for the JDBC driver).
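For concreteness, the import transactions look roughly like this (a minimal sketch; the table, columns, and connection details are placeholders rather than our real schema):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class ImportJob {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/appdb", "app", "secret")) {
                // Read committed is already the PostgreSQL/JDBC default,
                // so no explicit setTransactionIsolation() call is made.
                conn.setAutoCommit(false); // everything below is one transaction
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO import_rows (id, payload) VALUES (?, ?)")) {
                    for (int i = 0; i < 10000; i++) { // ~10k rows per import
                        ps.setInt(1, i);
                        ps.setString(2, "row-" + i);
                        ps.addBatch();
                        if (i % 1000 == 999) {
                            ps.executeBatch(); // flush in chunks
                        }
                    }
                    ps.executeBatch();
                }
                conn.commit(); // single commit at the end: the "semi-long" transaction
            }
        }
    }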
When the import would run (again, this is a theory; we have not been able to reproduce it), we would end up maxed out on CPU, with a load average of 50 on our 16 CPUs (our normal busy usage is a load average of about 5 on the same 16 CPUs).
When we look at the active queries during such a spike, most of them are against the tables affected by these imports.
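We check the active queries with something along these lines against pg_stat_activity (a sketch that assumes the 9.1 catalog columns procpid/waiting/current_query, which were renamed in 9.2; connection details are placeholders):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class ActiveQueries {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/appdb", "app", "secret");
                 Statement st = conn.createStatement();
                 // On 9.1, '<IDLE>' in current_query marks idle backends.
                 ResultSet rs = st.executeQuery(
                     "SELECT procpid, waiting, current_query " +
                     "FROM pg_stat_activity " +
                     "WHERE current_query <> '<IDLE>'")) {
                while (rs.next()) {
                    System.out.printf("%d waiting=%b %s%n",
                        rs.getInt("procpid"),
                        rs.getBoolean("waiting"),
                        rs.getString("current_query"));
                }
            }
        }
    }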
Our workaround (which is holding at present) was to drop the transactions on those imports (not optimal, but fortunately acceptable for this particular data). It has prevented any further incidents, but is of course inconclusive.
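In other words, the workaround amounts to running the same inserts in autocommit mode, roughly like this (same placeholder schema as above):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    public class ImportJobNoTxn {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/appdb", "app", "secret")) {
                conn.setAutoCommit(true); // each INSERT commits on its own
                try (PreparedStatement ps = conn.prepareStatement(
                        "INSERT INTO import_rows (id, payload) VALUES (?, ?)")) {
                    for (int i = 0; i < 10000; i++) {
                        ps.setInt(1, i);
                        ps.setString(2, "row-" + i);
                        ps.executeUpdate(); // one tiny transaction per row
                    }
                }
            }
        }
    }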
Does this sound familiar to anyone? If so, please advise.
Thanks in advance,
Tony Kay