I'm looking for help diagnosing an issue we have with our DB server:
Under relatively light load (300-400 TPS), every 10-30 seconds the server drops into high system CPU usage: 90%+ SYSTEM across all CPUs. It is pure SYS CPU, i.e. not I/O wait, not IRQ, not user. PostgreSQL is taking 10-15% at the same time. These periods last from a few seconds to minutes, or until PostgreSQL is restarted. Needless to say, the system is barely responsive during them, with load average going over 100.
We have mostly SELECT statements (joins across a few tables) that use indexes and return a small number of records. If the number of incoming requests per second drops a bit, the server does not fall into these high-SYS-CPU periods. It looks as if postgres runs out of some resource or is fighting over some locks, and that causes the kernel to go into la-la land trying to manage it.
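For anyone who wants to check the same CPU breakdown on their own box, here is a minimal sketch of how the system / user / iowait / irq split can be sampled from /proc/stat. The function names (parse_cpu_line, cpu_breakdown, watch) are made up for this example, and it assumes the standard Linux field order of the aggregate "cpu" line; tools like vmstat or mpstat report the same figures.

```python
import time

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Return {field: jiffies} for the aggregate 'cpu' line of /proc/stat."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    # Older kernels expose fewer columns; treat the missing ones as zero.
    values += [0] * (len(FIELDS) - len(values))
    return dict(zip(FIELDS, values))


def cpu_breakdown(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {k: after[k] - before[k] for k in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        return {k: 0.0 for k in FIELDS}
    return {k: 100.0 * d / total for k, d in deltas.items()}


def read_cpu(path="/proc/stat"):
    """Read and parse the first (aggregate) line of /proc/stat."""
    with open(path) as f:
        return parse_cpu_line(f.readline())


def watch(interval=1.0, samples=30):
    """Print the CPU breakdown once per interval, for a fixed number of samples."""
    prev = read_cpu()
    for _ in range(samples):
        time.sleep(interval)
        cur = read_cpu()
        pct = cpu_breakdown(prev, cur)
        print("sys=%5.1f user=%5.1f iowait=%5.1f irq=%5.1f" % (
            pct["system"],
            pct["user"] + pct["nice"],
            pct["iowait"],
            pct["irq"] + pct["softirq"],
        ))
        prev = cur
```

Calling watch() while the problem is happening prints one line per second; in our case the sys column would be the one sitting above 90 while the others stay low.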
So far we've checked:
- disk and nic delays / errors / utilization
- WAL files (created rarely)
- tables are vacuumed OK; the periods of high SYS are not tied to the vacuum process
- kernel resource utilization (sufficient FS handles, shared MEM/SEM, VM)
- increased log level, but nothing suspicious/different (to me) is reported there during periods of high sys-cpu
- ran pgbench (could not reproduce the issue, even though it was producing over 40,000 TPS for a prolonged period of time)
Basically, our symptoms are exactly those reported here over a year ago (though that was for postgres 8.3; we run 9.1):
I would be grateful for any ideas that help diagnose or resolve this problem.
Environment background:
PostgreSQL 9.1.6.
Postgres usually has 400-500 connected clients, most of them idle.
The database has over 1000 tables (across 5 namespaces), taking ~150 GB on disk.
Linux 3.5.5 (Fedora 17 x64) on 32 GB RAM / 8 cores
Changes from the default configuration:
max_connections = 1200
shared_buffers = 3200MB
temp_buffers = 18MB
max_prepared_transactions = 500
work_mem = 16MB
maintenance_work_mem = 64MB
max_files_per_process = 3000
wal_level = hot_standby
fsync = off
checkpoint_segments = 64
checkpoint_timeout = 15min
effective_cache_size = 8GB
default_statistics_target = 500
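As a rough sanity check of these settings against 32 GB of RAM, here is a small back-of-the-envelope sketch. It is an assumption-laden ceiling, not a measurement: it assumes every backend runs work_mem_nodes sort/hash operations at full work_mem and also fills its temp_buffers, which idle connections will not do. The function name and the work_mem_nodes parameter are made up for this example.

```python
def worst_case_memory_mb(connections, work_mem_mb, temp_buffers_mb,
                         shared_buffers_mb, work_mem_nodes=1):
    """Upper-bound estimate (in MB) of PostgreSQL memory use.

    Assumes each backend uses work_mem once per sort/hash node
    (work_mem_nodes of them) and allocates its full temp_buffers.
    """
    per_backend_mb = work_mem_mb * work_mem_nodes + temp_buffers_mb
    return shared_buffers_mb + connections * per_backend_mb


# Values taken from the configuration listed above.
at_max_connections = worst_case_memory_mb(1200, 16, 18, 3200)
at_typical_load = worst_case_memory_mb(500, 16, 18, 3200)

print("ceiling at max_connections=1200: %d MB" % at_max_connections)
print("ceiling at 500 connected clients: %d MB" % at_typical_load)
```

With 1200 connections the ceiling is 44000 MB, above the 32 GB of physical RAM; with the usual 500 clients it is 20200 MB. Since most of our clients are idle, actual usage should be well below either figure.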
-- Vlad