On 01/26/2011 10:04 AM, Anne Rosset wrote:
We've been able to match long running database queries to such processes. This occurs under relatively low load average (say 4 out of 8) and can involve as little as 1 single sql query.
The b state means the process is blocking, waiting for... something. One thing you need to consider is far more than your CPU usage. If you have the 'sar' utility, run it as 'sar 1 100' just to see how your system is working. What you want to watch for is iowait.
If even one query is churning your disks, every single other query that has to take even one block from disk instead of cache, is going to stall. If you see an iowait of anything greater than 5%, you'll want to check further on the device that contains your database with iostat. My favorite use of this is 'iostat -dmx [device] 1' where [device] is the block device where your data files are, if your WAL is somewhere else.
And yeah, your shared_buffers are kinda on the lowish side. Your effective_cache_size is good, but you have a lot more room to increase PG-specific memory.
Worse however, is your checkpoints. Lord. Increase checkpoint_segments to *at least* 20, and increase your checkpoint_completion_target to 0.7 or 0.8. Check your logs for checkpoint warnings, and I'll bet it's constantly complaining about increasing your checkpoint segments. Every checkpoint not started by the scheduled system risks a checkpoint spike, which can flood your system with IO regardless of which queries are running. That kind of IO storm will ruin your performance, and with only 3 checkpoint segments on a busy database, are probably happening constantly.
Unfortunately we still need to know more. This is just based on your PG settings, and that's not really enough to know how "busy" your DB is. One way to check is to log the contents of pg_stat_database, especially the xact_commit and xact_rollback columns. Grab those with a timestamp. If you get a snapshot of that every minute, you can figure out how many queries you're processing per minute or per second pretty easily. We've hit 8600 TPS before and don't have nearly the trouble you've been reporting.
-- Shaun Thomas OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604 312-676-8870 sthomas@xxxxxxxxx ______________________________________________ See http://www.peak6.com/email_disclaimer.php for terms and conditions related to this email -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance