Re: FW: Queries becoming slow under heavy load

Shaun Thomas <sthomas@xxxxxxxxx> · Wed, 26 Jan 2011 11:16:38 -0600

On 01/26/2011 10:04 AM, Anne Rosset wrote:

We've been able to match  long running database queries to such
processes. This occurs under relatively low load average (say 4 out of
8) and can involve as little as 1 single sql query.

The b state means the process is blocking, waiting for... something. One 
thing you need to consider is far more than your CPU usage. If you have 
the 'sar' utility, run it as 'sar 1 100' just to see how your system is 
working. What you want to watch for is iowait.

If even one query is churning your disks, every single other query that 
has to take even one block from disk instead of cache, is going to 
stall. If you see an iowait of anything greater than 5%, you'll want to 
check further on the device that contains your database with iostat. My 
favorite use of this is 'iostat -dmx [device] 1' where [device] is the 
block device where your data files are, if your WAL is somewhere else.

And yeah, your shared_buffers are kinda on the lowish side. Your 
effective_cache_size is good, but you have a lot more room to increase 
PG-specific memory.

Worse however, is your checkpoints. Lord. Increase checkpoint_segments 
to *at least* 20, and increase your checkpoint_completion_target to 0.7 
or 0.8. Check your logs for checkpoint warnings, and I'll bet it's 
constantly complaining about increasing your checkpoint segments. Every 
checkpoint not started by the scheduled system risks a checkpoint spike, 
which can flood your system with IO regardless of which queries are 
running. That kind of IO storm will ruin your performance, and with only 
3 checkpoint segments on a busy database, are probably happening constantly.

Unfortunately we still need to know more. This is just based on your PG 
settings, and that's not really enough to know how "busy" your DB is. 
One way to check is to log the contents of pg_stat_database, especially 
the xact_commit and xact_rollback columns. Grab those with a timestamp. 
If you get a snapshot of that every minute, you can figure out how many 
queries you're processing per minute or per second pretty easily. We've 
hit 8600 TPS before and don't have nearly the trouble you've been reporting.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@xxxxxxxxx

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance