On 14 Listopad 2011, 19:16, Cody Caughlan wrote: > shared_buffers = 3584MB > wal_buffers = 16MB > checkpoint_segments = 32 > max_wal_senders = 10 > checkpoint_completion_target = 0.9 > wal_keep_segments = 1024 > maintenance_work_mem = 256MB > work_mem = 88MB > shared_buffers = 3584MB > effective_cache_size = 10GB Seems reasonable, although I'd bump up the checkpoint_timeout (the 5m is usually too low). > The PGDATA dir is a RAID10 on 4 local ("ephemeral" in EC2 speak) > drives. I ran some dd tests and received the following output: > > --- WRITING --- > root@sql03:/data# time sh -c "dd if=/dev/zero of=/data/tmp/bigfile > bs=8k count=4000000 && sync" > 4000000+0 records in > 4000000+0 records out > 32768000000 bytes (33 GB) copied, 670.663 s, 48.9 MB/s > > real 11m52.199s > user 0m2.720s > sys 0m45.330s This measures sequential write performance (and the same holds for the read test). We need to know the random I/O performance too - use bonnie++ or similar tool. Based on the AWS benchmarks I've seen so far, I'd expect about 90 MB/s for sequential read/writes, and about twice that performance for a 4-drive RAID10. So while the reads (211 MB/s) seem perfectly OK, the writes (50MB/s) are rather slow. Have you measured this on an idle system, or when the db was running? See for example this: [1] http://victortrac.com/EC2_Ephemeral_Disks_vs_EBS_Volumes [2] http://www.gabrielweinberg.com/blog/2011/05/raid0-ephemeral-storage-on-aws-ec2.html > I have enabled log_checkpoints and here is a recent sample from the log: > 2011-11-14 17:39:15 UTC pid:3965 (-0) LOG: checkpoint starting: time > 2011-11-14 17:43:49 UTC pid:3965 (-0) LOG: checkpoint complete: wrote > 16462 buffers (3.6%); 0 transaction log file(s) added, 0 removed, 9 > recycled; write=269.978 s, sync=4.106 s, total=274.117 s; sync > files=82, longest=2.943 s, average=0.050 s Nothing special here - this just says that the checkpoints were timed and finished on time (the default checkpoint timeout is 5 minutes, with completion target 0.9 the expected checkpoint time is about 270s). Not a checkpoint issue, probably. > I've been collecting random samples from pg_stat_bgwriter: > https://gist.github.com/4faec2ca9a79ede281e1 Although it's a bit difficult to interpret this (collect the data in regular intervals - e.g. every hour - and post the differences, please), but it seems reasonable. > So given all this information (if you need more just let me know), is > there something fundamentally wrong or mis-configured? Do I have an > I/O issue? Probably - the discrepancy between read/write performance is a bit suspicious. Try to watch the I/O performance when this happens, i.e. run "iostat -x" and watch the output (especially %util, r_await, w_await) and post several lines of the output. Tomas -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance