On 3/24/09 6:09 PM, "Mark Kirkwood" <markir@xxxxxxxxxxxxxxx> wrote:

> I'm trying to pin down some performance issues with a machine where I
> work. We are seeing (read-only) query response times blow out by an
> order of magnitude or more at busy times. Initially we blamed
> autovacuum, but after a tweak of the cost_delay it is *not* the
> problem. Then I looked at checkpoints... and although there was some
> correlation between them and the query response times, I'm thinking
> that the RAID chunk size may well be the issue.
>
> Fortunately there is an identical DR box, so I could do a little
> testing. Details follow:
>
> Sun 4140, 2x quad-core Opteron 2356, 16G RAM, 6x 15K 140G SAS
> Debian Lenny
> Pg 8.3.6
>
> The disks are laid out using software (md) RAID:
>
> 4 drives RAID 10, *4K* chunk size, with database files
>   (ext3 ordered, noatime)
> 2 drives RAID 1 with database transaction logs (ext3 ordered, noatime)
>
> Top looks like:
>
> Cpu(s):  2.5%us, 1.9%sy, 0.0%ni, 71.9%id, 23.4%wa, 0.2%hi, 0.2%si, 0.0%st
> Mem:  16474084k total, 15750384k used,   723700k free,  1654320k buffers
> Swap:  2104440k total,      944k used,  2103496k free, 13552720k cached
>
> It looks to me like we are maxing out the RAID 10 array, and I suspect
> the chunk size (4K) is the culprit. However, as this is a pest to
> change (!), I'd like some opinions on whether I'm jumping to
> conclusions. I'd also appreciate comments on what chunk size to use
> (I've tended to use 256K in the past, but what are folks preferring
> these days?)
>
> regards
>
> Mark

md tends to work great at 1MB chunk sizes with RAID 1 or 10, for
whatever reason. Unlike a hardware RAID card, smaller chunks aren't
going to help random I/O, since md won't read the whole 1MB chunk or
bother caching much. Make sure any partitions built on top of md are
1MB-aligned if you go that route. Random I/O on files smaller than 1MB
would be affected -- but that's not a problem on a 16GB RAM server
running a database that won't fit in RAM.
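If you do rebuild the array, a rough sketch of what that looks like
(device names here are examples only -- substitute your own; note that
mdadm takes --chunk in KiB):

    # Example devices only -- substitute your four data drives.
    # mdadm's --chunk is in KiB, so 1024 = 1MB chunks.
    mdadm --create /dev/md0 --level=10 --raid-devices=4 --chunk=1024 \
          /dev/sdc /dev/sdd /dev/sde /dev/sdf

    # If you partition on top of md, start the first partition at 1MiB
    # so it stays aligned with the chunk boundary.
    parted /dev/md0 mklabel gpt
    parted /dev/md0 mkpart primary 1MiB 100%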
Your xlogs are occasionally close to max usage too -- which is
suspicious at 10MB/sec. There is no reason for them to be on ext3:
they are a transaction log that syncs its writes, so file system
journaling buys you nothing. Ext2 there will lower the sync times and
reduce I/O utilization.
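Something like the following would do it (the device and mount point
are illustrative -- on Debian the 8.3 xlog directory is typically
/var/lib/postgresql/8.3/main/pg_xlog, and you'd stop the cluster and
copy the existing xlog files over first):

    # /dev/md1 and the mount point are assumptions -- use your RAID 1
    # array and your actual pg_xlog location.
    mkfs.ext2 /dev/md1
    mount -o noatime /dev/md1 /var/lib/postgresql/8.3/main/pg_xlog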
I also tend to use xfs if sequential access is important at all
(obviously not so in pgbench). ext3 is slightly safer in a power
failure with unsynced data, but Postgres has that covered with its own
journal anyway, so those differences are irrelevant.
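For completeness, a sketch of that too -- the su/sw values assume the
4-drive RAID 10 with 1MB chunks described above, and /data is a
made-up mount point:

    # su = stripe (chunk) size, sw = data-bearing spindles
    # (a 4-drive RAID 10 has 2).
    mkfs.xfs -d su=1m,sw=2 /dev/md0
    mount -o noatime /dev/md0 /data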
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance