On 3/24/09 6:09 PM, "Mark Kirkwood" <markir@xxxxxxxxxxxxxxx> wrote:

> I'm trying to pin down some performance issues with a machine where I
> work. We are seeing (read-only) query response times blow out by an
> order of magnitude or more at busy times. Initially we blamed
> autovacuum, but after a tweak of the cost_delay it is *not* the
> problem. Then I looked at checkpoints... and although there was some
> correlation between them and the query response times, I'm thinking
> that the RAID chunk size may well be the issue.
>
> Fortunately there is an identical DR box, so I could do a little
> testing. Details follow:
>
> Sun 4140, 2x quad-core Opteron 2356, 16G RAM, 6x 15K 140G SAS
> Debian Lenny
> Pg 8.3.6
>
> The disks are laid out using software (md) RAID:
>
> 4 drives RAID 10, *4K* chunk size, with database files
>   (ext3 ordered, noatime)
> 2 drives RAID 1 with database transaction logs (ext3 ordered, noatime)
>
> Top looks like:
>
> Cpu(s):  2.5%us, 1.9%sy, 0.0%ni, 71.9%id, 23.4%wa, 0.2%hi, 0.2%si, 0.0%st
> Mem:  16474084k total, 15750384k used,   723700k free,  1654320k buffers
> Swap:  2104440k total,      944k used,  2103496k free, 13552720k cached
>
> It looks to me like we are maxing out the RAID 10 array, and I suspect
> the chunk size (4K) is the culprit. However, as this is a pest to
> change (!), I'd like some opinions on whether I'm jumping to
> conclusions. I'd also appreciate comments on what chunk size to use
> (I've tended to use 256K in the past, but what are folks preferring
> these days?)
>
> regards
>
> Mark

md tends to work great at 1MB chunk sizes with RAID 1 or 10, for
whatever reason. Unlike a hardware RAID card, smaller chunks aren't
going to help random I/O, since md won't read the whole 1MB chunk or
bother caching much. Make sure any partitions built on top of md are
1MB-aligned if you go that route. Random I/O on files smaller than 1MB
would be affected -- but that's not a problem on a 16GB RAM server
running a database that won't fit in RAM.
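If you do rebuild the array, a rough sketch of what that looks like
(device names here are examples only -- substitute your own; note that
mdadm takes --chunk in KiB):

    # Example devices only -- substitute your four data drives.
    # mdadm's --chunk is in KiB, so 1024 = 1MB chunks.
    mdadm --create /dev/md0 --level=10 --raid-devices=4 --chunk=1024 \
          /dev/sdc /dev/sdd /dev/sde /dev/sdf

    # If you partition on top of md, start the first partition at 1MiB
    # so it stays aligned with the chunk boundary.
    parted /dev/md0 mklabel gpt
    parted /dev/md0 mkpart primary 1MiB 100%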
Your xlogs are occasionally close to max usage too -- which is
suspicious at 10MB/sec. There is no reason for them to be on ext3:
they are a transaction log that syncs its writes, so file system
journaling buys you nothing. Ext2 there will lower the sync times and
reduce I/O utilization.
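Something like the following would do it (the device and mount point
are illustrative -- on Debian the 8.3 xlog directory is typically
/var/lib/postgresql/8.3/main/pg_xlog, and you'd stop the cluster and
copy the existing xlog files over first):

    # /dev/md1 and the mount point are assumptions -- use your RAID 1
    # array and your actual pg_xlog location.
    mkfs.ext2 /dev/md1
    mount -o noatime /dev/md1 /var/lib/postgresql/8.3/main/pg_xlog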
I also tend to use xfs if sequential access is important at all
(obviously not so in pgbench). ext3 is slightly safer in a power
failure with unsynced data, but Postgres has that covered with its own
journal anyway, so those differences are irrelevant.
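For completeness, a sketch of that too -- the su/sw values assume the
4-drive RAID 10 with 1MB chunks described above, and /data is a
made-up mount point:

    # su = stripe (chunk) size, sw = data-bearing spindles
    # (a 4-drive RAID 10 has 2).
    mkfs.xfs -d su=1m,sw=2 /dev/md0
    mount -o noatime /dev/md0 /data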
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance