Re: RAID tuning?

On Mon, 12 Jun 2006, Adam Talbot wrote:

> RAID tuning?
> Just got my new array setup running RAID 6 on 6 disks.  Now I am looking
> to tune it.  I am still testing and playing with it, so I don't mind
> rebuilding the array a few times.
>
> Is chunk size per disk or is it total stripe?

As I understand it (for RAID 0, 5 and 6 at least), the chunk size is per
disk: it's the amount of data the system writes to one disk before moving
on to the next, so a full stripe is the chunk size times the number of
data disks. It probably affects all sorts of underlying "stuff" like
buffers at the block level and so on. I guess the theory is that if you
can fill up a chunk on N disks, and the kernel+hardware can write them in
parallel (or as near as it can manage), then the overall write speed
increases.
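
Something like this, from memory, would re-create a 6-disk RAID-6 with a
128K chunk (device names are just placeholders for whatever you have):

   mdadm --create /dev/md0 --level=6 --raid-devices=6 --chunk=128 \
         /dev/sd[b-g]1

That makes the full stripe 4 data disks x 128K = 512K.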

I also think the order you give the drives to mdadm makes a difference if
they are on different controllers - I asked about this recently but didn't
get any answers. An array I created this way recently (2 SCSI chains, 7
drives on each, created using sda, sdh, sdb, sdi, etc.) seems to come out
as sda, sdb, sdc in /proc/mdstat, so who knows!

> Will using mkfs.ext3 -b N make a performance difference?

I've found that it does - if you get the right numbers. It's not so much
the -b option (the default is probably optimal for most systems) as the
-R stride=N option, since that lets ext3 "know" about the chunk size and
hopefully optimise its writes against it.
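
E.g., assuming a 128K chunk and the default 4K blocks, stride = chunk
size / block size = 128/4 = 32, so roughly (check the man page for your
e2fsprogs version):

   mkfs.ext3 -b 4096 -R stride=32 /dev/md0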

A lot also depends on your usage patterns. If you're doing a lot of stuff
with small files, or small writes, then a smaller (or the default) chunk
size might be just fine - for streaming larger files, a larger chunk size
(& more memory ;-) might help - and since you've got time to play here, I
suggest you do :)

One thing I did find some time back, though (and I never tracked it down,
as I didn't have the time, alas), was some data corruption when using
anything other than the default stripe size (or more correctly, the
default -R stride= option in mkfs.ext3 - I think it might have been ext3
rather than the md stuff; this was an older server with older ext2 tools
and a 2.6 kernel). I put together a quick & dirty little script to test
this though - see

  http://lion.drogon.net/diskCheck1

I usually run this a few times on a new array, and will leave it running
for as long as possible on all partitions if I can. Under ext3, I'll fill
the disk with random files, unmount it, force an fsck on it, then mount
it, delete all the files, umount it, then do the fsck again. The script
basically creates a random file, then copies this all over the disk,
copying the copy each time.
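
If you can't grab that script, here's a rough sketch of the same idea
(not the script itself, just from memory - the mount point is a
placeholder):

   #!/bin/sh
   # Write one random file, copy the copy until the disk fills,
   # then check every copy against the original's checksum.
   mnt=/mnt/testarray                # placeholder mount point
   dd if=/dev/urandom of=$mnt/seed bs=1M count=64
   sum=`md5sum < $mnt/seed`
   prev=$mnt/seed
   i=1
   while cp $prev $mnt/copy$i 2>/dev/null
   do
       prev=$mnt/copy$i
       i=`expr $i + 1`
   done
   rm -f $mnt/copy$i                 # drop the partial copy left when the disk filled
   for f in $mnt/seed $mnt/copy*
   do
       test "`md5sum < $f`" = "$sum" || echo "MISMATCH: $f"
   done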

As for more tuning with ext3, there is a section in the (rather old now!)
Software-RAID HOWTO about the numbers:

  http://www.tldp.org/HOWTO/Software-RAID-HOWTO-5.html#ss5.11

> Are there any other things that I should keep in mind?

Test it, thrash it, power-kill it, remove a drive randomly, fsck it,
rebuild it, etc. before it goes live. Be as intimately familiar with the
"administration" side of things as you can, so that if something does go
wrong, you can calmly analyse what's wrong, remove a drive (if necessary)
and install & resync/rebuild a new one if required.
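
From memory, the mdadm basics for that sort of drill are (device names
are placeholders):

   mdadm /dev/md0 --fail /dev/sdc1      # mark a drive as failed
   mdadm /dev/md0 --remove /dev/sdc1    # pull it out of the array
   mdadm /dev/md0 --add /dev/sdc1       # add it (or a replacement) back
   watch cat /proc/mdstat               # and watch the resync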

I've found it does help to do an fsck on it when it's reasonably full of
files - if nothing else, it'll give you an indication of how long it will
take should you ever have to do it for real. If you're dealing with
clients/managers/users, etc. it's always good to give them an idea, then
you can go off and have a relaxing coffee or 2 ;-)
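
Something like this (assuming your array is /dev/md0 and formatted ext3)
will give you a real number to quote:

   umount /dev/md0
   time fsck.ext3 -f -C 0 /dev/md0      # forced full check with a progress bar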

And FWIW: I've been using RAID-6 on production servers for some time now
and have been quite happy. It did save the day once when someone performed
a memory upgrade on one server and somehow managed to boot the server with
2 drives missing )-: A reboot after a cable-check and a resync later and
all was fine. (Is it just me, or are SATA cables and connectors rubbish?)

Here's one I built earlier (SCSI drives):

md9 : active raid6 sdn1[13] sdm1[11] sdl1[9] sdk1[7] sdj1[5] sdi1[3]
	sdh1[1] sdg1[12] sdf1[10] sde1[8] sdd1[6] sdc1[4] sdb1[2] sda1[0]
	3515533824 blocks level 6, 128k chunk, algorithm 2 [14/14]
	[UUUUUUUUUUUUUU]

Filesystem            Size  Used Avail Use% Mounted on
/dev/md9              3.3T  239G  3.1T   8% /mounts/pdrive

It's actually running XFS rather than ext3 - I did manage to do some
testing with this, and the only thing that swayed me towards XFS for this
box was its ability to delete files much, much quicker than ext3 (and this
box might well require large quantities of files/directories to be deleted
on a semi-regular basis). I used:

   mkfs -t xfs -f -d su=128k,sw=14 /dev/md9

to create it - I *think* those parameters are OK for a 128K chunk size
array - it was hard to pin this down, but performance seems adequate.
(It's a "written by one process once or twice a day, read by many" sort
of data store.)
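
FWIW, the rule of thumb I've seen for XFS on md is su = chunk size and
sw = number of data-bearing disks (i.e. N-2 for RAID-6), so on your
6-disk array with a 128K chunk that would be something like (again, just
a sketch):

   mkfs -t xfs -f -d su=128k,sw=4 /dev/md0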

I use stock Debian 3.1 & the mdadm that comes with it, but a custom
compiled 2.6.16.x kernel.

Good luck!

Gordon
