On 11/04/2009 01:40 PM, Leslie Rhorer wrote:
>> I will preface this by saying I only need about 100MB/s out of my
>> array because I access it via a gigabit crossover cable.
>
> That's certainly within the capabilities of a good setup.
>
>> I am backing up all of my information right now (~4TB) with the
>> intention of re-creating this array with a larger chunk size and
>> possibly tweaking the file system a little bit.
>>
>> My original array was a raid6 of 9 WD caviar black drives, the chunk
>> size was 64k. I use USAS-AOC-L8i controllers to address all of my
>> drives and the TLER setting on the drives is enabled for 7 seconds.
>
> I would recommend a larger chunk size.  I'm using 256K, and even
> 512K or 1024K probably would not be excessive.

OK, I've got some data that I'm not quite ready to send out yet, but it
maps out the relationship between max_sectors_kb (the largest request
size a disk will accept, which varies based upon the SCSI host adapter
in question, but for SATA adapters is capped at, and defaults to, 512KB
per request) and chunk size for a raid0 array across 4 or 5 disks (I
could run other array sizes too, and that's part of what I'm waiting on
before sending the data out).  The point of using raid0 is that it
exposes more of the md/lower-layer block device interactions, whereas
raid5/6 would muddy the waters with other stuff.

The results of the tests I ran were pretty conclusive: the sweet spot
is when the chunk size == max_sectors_kb, and since SATA is the
predominant thing today and it defaults to 512K, that gives a 512K
chunk as the sweet spot.  Given that the chunk size is generally about
optimizing block device operations at the command/queue level, it
should transfer directly to raid5/6 as well.
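If anyone wants to sanity-check this on their own hardware before I get
the full data out, a rough sketch of the sort of test in question (the
md device and disk names below are just placeholders, substitute
whatever spare disks you actually have):

    # largest request size the block layer will hand each member disk
    # (device names here are placeholders)
    cat /sys/block/sdb/queue/max_sectors_kb

    # throwaway raid0 across the spare disks, chunk size (in KB)
    # matched to the value reported above, 512 in the common SATA case
    mdadm --create /dev/md9 --level=0 --raid-devices=4 --chunk=512 \
          /dev/sd[bcde]

Then run your usual sequential read/write load against the test array
and compare a few different --chunk values against the reported
max_sectors_kb.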
>> storrgie@ALEXANDRIA:~$ sudo mdadm -D /dev/md0
>> /dev/md0:
>> Version : 00.90
>
> I definitely recommend something other than 0.9, especially if this
> array is to grow a lot.
>
>> I have noticed slow rebuilding time when I first created the array
>> and intermittent lockups while writing large data sets.
>
> Lock-ups are not good.  Investigate your kernel log.  A write-intent
> bitmap is recommended to reduce rebuild time.
>
>> Is ext4 the ideal file system for my purposes?
>
> I'm using xfs.  YMMV.
>
>> Should I be investigating into the file system stripe size and chunk
>> size or let mkfs choose these for me? If I need to, please be kind to
>> point me in a good direction as I am new to this lower level file
>> system stuff.
>
> I don't know specifically about ext4, but xfs did a fine job of
> assigning stripe and chunk size.

xfs pulls this information out all on its own; ext2/3/4 need to be told
(and you need very recent ext utilities to tell them both the stripe
and stride sizes).

>> Can I change the properties of my file system in place (ext4 or
>> other) so that I can tweak the stripe size when I add more drives
>> and grow the array?
>
> One can with xfs.  I expect ext4 may be the same.

Actually, this needs to be clarified somewhat.  You can tweak xfs in
terms of the sunit and swidth settings, but this will affect new
allocations *only*!  All of your existing data will still be wherever
it was, and if that happens to be not so well laid out for the new
array, too bad.

The ext filesystems use this information at filesystem creation time to
lay out their block groups, inode tables, etc. in such a fashion that
they are aligned to individual chunks, and also so that they are *not*
exactly a stripe width apart from each other (which forces the metadata
to reside on different disks and avoids the possible pathological case
where you could accidentally end up with the metadata blocks always
falling on the same disk in the array, making that one disk a huge
bottleneck for the rest of the array).  Once an ext filesystem is
created, I don't think it uses the data much any longer, but I could be
wrong.  However, I know that the layout won't be rearranged for your
new array, so you get what you get after you grow the fs.
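To make that concrete, here's a rough sketch of how you would hand the
geometry to each filesystem at creation time.  The numbers assume a
512K chunk and the 9-drive raid6 from this thread (7 data disks) with
4K filesystem blocks; adjust for your actual layout and double-check
the mkfs man pages for your versions of xfsprogs/e2fsprogs:

    # xfs: su = stripe unit (the chunk size), sw = number of data disks
    mkfs.xfs -d su=512k,sw=7 /dev/md0

    # ext4: stride = chunk / block size = 512K / 4K = 128
    #       stripe-width = stride * data disks = 128 * 7 = 896
    mkfs.ext4 -b 4096 -E stride=128,stripe-width=896 /dev/md0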
--
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: CFBFF194
              http://people.redhat.com/dledford

Infiniband specific RPMs available at
              http://people.redhat.com/dledford/Infiniband