On Sat, Nov 7, 2009 at 11:35, Doug Ledford <dledford@xxxxxxxxxx> wrote:
> On 11/04/2009 01:40 PM, Leslie Rhorer wrote:
>> I would recommend a larger chunk size.  I'm using 256K, and even
>> 512K or 1024K probably would not be excessive.
>
> OK, I've got some data that I'm not quite ready to send out yet, but
> it maps out the relationship between max_sectors_kb (the largest
> request size a disk can process, which varies based upon the SCSI
> host adapter in question, but for SATA adapters is capped at and
> defaults to 512KB per request) and chunk size for a raid0 array
> across 4 or 5 disks (I could run other array sizes too, and that's
> part of what I'm waiting on before sending the data out).  The point
> here being that a raid0 array will show up more of the md/lower-layer
> block device interactions, whereas raid5/6 would muddy the waters
> with other stuff.  The results of the tests I ran were pretty
> conclusive that the sweet spot for chunk size is when chunk size ==
> max_sectors_kb, and since SATA is the predominant thing today and it
> defaults to 512K, that gives a 512K chunk as the sweet spot.  Given
> that the chunk size is generally about optimizing block device
> operations at the command/queue level, it should transfer directly
> to raid5/6 as well.
>

This only really applies to large sequential I/O loads, right?  I seem
to recall smaller chunk sizes being more effective for smaller random
I/O loads.

>>> Is ext4 the ideal file system for my purposes?
>>
>> I'm using xfs.  YMMV.
>>

I'm also using XFS, and for now it's maybe safer than ext4 (in the
2.6.32-rc series there was recently an obscure bug that could cause
ext4 filesystem corruption - it's fixed now, but that type of thing
scares me).  But ext[234] have forward compatibility w/ btrfs (you can
convert an ext4 to a btrfs), so if you want to use btrfs when it
stabilizes, maybe you should go w/ ext3 or ext4 for now.  But that
might take a long time.  XFS can perform better than ext[234], but if
you're really only going to be accessing this over a 1 Gb/s network,
it won't matter.

>>> Should I be investigating the file system stripe size and chunk
>>> size, or let mkfs choose these for me?  If I need to, please be
>>> kind enough to point me in a good direction, as I am new to this
>>> lower-level file system stuff.
>>
>> I don't know specifically about ext4, but xfs did a fine job of
>> assigning stripe and chunk size.
>
> xfs pulls this out all on its own; ext2/3/4 need to be told (and you
> need very recent ext utils to tell it both stripe and stride sizes).
>
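
For anyone who wants to do it by hand, the arithmetic is simple:
stride = chunk size / filesystem block size, and stripe width =
stride * number of data disks.  A rough sketch (the device names and
the 5-disk RAID-5 layout below are made up for illustration, not
anyone's actual setup):

    # per-request limit Doug mentions, per member disk (hypothetical sdb)
    cat /sys/block/sdb/queue/max_sectors_kb

    # hypothetical 5-disk RAID-5 with a 512K chunk
    mdadm --create /dev/md0 --level=5 --raid-devices=5 --chunk=512 \
          /dev/sd[b-f]1

    # ext4: 512K chunk / 4K blocks = stride 128; 4 data disks -> 512
    mkfs.ext4 -b 4096 -E stride=128,stripe-width=512 /dev/md0

    # xfs: su = chunk size in bytes, sw = number of data disks
    mkfs.xfs -d su=512k,sw=4 /dev/md0

(As Leslie and Doug say, mkfs.xfs will normally pick su/sw up from the
md device on its own; the explicit flags are only there to show where
the numbers come from.)
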
>>> Can I change the properties of my file system in place (ext4 or
>>> other) so that I can tweak the stripe size when I add more drives
>>> and grow the array?
>>
>> One can with xfs.  I expect ext4 may be the same.
>
> Actually, this needs to be clarified somewhat.  You can tweak xfs in
> terms of the sunit and swidth settings.  This will affect new
> allocations *only*!  All of your existing data will still be wherever
> it was, and if that happens to be not so well laid out for the new
> array, too bad.  The ext filesystems use this information at
> filesystem creation time to lay out their block groups, inode tables,
> etc. in such a fashion that they are aligned to individual chunks and
> also so that they are *not* exactly a stripe width apart from each
> other (which forces the metadata to reside on different disks and
> avoids the possible pathological case where you could accidentally
> end up with the metadata blocks always falling on the same disk in
> the array, making that one disk a huge bottleneck to the rest of the
> array).  Once an ext filesystem is created, I don't think it uses the
> data much any longer, but I could be wrong.  However, I know that it
> won't be rearranged for your new layout, so you get what you get
> after you grow the fs.
>

You can change both the stride and stripe-width extended options on an
existing ext[234] filesystem w/ tune2fs (w/ XFS it's done w/ mount
options), and as I understand the tune2fs man page, the block
allocator should use the new values, although it seems stripe width is
maybe the more heavily used of the two.

--
Conway S. Smith
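
P.S. To make that last point concrete, here's a rough sketch of the
retune after growing the hypothetical array above from 5 drives to 6
(so 5 data disks).  The device name and mount point are made up, and
the exact spelling of the extended options varies between e2fsprogs
versions, so check tune2fs(8) before trusting this:

    # ext[234]: stride stays 128 (512K chunk / 4K blocks);
    # stripe width becomes 128 * 5 data disks = 640
    tune2fs -E stride=128,stripe_width=640 /dev/md0

    # xfs: sunit/swidth mount options are in 512-byte sectors
    # 512K chunk -> sunit=1024; 5 data disks -> swidth=5120
    mount -o sunit=1024,swidth=5120 /dev/md0 /srv/array

Either way, as Doug points out, only allocations made after the change
will use the new geometry.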