Rationale for hardware RAID 10 su, sw values in FAQ

The FAQ:

http://xfs.org/index.php/XFS_FAQ#Q:_How_to_calculate_the_correct_sunit.2Cswidth_values_for_optimal_performance

suggests using:

su = hardware RAID stripe size on single disk

sw = (disks in RAID-10 / 2)

on hardware RAID 10 volumes, but doesn't give a reason for that "sw" value (other than noting that "(disks in RAID-10 / 2)" is the number of effective data disks).
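For concreteness, those values end up on the mkfs.xfs command line as "-d su=...,sw=..."; e.g. for a (purely hypothetical) 4-disk RAID 10 with a 64KB per-disk stripe, the FAQ formula gives:

  mkfs.xfs -d su=64k,sw=2 /dev/sdX    # 4 disks / 2 = 2 effective data disks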

In the RAID 5 / RAID 6 case, obviously you want (su * sw) to cover the user data that can be written across the whole array in a single stripe, since that full stripe is the "writeable unit" on the array on which read/modify/write has to be done -- so you do not want a data structure spanning the boundary between two writeable units (as that means two of them have to be read, modified, and written).
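For example (numbers purely illustrative): an 8-disk RAID 6 with a 256KB per-disk chunk has 6 effective data disks, so

  su = 256KB
  sw = 6
  su * sw = 1536KB (full stripe)

and a data structure straddling a 1536KB boundary touches two full stripes, each of which needs its parity updated via read/modify/write.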

In the RAID 10 case it is clearly preferable to avoid spanning across the boundary of a _single_ disk's (pair's) stripe size (su * 1), as then _two_ pairs of disks in the RAID 10 need to get involved in the write (so you potentially have two seek penalties, etc).

But in the RAID 10 case each physical disk is just paired with one other disk, and that pair can be written independently of the rest -- since there's no parity information as such, there's normally no need for a read/modify/write cycle of any block larger than, e.g., a physical sector or SSD erase block.
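To put rough numbers on that (using the 512KB per-spindle stripe from the case in the PS below):

  512KB write, 512KB-aligned:   touches 1 mirror pair
  512KB write, offset by 256KB: touches 2 mirror pairs (two sets of seeks)

but in neither case is there any parity read/modify/write.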

So why is "sw" in the RAID 10 case given as "(disks in RAID-10 / 2)" rather than "1"? Wouldn't

su = hardware RAID stripe size on single disk

sw = 1

make more sense for RAID 10?

In the RAID 10 case, aligning to the whole data disk set seems likely to place data structures (more frequently) on the first disk pair in the RAID set (especially with larger single-disk stripe sizes), potentially making that the "metadata disk pair" -- so that pair both potentially sees more metadata activity, and is more exposed if one of its disks is lost or the pair is rebuilding. (The same "align to the start of the disk set" effect would seem to happen with RAID 5 / RAID 6 too, but there it is unavoidable due to the "large smallest physically modifiable block" issue.)

What am I missing that leads to the FAQ suggesting "sw = (disks in RAID-10 / 2)"? Perhaps this additional rationale could be added to that FAQ question? (Or if "sw = 1" actually does make sense on RAID 10, the FAQ could be updated to suggest that as an option.)

Thanks,

Ewen

PS: In the specific case that had me pondering this today, it's RAID 10 over 12 spindles, with a 512KB per-spindle stripe size. So that's either 512KB * 1 = 512KB, or 512KB * 6 = 3072KB depending on the rationale.
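i.e. roughly (device name hypothetical) either:

  mkfs.xfs -d su=512k,sw=1 /dev/sdX    # align to a single pair's stripe (512KB)

or, per the FAQ:

  mkfs.xfs -d su=512k,sw=6 /dev/sdX    # align to the full data disk set (3072KB)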