The FAQ:
http://xfs.org/index.php/XFS_FAQ#Q:_How_to_calculate_the_correct_sunit.2Cswidth_values_for_optimal_performance
suggests using:
su = hardware RAID stripe size on single disk
sw = (disks in RAID-10 / 2)
on hardware RAID 10 volumes, but doesn't give a reason for that "sw"
value (other than noting that "(disks in RAID-10 / 2)" is the number of
effective data disks).
In the RAID 5 / RAID 6 case, obviously you want (su * sw) to cover the
user data that can be written across the whole array in a single stripe,
since that is the "writeable unit" on the array -- the unit on which
read/modify/write has to be done -- so you do not want a data structure
spanning the boundary between two writeable units (as that would mean
two writeable units have to be read, modified, and written).
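(Worked example, purely for illustration and not the array mentioned
below: an 8-disk RAID 6 with a 256KB per-disk stripe has 8 - 2 = 6
effective data disks, so su = 256KB, sw = 6, and su * sw = 1536KB --
exactly the user data held in one full parity stripe.)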
In the RAID 10 case it is clearly preferable to avoid spanning across
the boundary of a _single_ disk's (pair's) stripe size (su * 1), as then
_two_ pairs of disks in the RAID 10 need to get involved in the write
(so you potentially have two seek penalties, etc).
But in the RAID 10 case each physical disk is just paired with one
other disk, and that pair can be written independently of the rest --
since there is no parity information as such, there is normally no need
for a read / modify / write cycle on any block larger than, e.g., a
physical sector or SSD erase block.
So why is "sw" in the RAID 10 case given as "(disks in RAID-10 / 2)"
rather than "1"? Wouldn't
su = hardware RAID stripe size on single disk
sw = 1
make more sense for RAID 10?
In the RAID 10 case, aligning to the whole data disk set seems likely
to place data structures (more frequently) on the first disk pair in
the RAID set (especially with larger single-disk stripe sizes),
potentially making that pair the "metadata disk pair" -- so it both
sees more metadata activity and is more exposed if one disk in that
pair is lost or the pair is rebuilding.  (The same "align to the start
of the disk set" effect would seem to happen with RAID 5 / RAID 6 too,
but there it is unavoidable because of the "large smallest physically
modifiable block" issue.)
What am I missing that leads to the FAQ suggesting "sw = (disks in
RAID-10 / 2)"? Perhaps this additional rationale could be added to that
FAQ question? (Or if "sw = 1" actually does make sense on RAID 10, the
FAQ could be updated to suggest that as an option.)
Thanks,
Ewen
PS: In the specific case that had me pondering this today, it's RAID 10
over 12 spindles, with a 512KB per-spindle stripe size.  So the stripe
width is either 512KB * 1 = 512KB or 512KB * 6 = 3072KB, depending on
which rationale applies.
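In mkfs.xfs terms (with /dev/md0 just as a placeholder device name),
the two readings would be roughly:

  mkfs.xfs -d su=512k,sw=6 /dev/md0    (FAQ: sw = disks in RAID-10 / 2)
  mkfs.xfs -d su=512k,sw=1 /dev/md0    (the "sw = 1" reading)

i.e. the same stripe unit either way, with a stripe width of 3072KB
vs 512KB respectively.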