Re: [PATCH] DM-RAID: Fix RAID10's check for sufficient redundancy

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 2013-01-09 at 20:00 -0600, Jonathan Brassow wrote:
> DM RAID:  Fix RAID10's check for sufficient redundancy
> 
> Before attempting to activate a RAID array, it is checked for sufficient
> redundancy.  That is, we make sure that there are not too many failed
> devices - or devices specified for rebuild - to undermine our ability to
> activate the array.  The current code performs this check twice - once to
> ensure there were not too many devices specified for rebuild by the user
> ('validate_rebuild_devices') and again after possibly experiencing a failure
> to read the superblock ('analyse_superblocks').  Neither of these checks are
> sufficient.  The first check is done properly but with insufficient
> information about the possible failure state of the devices to make a good
> determination if the array can be activated.  The second check is simply
> done wrong in the case of RAID10 because it doesn't account for the
> independence of the stripes (i.e. mirror sets).  The solution is to use the
> properly written check ('validate_rebuild_devices'), but perform the check
> after the superblocks have been read and we know which devices have failed.
> This gives us one check instead of two and performs it in a location where
> it can be done right.
> 
> Only RAID10 was affected and it was affected in the following ways:
> - the code did not properly catch the condition where a user specified
>   a device for rebuild that already had a failed device in the same mirror
>   set.  (This condition would, however, be caught at a deeper level in MD.)
> - the code triggers a false positive and denies activation when devices in
>   independent mirror sets have failed - counting the failures as though they
>   were all in the same set.
> 
> The most likely place this error was introduced (or this patch should have
> been included) is in commit 4ec1e369 - first introduced in v3.7-rc1.
> 
> Signed-off-by: Jonathan Brassow <jbrassow@xxxxxxxxxx>

Neil,

This patch should apply cleanly on top of recent changes, but will not
apply cleanly on an older kernel (like 3.7) due to version # changes and
introduction of the new 'far' and 'offset' RAID10 algorithms.  If you
think this fix should be pushed back into 3.7 rather than just applying
on the latest code, I will make a patch for 3.7 - although I'm not
certain how I'd handle the version number conflict.

Thanks,
 brassow


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux