Re: RAID 5 rebuild fails with power interruption.

(adding linux-raid back in to the CC list - please don't drop Cc's)

On Mon, 23 Nov 2009 19:01:31 +0530 <senthilkumar.muthukalai@xxxxxxxxx>
wrote:

> Hi Neil,
> 
> I applied the patch to our code as seen below.
> But then the disk is kicked out of the array when power to the system
> is interrupted.
> Should I always use the --force option to ensure the disk is not thrown
> out in this case?
> Please advise.

It looks like you need one extra change in that patch for it to
be completely reliable.  See below.

Note that if you interrupt power while the array is degraded (which is
the case while it is recovering to a spare), and the array was active
at that time (i.e. there had been a write in the last 200ms or so),
then you will have a "dirty degraded" array and mdadm will refuse to
assemble such an array unless you use --force.
This is because when an array is 'dirty' you cannot trust the parity
to be correct, and when it is degraded you might have some data missing,
and that data cannot reliably be recovered from the parity (because we
don't trust the parity).
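
To make the "dirty degraded" reasoning concrete, here is a rough
sketch in C. It is not md code: the function names (parity_of,
reconstruct, may_assemble, stale_parity_corrupts) are made up for
illustration, each "block" is a single byte, and may_assemble only
mirrors the policy described above rather than mdadm's actual logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* XOR parity of one stripe; each "block" is a single byte for brevity. */
uint8_t parity_of(const uint8_t *data, size_t n)
{
	uint8_t p = 0;
	for (size_t i = 0; i < n; i++)
		p ^= data[i];
	return p;
}

/* Rebuild block `missing` from the surviving blocks and the stored parity. */
uint8_t reconstruct(const uint8_t *data, size_t n, size_t missing,
		    uint8_t parity)
{
	uint8_t v = parity;
	for (size_t i = 0; i < n; i++)
		if (i != missing)
			v ^= data[i];
	return v;
}

/* Hypothetical policy helper: refuse a dirty and degraded array
 * unless the caller explicitly forces assembly. */
bool may_assemble(bool dirty, bool degraded, bool force)
{
	return force || !(dirty && degraded);
}

/* Returns true if a stale parity makes the rebuilt block differ
 * from the block that was actually lost. */
bool stale_parity_corrupts(void)
{
	uint8_t d[3] = { 0x11, 0x22, 0x33 };
	uint8_t p = parity_of(d, 3);

	/* Clean stripe: block 2 can be recovered exactly. */
	assert(reconstruct(d, 3, 2, p) == 0x33);

	/* Power is lost after block 0 was rewritten but before the
	 * parity was updated, so `p` is now stale. */
	d[0] = 0x55;
	return reconstruct(d, 3, 2, p) != 0x33;
}
```

With the stale parity, rebuilding block 2 yields 0x77 instead of 0x33,
and nothing in the stripe itself reveals that the result is wrong.
That silent corruption is what refusing to assemble without --force
guards against.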

Pulling the power on a RAID5 array simply is not a good idea.

NeilBrown


diff --git a/drivers/md/md.c b/drivers/md/md.c
index b2a9ebc..e68b254 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1517,12 +1517,10 @@ static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev)
 
 	if (rdev->raid_disk >= 0 &&
 	    !test_bit(In_sync, &rdev->flags)) {
-		if (rdev->recovery_offset > 0) {
-			sb->feature_map |=
-				cpu_to_le32(MD_FEATURE_RECOVERY_OFFSET);
-			sb->recovery_offset =
-				cpu_to_le64(rdev->recovery_offset);
-		}
+		sb->feature_map |=
+			cpu_to_le32(MD_FEATURE_RECOVERY_OFFSET);
+		sb->recovery_offset =
+			cpu_to_le64(rdev->recovery_offset);
 	}
 
 	if (mddev->reshape_position != MaxSector) {
@@ -1556,7 +1554,7 @@ static void super_1_sync(mddev_t *mddev, mdk_rdev_t *rdev)
 			sb->dev_roles[i] = cpu_to_le16(0xfffe);
 		else if (test_bit(In_sync, &rdev2->flags))
 			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
-		else if (rdev2->raid_disk >= 0 && rdev2->recovery_offset > 0)
+		else if (rdev2->raid_disk >= 0)
 			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
 		else
 			sb->dev_roles[i] = cpu_to_le16(0xffff);
@@ -6769,6 +6767,7 @@ static int remove_and_add_spares(mddev_t *mddev)
 						       nm, mdname(mddev));
 					spares++;
 					md_new_event(mddev);
+					set_bit(MD_CHANGE_DEVS, &mddev->flags);
 				} else
 					break;
 			}