Re: 2 drives failed, one "active", one with wrong event count

Neil Brown <neilb@xxxxxxx> · Thu, 4 Feb 2010 12:03:12 +1100

On Mon, 1 Feb 2010 08:13:24 +0100 (CET)
Mikael Abrahamsson <swmike@xxxxxxxxx> wrote:

> On Mon, 1 Feb 2010, Neil Brown wrote:
> 
> > You might know that nothing has been written to the array since the 
> > device with the lower event count was removed, but md doesn't know that. 
> > Any device with an old event count could have old data and so cannot be 
> > trusted (unless you assemble with --force meaning that you are taking 
> > responsibility).
> 
> I did use --force, but it seems in the state "one drive with lower event 
> count and another one with 0x2", the event count on the drive isn't 
> forcibly updated and since there is a 0x2 drive, the array isn't started.
> 
> I had the same situation again this morning (changing controller next), 
> but this time I had bitmaps enabled so recovery of the array with 
> --assemble --force took just a few seconds. Really nice.
> 

Right... I understand now.

Fixed with the following patch which will be in 3.1.2.

Thanks,
NeilBrown

commit 921d9e164fd3f6203d1b0cf2424b793043afd001
Author: NeilBrown <neilb@xxxxxxx>
Date:   Thu Feb 4 12:02:09 2010 +1100

    Assemble: fix --force assembly of v1.x arrays which are recovering.
    
    1.x metadata allows a device to be a member of the array while it
    is still recovering.  So it is a working member, but is not
    completely in-sync.
    
    mdadm/assemble does not understand this distinction and assumes that a
    working member is fully in-sync for the purpose of determining if there
    are enough in-sync devices for the array to be functional.
    
    So collect the 'recovery_start' value from the metadata and use it in
    assemble when determining how useful a given device is.
    
    Reported-by: Mikael Abrahamsson <swmike@xxxxxxxxx>
    Signed-off-by: NeilBrown <neilb@xxxxxxx>

diff --git a/Assemble.c b/Assemble.c
index 7f90048..e4d6181 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -800,7 +800,8 @@ int Assemble(struct supertype *st, char *mddev,
 		if (devices[j].i.events+event_margin >=
 		    devices[most_recent].i.events) {
 			devices[j].uptodate = 1;
-			if (i < content->array.raid_disks) {
+			if (i < content->array.raid_disks &&
+			    devices[j].i.recovery_start == MaxSector) {
 				okcnt++;
 				avail[i]=1;
 			} else
@@ -822,6 +823,7 @@ int Assemble(struct supertype *st, char *mddev,
 			int j = best[i];
 			if (j>=0 &&
 			    !devices[j].uptodate &&
+			    devices[j].i.recovery_start == MaxSector &&
 			    (chosen_drive < 0 ||
 			     devices[j].i.events
 			     > devices[chosen_drive].i.events))
diff --git a/super-ddf.c b/super-ddf.c
index 3e30229..870efd8 100644
--- a/super-ddf.c
+++ b/super-ddf.c
@@ -1369,6 +1369,7 @@ static void getinfo_super_ddf(struct supertype *st, struct mdinfo *info)
 	info->disk.state = (1 << MD_DISK_SYNC) | (1 << MD_DISK_ACTIVE);
 
 
+	info->recovery_start = MaxSector;
 	info->reshape_active = 0;
 	info->name[0] = 0;
 
@@ -1427,6 +1428,7 @@ static void getinfo_super_ddf_bvd(struct supertype *st, struct mdinfo *info)
 
 	info->container_member = ddf->currentconf->vcnum;
 
+	info->recovery_start = MaxSector;
 	info->resync_start = 0;
 	if (!(ddf->virt->entries[info->container_member].state
 	      & DDF_state_inconsistent)  &&
diff --git a/super-intel.c b/super-intel.c
index 91479a2..bbdcb51 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -1452,6 +1452,7 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info)
 	info->data_offset	  = __le32_to_cpu(map->pba_of_lba0);
 	info->component_size	  = __le32_to_cpu(map->blocks_per_member);
 	memset(info->uuid, 0, sizeof(info->uuid));
+	info->recovery_start = MaxSector;
 
 	if (map->map_state == IMSM_T_STATE_UNINITIALIZED || dev->vol.dirty) {
 		info->resync_start = 0;
@@ -1559,6 +1560,7 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info)
 	info->disk.number = -1;
 	info->disk.state = 0;
 	info->name[0] = 0;
+	info->recovery_start = MaxSector;
 
 	if (super->disks) {
 		__u32 reserved = imsm_reserved_sectors(super, super->disks);
diff --git a/super0.c b/super0.c
index 0485a3a..5c6b7d7 100644
--- a/super0.c
+++ b/super0.c
@@ -372,6 +372,7 @@ static void getinfo_super0(struct supertype *st, struct mdinfo *info)
 
 	uuid_from_super0(st, info->uuid);
 
+	info->recovery_start = MaxSector;
 	if (sb->minor_version > 90 && (sb->reshape_position+1) != 0) {
 		info->reshape_active = 1;
 		info->reshape_progress = sb->reshape_position;
diff --git a/super1.c b/super1.c
index 85bb598..40fbb81 100644
--- a/super1.c
+++ b/super1.c
@@ -612,6 +612,11 @@ static void getinfo_super1(struct supertype *st, struct mdinfo *info)
 	strncpy(info->name, sb->set_name, 32);
 	info->name[32] = 0;
 
+	if (sb->feature_map & __le32_to_cpu(MD_FEATURE_RECOVERY_OFFSET))
+		info->recovery_start = __le64_to_cpu(sb->recovery_offset);
+	else
+		info->recovery_start = MaxSector;
+
 	if (sb->feature_map & __le32_to_cpu(MD_FEATURE_RESHAPE_ACTIVE)) {
 		info->reshape_active = 1;
 		info->reshape_progress = __le64_to_cpu(sb->reshape_position);
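
For readers who want the idea without the surrounding mdadm code, here is a
minimal standalone sketch of the test the commit message describes: a device
only counts towards the in-sync total if its event count is recent enough
and it has no recovery pending. Everything named sketch_* or SKETCH_* is
hypothetical and exists only for this illustration; it is not mdadm's API,
and SKETCH_MAX_SECTOR merely stands in for MaxSector.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for mdadm's MaxSector: "no recovery pending". */
#define SKETCH_MAX_SECTOR UINT64_MAX

/* Hypothetical, cut-down view of the per-device fields the patch looks at. */
struct sketch_dev {
	uint64_t events;         /* event counter read from the superblock */
	uint64_t recovery_start; /* SKETCH_MAX_SECTOR if fully recovered */
};

/*
 * A device is usable as an in-sync member only if its event count is
 * within 'margin' of the newest one AND it is not part-way through a
 * recovery.
 */
int sketch_is_in_sync(const struct sketch_dev *d, uint64_t newest,
		      uint64_t margin)
{
	return d->events + margin >= newest &&
	       d->recovery_start == SKETCH_MAX_SECTOR;
}

/* Count the devices that pass the test above. */
int sketch_count_in_sync(const struct sketch_dev *devs, size_t n,
			 uint64_t margin)
{
	uint64_t newest = 0;
	int count = 0;
	size_t i;

	/* First pass: find the most recent event count. */
	for (i = 0; i < n; i++)
		if (devs[i].events > newest)
			newest = devs[i].events;

	/* Second pass: count devices that are current and fully recovered. */
	for (i = 0; i < n; i++)
		if (sketch_is_in_sync(&devs[i], newest, margin))
			count++;

	return count;
}
```

With one current device, one current device that is still recovering, and
one device with a stale event count, only the first is counted. That is the
situation reported in this thread: the recovering device used to be counted
as in-sync, so the stale device was never considered for a forced update.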