Re: RAID5 losing initial synchronization on restart when one disk is spare

Neil Brown wrote:
On Wednesday June 4, hubskml@xxxxxxx wrote:
Hello

According to mdadm's man page:
"When creating a RAID5 array, mdadm will automatically create a degraded
array with an extra spare drive. This is because building the spare
into a degraded array is in general faster than resyncing the parity on
a non-degraded, but not clean, array. This feature can be over-ridden
with the --force option."
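
In other words, as I read it, the documented override would be something like (same device names as in my commands below):
mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 --force -R /dev/sd?
which resyncs the parity across all members instead of rebuilding onto a spare.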

Unfortunately, I'm seeing what looks like a bug when I create a RAID5 array with an internal bitmap, stop the array before the initial synchronization has finished, and then restart it.

1) When I create the array with an internal bitmap:
mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
the last disk appears as a spare. After restarting the array, all disks are reported active and the array does not resume the aborted synchronization!
Note that I did not use the --assume-clean option.

2) When I create the array without a bitmap:
mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -R /dev/sd?
the last disk appears as a spare. After restarting the array, the spare disk is still a spare and the array resumes the synchronization where it left off.

In case 1), is this a bug or did I miss something?
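
For reference, the full sequence that shows the problem in case 1) is simply (assembling is just one way to restart the array; device names as above):

mdadm -C /dev/md_d1 -e 1.2 -l 5 -n 4 -b internal -R /dev/sd?
cat /proc/mdstat      # last disk listed as a spare, resync in progress
mdadm -S /dev/md_d1   # stop the array before the resync finishes
mdadm -A /dev/md_d1 /dev/sd?
cat /proc/mdstat      # all disks active, no resync running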

Thanks for the detailed report.  Yes, this is a bug.

The following patch fixes it, though I'm not 100% sure this is the
right fix (it may cause too much resync in some cases, which is better
than not enough, but not ideal).

NeilBrown

Signed-off-by: Neil Brown <neilb@xxxxxxx>

diff .prev/drivers/md/raid5.c ./drivers/md/raid5.c
--- .prev/drivers/md/raid5.c	2008-06-10 10:27:51.000000000 +1000
+++ ./drivers/md/raid5.c	2008-06-12 09:34:25.000000000 +1000
@@ -4094,7 +4094,9 @@ static int run(mddev_t *mddev)
 				" disk %d\n", bdevname(rdev->bdev,b),
 				raid_disk);
 			working_disks++;
-		}
+		} else
+			/* Cannot rely on bitmap to complete recovery */
+			conf->fullsync = 1;
 	}
 
 	/*

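For anyone reading along without the source at hand, here is roughly how the device scan in run() looks with the patch applied (a sketch based on the 2.6.26-era raid5.c; exact surroundings may differ between kernel versions):

	/* Walk every member device and count those that are in sync. */
	list_for_each_entry(rdev, &mddev->disks, same_set) {
		raid_disk = rdev->raid_disk;
		if (raid_disk >= conf->raid_disks || raid_disk < 0)
			continue;
		disk = conf->disks + raid_disk;
		disk->rdev = rdev;

		if (test_bit(In_sync, &rdev->flags)) {
			char b[BDEVNAME_SIZE];
			printk(KERN_INFO "raid5: device %s operational as raid"
				" disk %d\n", bdevname(rdev->bdev,b),
				raid_disk);
			working_disks++;
		} else
			/* Cannot rely on bitmap to complete recovery */
			conf->fullsync = 1;
	}

Setting conf->fullsync tells the recovery code not to trust the bitmap and to resync everything, which is why the fix can cause more resync than strictly necessary.
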
Thanks Neil, I can confirm this fixes the issue.
As for the possible extra resync, I can't say.

Regards,
Hubert Verstraete
