Re: Problem with disk replacement

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Neil,
I found another problem that the raid5 thread may run into an endless loop in a rare circumstance.

I have tested with or without the patch of "md/raid5: fix interaction of 'replace' and 'recovery'",
the problem will not happen without the patch. So it is the patch that introduce this problem.

Here is the fast steps to reproduce the problem:
1. create 3-drives RAID5 with --assume-clean (I use 1GB disks in VirtualBox machine)
   mdadm -C /dev/md1 -l 5 -n 3 --assume-clean /dev/sd[b-d]

2. change speed_limit_min and speed_limit_max to small number:
   echo 1 > /proc/sys/dev/raid/speed_limit_min
   echo 10 > /proc/sys/dev/raid/speed_limit_max

3. add three spare disks to the raid:
   mdadm /dev/md1 -a /dev/sde -a /dev/sdf -a /dev/sdg

4. replace three raid disks with this procedure:
   	for disk in sdb sdc sdd
	do
		mdadm /dev/md1 --replace /dev/$disk
		echo "idle" >  /sys/block/md1/md/sync_action
		sleep 3
	done

After step 4 the md1_raid5 process then ran into endless loop with 99% CPU usage.
The problem will not happen if there is no step 2. Also it will not happen if
only two disks were replaced.

Regards,
Qin Dehua

At 2013-07-23 11:21:24,NeilBrown <neilb@xxxxxxx> wrote:
>On Mon, 22 Jul 2013 18:23:57 +0800 (CST) qindehua <qindehua@xxxxxxx> wrote:
>
>> Hi Neil,
>> I have tested with this patch, and it fixes the problem.
>> I have also tried various combinations of failing and replacing disks,
>> with echo idle to /sys/block/md1/md/sync_action, they all work fine.
>> 
>> Regards,
>> Qin Dehua
>
>Thanks for confirming, and for all of your testing!
>
>I'll forward the patch to Linus shortly.
>
>NeilBrown
>
>
>
>> 
>> At 2013-07-22 11:04:15,NeilBrown <neilb@xxxxxxx> wrote:
>> >Hi Qin,
>> > thanks for the report.  I can easily reproduce the bug.
>> >
>> >I think this will fix it.  Could you please test and confirm that it fixes
>> >the problem for you too.
>> >
>> >Thanks,
>> >NeilBrown
>> >
>> >
>> >From: NeilBrown <neilb@xxxxxxx>
>> >Date: Mon, 22 Jul 2013 12:57:21 +1000
>> >Subject: [PATCH] md/raid5: fix interaction of 'replace' and 'recovery'.
>> >
>> >If a device in a RAID4/5/6 is being replaced while another is being
>> >recovered, then the writes to the replacement device currently don't
>> >happen, resulting in corruption when the replacement completes and the
>> >new drive takes over.
>> >
>> >This is because the replacement writes are only triggered when
>> >'s.replacing' is set and not when the similar 's.sync' is set (which
>> >is the case during resync and recovery - it means all devices need to
>> >be read).
>> >
>> >So schedule those writes when s.replacing is set as well.
>> >
>> >In this case we cannot use "STRIPE_INSYNC" to record that the
>> >replacement has happened as that is needed for recording that any
>> >parity calculation is complete.  So introduce STRIPE_REPLACED to
>> >record if the replacement has happened.
>> >
>> >This bug was introduced in commit 9a3e1101b827a59ac9036a672f5fa8d5279d0fe2
>> >(md/raid5:  detect and handle replacements during recovery.)
>> >which introduced replacement for raid5.
>> >That was in 3.3-rc3, so any stable kernel since then would benefit
>> >from this fix.
>> >
>> >Cc: stable@xxxxxxxxxxxxxxx (3.3+)
>> >Reported-by: qindehua <13691222965@xxxxxxx>
>> >Signed-off-by: NeilBrown <neilb@xxxxxxx>
>> >
>> >diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> >index 2bf094a..f0aa7abd 100644
>> >--- a/drivers/md/raid5.c
>> >+++ b/drivers/md/raid5.c
>> >@@ -3607,8 +3607,8 @@ static void handle_stripe(struct stripe_head *sh)
>> > 			handle_parity_checks5(conf, sh, &s, disks);
>> > 	}
>> > 
>> >-	if (s.replacing && s.locked == 0
>> >-	    && !test_bit(STRIPE_INSYNC, &sh->state)) {
>> >+	if ((s.replacing || s.syncing) && s.locked == 0
>> >+	    && !test_bit(STRIPE_REPLACED, &sh->state)) {
>> > 		/* Write out to replacement devices where possible */
>> > 		for (i = 0; i < conf->raid_disks; i++)
>> > 			if (test_bit(R5_UPTODATE, &sh->dev[i].flags) &&
>> >@@ -3617,7 +3617,9 @@ static void handle_stripe(struct stripe_head *sh)
>> > 				set_bit(R5_LOCKED, &sh->dev[i].flags);
>> > 				s.locked++;
>> > 			}
>> >-		set_bit(STRIPE_INSYNC, &sh->state);
>> >+		if (s.replacing)
>> >+			set_bit(STRIPE_INSYNC, &sh->state);
>> >+		set_bit(STRIPE_REPLACED, &sh->state);
>> > 	}
>> > 	if ((s.syncing || s.replacing) && s.locked == 0 &&
>> > 	    test_bit(STRIPE_INSYNC, &sh->state)) {
>> >diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
>> >index b0b663b..70c4932 100644
>> >--- a/drivers/md/raid5.h
>> >+++ b/drivers/md/raid5.h
>> >@@ -306,6 +306,7 @@ enum {
>> > 	STRIPE_SYNC_REQUESTED,
>> > 	STRIPE_SYNCING,
>> > 	STRIPE_INSYNC,
>> >+	STRIPE_REPLACED,
>> > 	STRIPE_PREREAD_ACTIVE,
>> > 	STRIPE_DELAYED,
>> > 	STRIPE_DEGRADED,
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux