On Wed, 24 Jul 2013 20:48:27 +0800 (CST) qindehua <qindehua@xxxxxxx> wrote: > Hi Neil, > I found another problem that the raid5 thread may run into an endless loop in a rare circumstance. > > I have tested with or without the patch of "md/raid5: fix interaction of 'replace' and 'recovery'", > the problem will not happen without the patch. So it is the patch that introduce this problem. > > Here is the fast steps to reproduce the problem: > 1. create 3-drives RAID5 with --assume-clean (I use 1GB disks in VirtualBox machine) > mdadm -C /dev/md1 -l 5 -n 3 --assume-clean /dev/sd[b-d] > > 2. change speed_limit_min and speed_limit_max to small number: > echo 1 > /proc/sys/dev/raid/speed_limit_min > echo 10 > /proc/sys/dev/raid/speed_limit_max > > 3. add three spare disks to the raid: > mdadm /dev/md1 -a /dev/sde -a /dev/sdf -a /dev/sdg > > 4. replace three raid disks with this procedure: > for disk in sdb sdc sdd > do > mdadm /dev/md1 --replace /dev/$disk > echo "idle" > /sys/block/md1/md/sync_action > sleep 3 > done > > After step 4 the md1_raid5 process then ran into endless loop with 99% CPU usage. > The problem will not happen if there is no step 2. Also it will not happen if > only two disks were replaced. Thanks a lot for the testing! This was missing from the previous patch: diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 1c3b279..e6e24c3 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -3524,6 +3524,7 @@ static void handle_stripe(struct stripe_head *sh) test_and_clear_bit(STRIPE_SYNC_REQUESTED, &sh->state)) { set_bit(STRIPE_SYNCING, &sh->state); clear_bit(STRIPE_INSYNC, &sh->state); + clear_bit(STRIPE_REPLACED, &sh->state); } spin_unlock(&sh->stripe_lock); } and that omission caused the problem you saw. However after looking at the code more closely I'm not sure that is the only problem. So I won't push out a revised patch until I am sure. Maybe I don't actually need the new "STRIPE_REPLACED" flag after all. Thanks, NeilBrown > > Regards, > Qin Dehua > > At 2013-07-23 11:21:24,NeilBrown <neilb@xxxxxxx> wrote: > >On Mon, 22 Jul 2013 18:23:57 +0800 (CST) qindehua <qindehua@xxxxxxx> wrote: > > > >> Hi Neil, > >> I have tested with this patch, and it fixes the problem. > >> I have also tried various combinations of failing and replacing disks, > >> with echo idle to /sys/block/md1/md/sync_action, they all work fine. > >> > >> Regards, > >> Qin Dehua > > > >Thanks for confirming, and for all of your testing! > > > >I'll forward the patch to Linus shortly. > > > >NeilBrown > > > > > > > >> > >> At 2013-07-22 11:04:15,NeilBrown <neilb@xxxxxxx> wrote: > >> >Hi Qin, > >> > thanks for the report. I can easily reproduce the bug. > >> > > >> >I think this will fix it. Could you please test and confirm that it fixes > >> >the problem for you too. > >> > > >> >Thanks, > >> >NeilBrown > >> > > >> > > >> >From: NeilBrown <neilb@xxxxxxx> > >> >Date: Mon, 22 Jul 2013 12:57:21 +1000 > >> >Subject: [PATCH] md/raid5: fix interaction of 'replace' and 'recovery'. > >> > > >> >If a device in a RAID4/5/6 is being replaced while another is being > >> >recovered, then the writes to the replacement device currently don't > >> >happen, resulting in corruption when the replacement completes and the > >> >new drive takes over. > >> > > >> >This is because the replacement writes are only triggered when > >> >'s.replacing' is set and not when the similar 's.sync' is set (which > >> >is the case during resync and recovery - it means all devices need to > >> >be read). > >> > > >> >So schedule those writes when s.replacing is set as well. > >> > > >> >In this case we cannot use "STRIPE_INSYNC" to record that the > >> >replacement has happened as that is needed for recording that any > >> >parity calculation is complete. So introduce STRIPE_REPLACED to > >> >record if the replacement has happened. > >> > > >> >This bug was introduced in commit 9a3e1101b827a59ac9036a672f5fa8d5279d0fe2 > >> >(md/raid5: detect and handle replacements during recovery.) > >> >which introduced replacement for raid5. > >> >That was in 3.3-rc3, so any stable kernel since then would benefit > >> >from this fix. > >> > > >> >Cc: stable@xxxxxxxxxxxxxxx (3.3+) > >> >Reported-by: qindehua <13691222965@xxxxxxx> > >> >Signed-off-by: NeilBrown <neilb@xxxxxxx> > >> > > >> >diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c > >> >index 2bf094a..f0aa7abd 100644 > >> >--- a/drivers/md/raid5.c > >> >+++ b/drivers/md/raid5.c > >> >@@ -3607,8 +3607,8 @@ static void handle_stripe(struct stripe_head *sh) > >> > handle_parity_checks5(conf, sh, &s, disks); > >> > } > >> > > >> >- if (s.replacing && s.locked == 0 > >> >- && !test_bit(STRIPE_INSYNC, &sh->state)) { > >> >+ if ((s.replacing || s.syncing) && s.locked == 0 > >> >+ && !test_bit(STRIPE_REPLACED, &sh->state)) { > >> > /* Write out to replacement devices where possible */ > >> > for (i = 0; i < conf->raid_disks; i++) > >> > if (test_bit(R5_UPTODATE, &sh->dev[i].flags) && > >> >@@ -3617,7 +3617,9 @@ static void handle_stripe(struct stripe_head *sh) > >> > set_bit(R5_LOCKED, &sh->dev[i].flags); > >> > s.locked++; > >> > } > >> >- set_bit(STRIPE_INSYNC, &sh->state); > >> >+ if (s.replacing) > >> >+ set_bit(STRIPE_INSYNC, &sh->state); > >> >+ set_bit(STRIPE_REPLACED, &sh->state); > >> > } > >> > if ((s.syncing || s.replacing) && s.locked == 0 && > >> > test_bit(STRIPE_INSYNC, &sh->state)) { > >> >diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h > >> >index b0b663b..70c4932 100644 > >> >--- a/drivers/md/raid5.h > >> >+++ b/drivers/md/raid5.h > >> >@@ -306,6 +306,7 @@ enum { > >> > STRIPE_SYNC_REQUESTED, > >> > STRIPE_SYNCING, > >> > STRIPE_INSYNC, > >> >+ STRIPE_REPLACED, > >> > STRIPE_PREREAD_ACTIVE, > >> > STRIPE_DELAYED, > >> > STRIPE_DEGRADED, > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-raid" in > the body of a message to majordomo@xxxxxxxxxxxxxxx > More majordomo info at http://vger.kernel.org/majordomo-info.html
Attachment:
signature.asc
Description: PGP signature