On Wed, 7 Dec 2011 11:10:06 +0000 "Kwolek, Adam" <adam.kwolek@xxxxxxxxx>
wrote:

> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@xxxxxxx]
> > I cannot reproduce this.
> > I didn't physically remove devices, but I used
> >   echo 1 > /sys/block/sdc/device/delete
> > which should be nearly identical from the perspective of md and mdadm.
>
> I've checked that when I delete the device using sysfs, everything works perfectly.
> When the device is pulled out, reshape stops in md/mdstat.
>
> > If you could give me the exact set of steps that you follow to produce the
> > problem that would help - maybe a script? Just a description is OK.
> >
> #used disks sdb, sdc, sdd, sde
> export IMSM_NO_PLATFORM=1
> #create container
> mdadm -C /dev/md/imsm0 -amd -e imsm -n 3 /dev/sdb /dev/sdc /dev/sde -R
> #create vol
> mdadm -C /dev/md/raid5vol_0 -amd -l 5 --chunk 32 --size 104850 -n 3 /dev/sdb /dev/sdc /dev/sde -R
> #add spare
> mdadm --add /dev/md/imsm0 /dev/sdd
> #run OLCE
> mdadm --grow /dev/md/imsm0 --raid-devices 4
> #when reshape starts, I'm (physically) pulling device out
>
> > Also you say it is blocking in md_do_sync. Is that at the
> >
> >   wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
> >
> > call just after the "out:" label?
>
> Neither of those two places.
> It enters the sync_request() function. md_error() is called.
> More is visible in the thread stack information below (md_wait_for_blocked_rdev()).
>
> > What is the raid thread doing at this point?
> >   cat /proc/PID/stack
> > might help.
>
> [md126_raid5]
> [<ffffffff8121d843>] md_wait_for_blocked_rdev+0xbc/0x10f
> [<ffffffffa01d87ce>] handle_stripe+0x1c5c/0x2c99 [raid456]
> [<ffffffffa01d9d0d>] raid5d+0x502/0x564 [raid456]
> [<ffffffff8121eca5>] md_thread+0x101/0x11f
> [<ffffffff81049e0e>] kthread+0x81/0x89
> [<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> [md126_reshape]
> [<ffffffffa02455a2>] sync_request+0x90a/0xbfb [raid456]
> [<ffffffff8121e151>] md_do_sync+0x7aa/0xc40
> [<ffffffff8121ecb3>] md_thread+0x101/0x11f
> [<ffffffff81049e0e>] kthread+0x81/0x89
> [<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> > What are the contents of all the sysfs files?
> >   grep . /sys/block/mdXXX/md/*
> array_state ->active
> degraded ->1
> max_read_errors ->20
> reshape_position ->12288
> resync_start ->none
> sync_completed ->4096 / 209664
>
> >   grep . /sys/block/mdXXX/md/dev-*/*
>
> When the removed device is sdd, /sys/block/mdXXX/md/dev-sdd/* contains:
> bad_blocks ->4096 512
>            ->4608 128
>            ->4736 384
> block ->MISSING link is not valid
> errors ->0
> offset ->0
> recovery_start ->4096
> size ->104832
> slot ->3
> state ->faulty,write_error
> unacknowledged_bad_blocks ->4096 512
>                           ->4608 128
>                           ->4736 384
>
> I hope this helps.

Yes it does, thanks.

Can you try with this patch as well, please?

Thanks,
NeilBrown

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ea6dce9..6cf0f6a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3175,6 +3175,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 			rdev = rcu_dereference(conf->disks[i].rdev);
 			clear_bit(R5_ReadRepl, &dev->flags);
 		}
+		if (rdev && test_bit(Faulty, &rdev->flags))
+			rdev = NULL;
 		if (rdev) {
 			is_bad = is_badblock(rdev, sh->sector, STRIPE_SECTORS,
 					     &first_bad, &bad_sectors);
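The following is a minimal userspace model of the idea behind the two added lines in analyse_stripe(): once a member device is marked Faulty, it is treated as absent (rdev = NULL), so its bad-block list, which on sdd still held unacknowledged entries, is no longer consulted for the stripe. This is a sketch under stated assumptions, not kernel code: fake_rdev, bad_range, sdd_bad_ranges, overlaps_badblock() and stripe_hits_bad_block() are hypothetical stand-ins, STRIPE_SECTORS = 8 assumes 4 KiB pages, and the ranges are the sdd bad_blocks values reported above, read as start-sector/length pairs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for md structures; NOT kernel APIs. */
typedef unsigned long long sector_t;

/* Assumption: 4 KiB pages, so one stripe unit is 8 x 512-byte sectors. */
#define STRIPE_SECTORS 8

struct bad_range {
	sector_t start; /* first bad sector */
	sector_t len;   /* number of bad sectors */
};

struct fake_rdev {
	bool faulty;                 /* models test_bit(Faulty, &rdev->flags) */
	const struct bad_range *bad; /* recorded bad-block ranges */
	size_t nbad;
};

/* Bad-block ranges reported for sdd in the sysfs dump (start, length). */
static const struct bad_range sdd_bad_ranges[] = {
	{ 4096, 512 }, { 4608, 128 }, { 4736, 384 },
};
#define SDD_NBAD (sizeof sdd_bad_ranges / sizeof sdd_bad_ranges[0])

/* True if [s, s + sectors) overlaps any recorded bad range. */
static bool overlaps_badblock(const struct fake_rdev *rdev, sector_t s,
			      sector_t sectors)
{
	assert(sectors > 0);
	for (size_t i = 0; i < rdev->nbad; i++) {
		sector_t b = rdev->bad[i].start;
		sector_t e = b + rdev->bad[i].len;
		if (s < e && b < s + sectors)
			return true;
	}
	return false;
}

/* Models the patched logic: a Faulty device is treated as absent,
 * so its bad-block list is never consulted for this stripe. */
static bool stripe_hits_bad_block(const struct fake_rdev *rdev,
				  sector_t sector)
{
	if (rdev && rdev->faulty)
		rdev = NULL;
	if (!rdev)
		return false;
	return overlaps_badblock(rdev, sector, STRIPE_SECTORS);
}
```

With the sdd ranges, a non-faulty device reports a bad-block hit for the stripe at sector 4096, while the same device flagged faulty reports none; in this model the stripe no longer has a reason to wait on that device.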