Re: [PATCH] md: Add ability for disable bad block management

On Wed, 7 Dec 2011 11:10:06 +0000 "Kwolek, Adam" <adam.kwolek@xxxxxxxxx>
wrote:

> 
> 
> > -----Original Message-----
> > From: NeilBrown [mailto:neilb@xxxxxxx]

> > I cannot reproduce this.
> > I didn't physically remove devices, but I used
> >    echo 1 > /sys/block/sdc/device/delete
> > which should be nearly identical from the perspective of md and mdadm.
> 
> I've checked that when I delete the device using sysfs, everything works perfectly.
> When the device is physically pulled out, the reshape stops (as seen in mdstat).
> 
> > If you could give me the exact set of steps that you follow to produce the
> > problem that would help - maybe a script?  Just a description is OK.
> 
> 
> #used disks sdb, sdc, sdd, sde
> export IMSM_NO_PLATFORM=1
> #create container
> mdadm -C /dev/md/imsm0 -amd -e imsm -n 3 /dev/sdb /dev/sdc /dev/sde -R
> #create vol
> mdadm -C /dev/md/raid5vol_0 -amd -l 5 --chunk 32 --size 104850 -n 3 /dev/sdb /dev/sdc /dev/sde -R
> #add spare
> mdadm --add /dev/md/imsm0 /dev/sdd
> #run OLCE
> mdadm --grow /dev/md/imsm0 --raid-devices 4
> #when the reshape starts, I physically pull a device out
> 
> > Also you say it is blocking in md_do_sync.  Is that at the
> > 
> > 	wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
> > 
> > call just after the "out:" label?
> 
> Neither of those two places.
> It enters the sync_request() function, and md_error() is called.
> More detail is visible in the thread stack information below (md_wait_for_blocked_rdev()).
> 
> 
> > 
> > What is the raid thread doing at this point?
> >    cat /proc/PID/stack
> > might help.
> 
> [md126_raid5]
> [<ffffffff8121d843>] md_wait_for_blocked_rdev+0xbc/0x10f
> [<ffffffffa01d87ce>] handle_stripe+0x1c5c/0x2c99 [raid456]
> [<ffffffffa01d9d0d>] raid5d+0x502/0x564 [raid456]
> [<ffffffff8121eca5>] md_thread+0x101/0x11f
> [<ffffffff81049e0e>] kthread+0x81/0x89
> [<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [md126_reshape]
> [<ffffffffa02455a2>] sync_request+0x90a/0xbfb [raid456]
> [<ffffffff8121e151>] md_do_sync+0x7aa/0xc40
> [<ffffffff8121ecb3>] md_thread+0x101/0x11f
> [<ffffffff81049e0e>] kthread+0x81/0x89
> [<ffffffff812cc4f4>] kernel_thread_helper+0x4/0x10
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> > 
> > What are the contents of all the sysfs files?
> >    grep . /sys/block/mdXXX/md/*
> array_state		->active
> degraded		->1
> max_read_errors	->20
> reshape_position	->12288
> resync_start		->none
> sync_completed	->4096 / 209664
> 
> 
> >    grep . /sys/block/mdXXX/md/dev-*/*
> 
> The removed device is sdd: /sys/block/mdXXX/md/dev-sdd/*
> bad_blocks		->4096 512
> 			->4608 128
> 			->4736 384
> block			->MISSING link is not valid
> errors			->0
> offset			->0
> recovery_start		->4096
> size			->104832
> slot			->3
> state			->faulty,write_error
> unacknowledged_bad_blocks	->4096 512
> 				->4608 128
> 				->4736 384
> 
> I hope this helps.

Yes it does, thanks.

Can you please try this patch as well?

Thanks,
NeilBrown


diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ea6dce9..6cf0f6a 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3175,6 +3175,8 @@ static void analyse_stripe(struct stripe_head *sh, struct stripe_head_state *s)
 			rdev = rcu_dereference(conf->disks[i].rdev);
 			clear_bit(R5_ReadRepl, &dev->flags);
 		}
+		if (rdev && test_bit(Faulty, &rdev->flags))
+			rdev = NULL;
 		if (rdev) {
 			is_bad = is_badblock(rdev, sh->sector, STRIPE_SECTORS,
 					     &first_bad, &bad_sectors);


