----- Original Message -----
> From: "NeilBrown" <neilb@xxxxxxx>
> To: "Xiao Ni" <xni@xxxxxxxxxx>
> Cc: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, "Bill Kuzeja" <william.kuzeja@xxxxxxxxxxx>
> Sent: Thursday, January 29, 2015 11:52:17 AM
> Subject: Re: RAID1 removing failed disk returns EBUSY
>
> On Sun, 18 Jan 2015 21:33:50 -0500 (EST) Xiao Ni <xni@xxxxxxxxxx> wrote:
>
> > ----- Original Message -----
> > > From: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>
> > > To: "Xiao Ni" <xni@xxxxxxxxxx>
> > > Cc: "NeilBrown" <neilb@xxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, "Bill Kuzeja" <william.kuzeja@xxxxxxxxxxx>
> > > Sent: Friday, January 16, 2015 11:10:31 PM
> > > Subject: Re: RAID1 removing failed disk returns EBUSY
> > >
> > > On Fri, 16 Jan 2015 00:20:12 -0500
> > > Xiao Ni <xni@xxxxxxxxxx> wrote:
> > >
> > > > Hi Joe
> > > >
> > > > Thanks for reminding me. I didn't do that. Now the disk can be removed
> > > > successfully after writing "idle" to sync_action.
> > > >
> > > > I wrongly thought that the patch referenced in this mail fixed
> > > > the problem.
> > >
> > > So it sounds like even with 3.18 and a new mdadm, this bug still
> > > persists?
> > >
> > > -- Joe
> >
> > Hi Joe
> >
> > I'm a little confused now. Does the patch
> > 45eaf45dfa4850df16bc2e8e7903d89021137f40 from linux-stable
> > resolve the problem?
> >
> > My environment is:
> >
> > [root@dhcp-12-133 mdadm]# mdadm --version
> > mdadm - v3.3.2-18-g93d3bd3 - 18th December 2014 (this is the newest upstream)
> > [root@dhcp-12-133 mdadm]# uname -r
> > 3.18.2
> >
> > My steps are:
> >
> > [root@dhcp-12-133 mdadm]# lsblk
> > sdb      8:16   0 931.5G  0 disk
> > └─sdb1   8:17   0     5G  0 part
> > sdc      8:32   0 186.3G  0 disk
> > sdd      8:48   0 931.5G  0 disk
> > └─sdd1   8:49   0     5G  0 part
> > [root@dhcp-12-133 mdadm]# mdadm -CR /dev/md0 -l1 -n2 /dev/sdb1 /dev/sdd1 --assume-clean
> > mdadm: Note: this array has metadata at the start and
> >     may not be suitable as a boot device.  If you plan to
> >     store '/boot' on this device please ensure that
> >     your boot-loader understands md/v1.x metadata, or use
> >     --metadata=0.90
> > mdadm: Defaulting to version 1.2 metadata
> > mdadm: array /dev/md0 started.
> >
> > Then I unplug the disk.
> >
> > [root@dhcp-12-133 mdadm]# lsblk
> > sdc      8:32   0 186.3G  0 disk
> > sdd      8:48   0 931.5G  0 disk
> > └─sdd1   8:49   0     5G  0 part
> >   └─md0  9:0    0     5G  0 raid1
> > [root@dhcp-12-133 mdadm]# echo faulty > /sys/block/md0/md/dev-sdb1/state
> > [root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
> > -bash: echo: write error: Device or resource busy
> > [root@dhcp-12-133 mdadm]# echo idle > /sys/block/md0/md/sync_action
> > [root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
>
> I cannot reproduce this - using linux 3.18.2.  I'd be surprised if the mdadm
> version affects things.

Hi Neil

I'm very curious, because this reproduces on my machine 100% of the time.

> This error (Device or resource busy) implies that rdev->raid_disk is >= 0
> (tested in state_store()).
>
> ->raid_disk is set to -1 by remove_and_add_spares() providing:
> 1/ it isn't Blocked (which is very unlikely)
> 2/ hot_remove_disk succeeds, which it will if nr_pending is zero, and
> 3/ nr_pending is zero.

I remember I tried to check those reasons. It really is reason 1, the one
which is very unlikely.
I added some debugging code to array_state_show():

	array_state_show(struct mddev *mddev, char *page)
	{
		enum array_state st = inactive;
		struct md_rdev *rdev;

		rdev_for_each_rcu(rdev, mddev) {
			printk(KERN_ALERT "search for %s\n",
			       rdev->bdev->bd_disk->disk_name);
			if (test_bit(Blocked, &rdev->flags))
				printk(KERN_ALERT "rdev is Blocked\n");
			else
				printk(KERN_ALERT "rdev is not Blocked\n");
		}

After echo 1 > /sys/block/sdc/device/delete, I ran:

[root@dhcp-12-133 md]# cat /sys/block/md0/md/array_state
read-auto
[root@dhcp-12-133 md]# dmesg
[ 2679.559185] search for sdc
[ 2679.559189] rdev is Blocked
[ 2679.559190] search for sdb
[ 2679.559190] rdev is not Blocked

So sdc is Blocked.

> So it seems most likely that either:
> 1/ nr_pending is non-zero, or
> 2/ remove_and_add_spares() didn't run.
>
> nr_pending can only get set if IO is generated, and your sequence of steps
> doesn't show any IO.  It is possible that something else (e.g. started by
> udev) triggered some IO.  How long that IO can stay pending might depend on
> exactly how you unplug the device.
> In my tests I used
>    echo 1 > /sys/block/sdXX/../../delete
> which may have a different effect to what you do.
>
> However the fact that writing 'idle' to sync_action releases the device seems
> to suggest that nr_pending has dropped to zero.  So either
>  - remove_and_add_spares didn't run, or
>  - remove_and_add_spares ran during a small window when nr_pending was
>    elevated, and then didn't run again when nr_pending was reduced to zero.
>
> Ahh.... that rings bells....
>
> I have the following patch in the SLES kernel which I haven't applied to
> mainline yet (and given how old it is, that is really slack of me).
>
> Can you apply the following and see if the symptom goes away please?

I have tried the patch, but the problem still exists.
>
> Thanks,
> NeilBrown
>
> From: Hannes Reinecke <hare@xxxxxxx>
> Date: Thu, 26 Jul 2012 11:12:18 +0200
> Subject: [PATCH] md: wakeup thread upon rdev_dec_pending()
>
> After each call to rdev_dec_pending() we should wakeup the
> md thread if the device is found to be faulty.
> Otherwise we'll incur heavy delays on failing devices.
>
> Signed-off-by: Neil Brown <nfbrown@xxxxxxx>
> Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
>
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index 03cec5bdcaae..4cc2f59b2994 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -439,13 +439,6 @@ struct mddev {
>  	void (*sync_super)(struct mddev *mddev, struct md_rdev *rdev);
>  };
>  
> -static inline void rdev_dec_pending(struct md_rdev *rdev, struct mddev *mddev)
> -{
> -	int faulty = test_bit(Faulty, &rdev->flags);
> -	if (atomic_dec_and_test(&rdev->nr_pending) && faulty)
> -		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> -}
> -
>  static inline void md_sync_acct(struct block_device *bdev, unsigned long nr_sectors)
>  {
>  	atomic_add(nr_sectors, &bdev->bd_contains->bd_disk->sync_io);
> @@ -624,4 +617,14 @@ static inline int mddev_check_plugged(struct mddev *mddev)
>  	return !!blk_check_plugged(md_unplug, mddev,
>  				   sizeof(struct blk_plug_cb));
>  }
> +
> +static inline void rdev_dec_pending(struct md_rdev *rdev, struct mddev *mddev)
> +{
> +	int faulty = test_bit(Faulty, &rdev->flags);
> +	if (atomic_dec_and_test(&rdev->nr_pending) && faulty) {
> +		set_bit(MD_RECOVERY_NEEDED, &mddev->recovery);
> +		md_wakeup_thread(mddev->thread);
> +	}
> +}
> +
>  #endif /* _MD_MD_H */

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html