Re: RAID1 removing failed disk returns EBUSY

Xiao Ni <xni@xxxxxxxxxx> · Thu, 25 Jun 2015 05:42:54 -0400 (EDT)

----- Original Message -----
> From: "Neil Brown" <neilb@xxxxxxx>
> To: "XiaoNi" <xni@xxxxxxxxxx>
> Cc: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, "Bill Kuzeja" <william.kuzeja@xxxxxxxxxxx>
> Sent: Wednesday, June 17, 2015 10:51:51 AM
> Subject: Re: RAID1 removing failed disk returns EBUSY
> 
> On Wed, 10 Jun 2015 14:26:41 +0800
> XiaoNi <xni@xxxxxxxxxx> wrote:
> 
> > 
> > 
> > On 02/03/2015 04:10 PM, Xiao Ni wrote:
> > >
> > > ----- Original Message -----
> > >> From: "NeilBrown" <neilb@xxxxxxx>
> > >> To: "Xiao Ni" <xni@xxxxxxxxxx>
> > >> Cc: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>,
> > >> linux-raid@xxxxxxxxxxxxxxx, "Bill Kuzeja" <william.kuzeja@xxxxxxxxxxx>
> > >> Sent: Monday, February 2, 2015 2:36:01 PM
> > >> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>
> > >> On Thu, 29 Jan 2015 07:14:16 -0500 (EST) Xiao Ni <xni@xxxxxxxxxx> wrote:
> > >>
> > >>>
> > >>> ----- Original Message -----
> > >>>> From: "NeilBrown" <neilb@xxxxxxx>
> > >>>> To: "Xiao Ni" <xni@xxxxxxxxxx>
> > >>>> Cc: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>,
> > >>>> linux-raid@xxxxxxxxxxxxxxx, "Bill Kuzeja" <william.kuzeja@xxxxxxxxxxx>
> > >>>> Sent: Thursday, January 29, 2015 11:52:17 AM
> > >>>> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>>>
> > >>>> On Sun, 18 Jan 2015 21:33:50 -0500 (EST) Xiao Ni <xni@xxxxxxxxxx>
> > >>>> wrote:
> > >>>>
> > >>>>>
> > >>>>> ----- Original Message -----
> > >>>>>> From: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>
> > >>>>>> To: "Xiao Ni" <xni@xxxxxxxxxx>
> > >>>>>> Cc: "NeilBrown" <neilb@xxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, "Bill
> > >>>>>> Kuzeja" <william.kuzeja@xxxxxxxxxxx>
> > >>>>>> Sent: Friday, January 16, 2015 11:10:31 PM
> > >>>>>> Subject: Re: RAID1 removing failed disk returns EBUSY
> > >>>>>>
> > >>>>>> On Fri, 16 Jan 2015 00:20:12 -0500
> > >>>>>> Xiao Ni <xni@xxxxxxxxxx> wrote:
> > >>>>>>> Hi Joe
> > >>>>>>>
> > >>>>>>>     Thanks for reminding me. I hadn't done that. Now the remove
> > >>>>>>>     succeeds after writing "idle" to sync_action.
> > >>>>>>>
> > >>>>>>>     I wrongly thought that the patch referenced in this mail
> > >>>>>>>     fixed the problem.
> > >>>>>> So it sounds like even with 3.18 and a new mdadm, this bug still
> > >>>>>> persists?
> > >>>>>>
> > >>>>>> -- Joe
> > >>>>>>
> > >>>>> Hi Joe
> > >>>>>
> > >>>>>     I'm a little confused now. Does the patch
> > >>>>>     45eaf45dfa4850df16bc2e8e7903d89021137f40 from linux-stable
> > >>>>>     resolve the problem?
> > >>>>>
> > >>>>>     My environment is:
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# mdadm --version
> > >>>>> mdadm - v3.3.2-18-g93d3bd3 - 18th December 2014  (this is the newest upstream)
> > >>>>> [root@dhcp-12-133 mdadm]# uname -r
> > >>>>> 3.18.2
> > >>>>>
> > >>>>>
> > >>>>>     My steps are:
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# lsblk
> > >>>>> sdb                       8:16   0 931.5G  0 disk
> > >>>>> └─sdb1                    8:17   0     5G  0 part
> > >>>>> sdc                       8:32   0 186.3G  0 disk
> > >>>>> sdd                       8:48   0 931.5G  0 disk
> > >>>>> └─sdd1                    8:49   0     5G  0 part
> > >>>>> [root@dhcp-12-133 mdadm]# mdadm -CR /dev/md0 -l1 -n2 /dev/sdb1 /dev/sdd1 --assume-clean
> > >>>>> mdadm: Note: this array has metadata at the start and
> > >>>>>      may not be suitable as a boot device.  If you plan to
> > >>>>>      store '/boot' on this device please ensure that
> > >>>>>      your boot-loader understands md/v1.x metadata, or use
> > >>>>>      --metadata=0.90
> > >>>>> mdadm: Defaulting to version 1.2 metadata
> > >>>>> mdadm: array /dev/md0 started.
> > >>>>>
> > >>>>>     Then I unplugged the disk (sdb).
> > >>>>>
> > >>>>> [root@dhcp-12-133 mdadm]# lsblk
> > >>>>> sdc                       8:32   0 186.3G  0 disk
> > >>>>> sdd                       8:48   0 931.5G  0 disk
> > >>>>> └─sdd1                    8:49   0     5G  0 part
> > >>>>>    └─md0                   9:0    0     5G  0 raid1
> > >>>>> [root@dhcp-12-133 mdadm]# echo faulty > /sys/block/md0/md/dev-sdb1/state
> > >>>>> [root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
> > >>>>> -bash: echo: write error: Device or resource busy
> > >>>>> [root@dhcp-12-133 mdadm]# echo idle > /sys/block/md0/md/sync_action
> > >>>>> [root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
> > >>>>>
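Each step above is a write of a keyword into an md sysfs attribute, and the failing step surfaces as a write(2) error. A minimal C sketch of that single step, assuming the caller passes the full attribute path; md_attr_write() is a hypothetical helper, not an mdadm or kernel function. It returns 0 or the negated errno, so the failed remove above would come back as -EBUSY.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write a keyword such as "faulty", "remove" or
 * "idle" to an md sysfs attribute.  Returns 0 on success or -errno,
 * e.g. -EBUSY when the kernel refuses "remove". */
static int md_attr_write(const char *path, const char *value)
{
    assert(path != NULL && value != NULL);

    /* sysfs attributes already exist, so no O_CREAT */
    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;

    size_t len = strlen(value);
    ssize_t n = write(fd, value, len);
    int err = (n < 0) ? -errno : 0;   /* capture errno before close() */
    if (n >= 0 && (size_t)n != len)
        err = -EIO;                   /* short write: treated as failure here */

    if (close(fd) < 0 && err == 0)
        err = -errno;
    return err;
}
```

A caller could try "remove" on /sys/block/md0/md/dev-sdb1/state and, on -EBUSY, write "idle" to /sys/block/md0/md/sync_action before retrying, which is the sequence in the transcript.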
> > >>>> I cannot reproduce this using Linux 3.18.2.  I'd be surprised if the
> > >>>> mdadm version affected things.
> > >>> Hi Neil
> > >>>
> > >>>     I'm very curious, because I can reproduce it on my machine 100% of the time.
> > >>>
> > >>>> This error (Device or resource busy) implies that
> > >>>> rdev->raid_disk >= 0 (tested in state_store()).
> > >>>>
> > >>>> ->raid_disk is set to -1 by remove_and_add_spares() providing:
> > >>>>    1/ it isn't Blocked (which is very unlikely)
> > >>>>    2/ hot_remove_disk succeeds, which it will if nr_pending is zero, and
> > >>>>    3/ nr_pending is zero.
> > >>>     I remember trying to check those reasons, and it really is
> > >>>     reason 1, the one that is very unlikely.
> > >>>
> > >>>     I added some debug code to the function array_state_show:
> > >>>
> > >>>      static ssize_t
> > >>>      array_state_show(struct mddev *mddev, char *page)
> > >>>      {
> > >>>          enum array_state st = inactive;
> > >>>          struct md_rdev *rdev;
> > >>>
> > >>>          /* debug: report the Blocked flag of every member device */
> > >>>          rcu_read_lock();
> > >>>          rdev_for_each_rcu(rdev, mddev) {
> > >>>                  printk(KERN_ALERT "search for %s\n",
> > >>>                         rdev->bdev->bd_disk->disk_name);
> > >>>                  if (test_bit(Blocked, &rdev->flags))
> > >>>                          printk(KERN_ALERT "rdev is Blocked\n");
> > >>>                  else
> > >>>                          printk(KERN_ALERT "rdev is not Blocked\n");
> > >>>          }
> > >>>          rcu_read_unlock();
> > >>>          /* ... rest of array_state_show() unchanged ... */
> > >>>
> > >>>    After running "echo 1 > /sys/block/sdc/device/delete", I ran this command:
> > >>>
> > >>> [root@dhcp-12-133 md]# cat /sys/block/md0/md/array_state
> > >>> read-auto
> > >>    ^^^^^^^^^
> > >>
> > >> I think that is half the explanation.
> > >> You must have the md_mod.start_ro parameter set to '1'.
> > >>
> > >>
> > >>> [root@dhcp-12-133 md]# dmesg
> > >>> [ 2679.559185] search for sdc
> > >>> [ 2679.559189] rdev is Blocked
> > >>> [ 2679.559190] search for sdb
> > >>> [ 2679.559190] rdev is not Blocked
> > >>>     
> > >>>    So sdc is Blocked
> > >> and that is the other half - thanks.
> > >> (yes, I was wrong.  Sometimes it is easier than being right, but still
> > >> yields results).
> > >>
> > >> When a device fails, it is Blocked until the metadata is updated to
> > >> record the failure.  This ensures that no write succeeds without
> > >> writing to that device until we are certain that no read will try
> > >> reading from that device, even after a crash/restart.
> > >>
> > >> Blocked is cleared after the metadata is written, but read-auto (and
> > >> read-only) arrays never write out their metadata.  So Blocked doesn't
> > >> get cleared.
> > >>
> > >> When you "echo idle > .../sync_action", one of the side effects is to
> > >> switch the array from 'read-auto' to fully active.  This allows the
> > >> metadata to be written, Blocked to be cleared, and the device to be
> > >> removed.
> > >>
> > >> If you
> > >>    echo none > /sys/block/md0/md/dev-sdc/slot
> > >>
> > >> first, then the remove will work.
> > >>
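The workaround is two sysfs writes in a fixed order: release the slot, then remove. A minimal sketch, assuming the member's sysfs directory (such as /sys/block/md0/md/dev-sdc) is passed in; write_attr() and remove_via_slot_none() are hypothetical helper names, and short writes are not retried.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write one keyword to <dir>/<attr>.
 * Returns 0 or -errno; short writes are not retried in this sketch. */
static int write_attr(const char *dir, const char *attr, const char *value)
{
    char path[256];
    int len = snprintf(path, sizeof path, "%s/%s", dir, attr);
    if (len < 0 || (size_t)len >= sizeof path)
        return -ENAMETOOLONG;

    int fd = open(path, O_WRONLY);
    if (fd < 0)
        return -errno;
    ssize_t n = write(fd, value, strlen(value));
    int err = (n < 0) ? -errno : 0;
    close(fd);
    return err;
}

/* Workaround for a read-auto array: "none" to slot first, then
 * "remove" to state.  Stops at the first failing write. */
static int remove_via_slot_none(const char *rdev_dir)
{
    assert(rdev_dir != NULL);
    int err = write_attr(rdev_dir, "slot", "none");
    if (err)
        return err;
    return write_attr(rdev_dir, "state", "remove");
}
```

Stopping at the first error keeps the order strict: if the slot cannot be released, the remove is not attempted.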
> > >> We could possibly fix it with something like the following, but I'm not
> > >> sure I like it.  I cannot see anything that would guarantee the
> > >> superblock gets updated before the first write if the array switches
> > >> to read/write.
> > >>
> > >> NeilBrown
> > >>
> > >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> > >> index 9233c71138f1..b3d1e8e5e067 100644
> > >> --- a/drivers/md/md.c
> > >> +++ b/drivers/md/md.c
> > >> @@ -7528,7 +7528,7 @@ static int remove_and_add_spares(struct mddev *mddev,
> > >>   	rdev_for_each(rdev, mddev)
> > >>   		if ((this == NULL || rdev == this) &&
> > >>   		    rdev->raid_disk >= 0 &&
> > >> -		    !test_bit(Blocked, &rdev->flags) &&
> > >> +		    (!test_bit(Blocked, &rdev->flags) || mddev->ro) &&
> > >>   		    (test_bit(Faulty, &rdev->flags) ||
> > >>   		     ! test_bit(In_sync, &rdev->flags)) &&
> > >>   		    atomic_read(&rdev->nr_pending)==0) {
> > >>
> > >>
> > >>
> > > Hi Neil
> > >
> > >     I have tried the patch and it fixes the problem. I'm sorry that I
> > >     can't suggest a better approach; I'm not yet familiar with the
> > >     metadata handling in md. I'll try to find more time to read the md
> > >     code.
> > >
> > Hi Neil
> > 
> >      I don't see the patch in linux-stable. Did you miss it?
> 
> I don't believe this bug is sufficiently serious for the patch to go to
> -stable.  However, it does need to be fixed - thanks for the reminder.
> 
> I've just queued the following patch which I am happy with.  If you
> could confirm that it works for you, I would appreciate that.
> 
> Thanks,
> NeilBrown
> 
> 
> From: Neil Brown <neilb@xxxxxxx>
> Date: Wed, 17 Jun 2015 12:31:46 +1000
> Subject: [PATCH] md: clear Blocked flag on failed devices when array is
>  read-only.
> 
> The Blocked flag indicates that a device has failed but that this
> fact hasn't been recorded in the metadata yet.  Writes to such
> devices cannot be allowed until the metadata has been updated.
> 
> On a read-only array, the Blocked flag will never be cleared.
> This prevents the device being removed from the array.
> 
> If the metadata is being handled by the kernel
> (i.e. !mddev->external), then we can be sure that if the array is
> switched to writable, a metadata update will happen and will
> record the failure.  So we don't need the flag set.
> 
> If metadata is externally managed, it is up to the external manager
> to clear the 'blocked' flag.
> 
> Reported-by: XiaoNi <xni@xxxxxxxxxx>
> Signed-off-by: NeilBrown <neilb@xxxxxxx>
> 
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 3d339e2..5a6681a 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8125,6 +8125,15 @@ void md_check_recovery(struct mddev *mddev)
>  		int spares = 0;
>  
>  		if (mddev->ro) {
> +			struct md_rdev *rdev;
> +			if (!mddev->external && mddev->in_sync)
> +				/* 'Blocked' flag not needed as failed devices
> +				 * will be recorded if array switched to read/write.
> +				 * Leaving it set will prevent the device
> +				 * from being removed.
> +				 */
> +				rdev_for_each(rdev, mddev)
> +					clear_bit(Blocked, &rdev->flags);
>  			/* On a read-only array we can:
>  			 * - remove failed devices
>  			 * - add already-in_sync devices if the array itself
> 
> 
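The decision the queued patch adds to md_check_recovery() can be modelled outside the kernel. This is a sketch only: struct mddev_model, struct rdev_model and model_check_recovery_ro() are hypothetical stand-ins that keep just the fields the hunk reads or writes (ro, external, in_sync, and each device's Blocked bit).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the few struct mddev / struct md_rdev
 * fields the patch touches. */
struct rdev_model {
    bool blocked;
};

struct mddev_model {
    int  ro;        /* non-zero: read-only or read-auto */
    bool external;  /* metadata managed by user space (e.g. mdmon) */
    bool in_sync;
    struct rdev_model *rdevs;
    size_t nr_rdevs;
};

/* Same decision as the hunk above: on a read-only array whose metadata
 * the kernel manages, drop Blocked from every member, because a later
 * switch to read/write will record the failure anyway. */
static void model_check_recovery_ro(struct mddev_model *mddev)
{
    assert(mddev != NULL);
    if (!mddev->ro)
        return;
    if (!mddev->external && mddev->in_sync)
        for (size_t i = 0; i < mddev->nr_rdevs; i++)
            mddev->rdevs[i].blocked = false;
}
```

Leaving the external case untouched matches the commit message: with externally managed metadata, clearing Blocked is left to the external manager.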
Hi Neil

Sorry for the late response.

I have tried the patch. When I unplug the disk (sdc1) that belongs to the raid1 array, the directory
/sys/block/md0/md/dev-sdc1 is deleted. I haven't read the code that handles device unplug. Is this
what you intended?

Best Regards
Xiao