Re: RAID1 removing failed disk returns EBUSY

Neil Brown <neilb@xxxxxxx> · Wed, 17 Jun 2015 12:51:51 +1000

On Wed, 10 Jun 2015 14:26:41 +0800
XiaoNi <xni@xxxxxxxxxx> wrote:

> 
> 
> On 02/03/2015 04:10 PM, Xiao Ni wrote:
> >
> > ----- Original Message -----
> >> From: "NeilBrown" <neilb@xxxxxxx>
> >> To: "Xiao Ni" <xni@xxxxxxxxxx>
> >> Cc: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, "Bill Kuzeja" <william.kuzeja@xxxxxxxxxxx>
> >> Sent: Monday, February 2, 2015 2:36:01 PM
> >> Subject: Re: RAID1 removing failed disk returns EBUSY
> >>
> >> On Thu, 29 Jan 2015 07:14:16 -0500 (EST) Xiao Ni <xni@xxxxxxxxxx> wrote:
> >>
> >>>
> >>> ----- Original Message -----
> >>>> From: "NeilBrown" <neilb@xxxxxxx>
> >>>> To: "Xiao Ni" <xni@xxxxxxxxxx>
> >>>> Cc: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>,
> >>>> linux-raid@xxxxxxxxxxxxxxx, "Bill Kuzeja" <william.kuzeja@xxxxxxxxxxx>
> >>>> Sent: Thursday, January 29, 2015 11:52:17 AM
> >>>> Subject: Re: RAID1 removing failed disk returns EBUSY
> >>>>
> >>>> On Sun, 18 Jan 2015 21:33:50 -0500 (EST) Xiao Ni <xni@xxxxxxxxxx> wrote:
> >>>>
> >>>>>
> >>>>> ----- Original Message -----
> >>>>>> From: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>
> >>>>>> To: "Xiao Ni" <xni@xxxxxxxxxx>
> >>>>>> Cc: "NeilBrown" <neilb@xxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, "Bill
> >>>>>> Kuzeja" <william.kuzeja@xxxxxxxxxxx>
> >>>>>> Sent: Friday, January 16, 2015 11:10:31 PM
> >>>>>> Subject: Re: RAID1 removing failed disk returns EBUSY
> >>>>>>
> >>>>>> On Fri, 16 Jan 2015 00:20:12 -0500
> >>>>>> Xiao Ni <xni@xxxxxxxxxx> wrote:
> >>>>>>> Hi Joe
> >>>>>>>
> >>>>>>>     Thanks for reminding me. I hadn't done that. Now the device can
> >>>>>>>     be removed successfully after writing "idle" to sync_action.
> >>>>>>>
> >>>>>>>     I wrongly thought that the patch referenced in this mail fixed
> >>>>>>>     the problem.
> >>>>>> So it sounds like even with 3.18 and a new mdadm, this bug still
> >>>>>> persists?
> >>>>>>
> >>>>>> -- Joe
> >>>>>>
> >>>>>> --
> >>>>> Hi Joe
> >>>>>
> >>>>>     I'm a little confused now. Does the patch
> >>>>>     45eaf45dfa4850df16bc2e8e7903d89021137f40 from linux-stable
> >>>>>     resolve the problem?
> >>>>>
> >>>>>     My environment is:
> >>>>>
> >>>>> [root@dhcp-12-133 mdadm]# mdadm --version
> >>>>> mdadm - v3.3.2-18-g93d3bd3 - 18th December 2014  (this is the newest upstream)
> >>>>> [root@dhcp-12-133 mdadm]# uname -r
> >>>>> 3.18.2
> >>>>>
> >>>>>
> >>>>>     My steps are:
> >>>>>
> >>>>> [root@dhcp-12-133 mdadm]# lsblk
> >>>>> sdb                       8:16   0 931.5G  0 disk
> >>>>> └─sdb1                    8:17   0     5G  0 part
> >>>>> sdc                       8:32   0 186.3G  0 disk
> >>>>> sdd                       8:48   0 931.5G  0 disk
> >>>>> └─sdd1                    8:49   0     5G  0 part
> >>>>> [root@dhcp-12-133 mdadm]# mdadm -CR /dev/md0 -l1 -n2 /dev/sdb1 /dev/sdd1 --assume-clean
> >>>>> mdadm: Note: this array has metadata at the start and
> >>>>>      may not be suitable as a boot device.  If you plan to
> >>>>>      store '/boot' on this device please ensure that
> >>>>>      your boot-loader understands md/v1.x metadata, or use
> >>>>>      --metadata=0.90
> >>>>> mdadm: Defaulting to version 1.2 metadata
> >>>>> mdadm: array /dev/md0 started.
> >>>>>
> >>>>>     Then I unplugged the disk.
> >>>>>
> >>>>> [root@dhcp-12-133 mdadm]# lsblk
> >>>>> sdc                       8:32   0 186.3G  0 disk
> >>>>> sdd                       8:48   0 931.5G  0 disk
> >>>>> └─sdd1                    8:49   0     5G  0 part
> >>>>>    └─md0                   9:0    0     5G  0 raid1
> >>>>> [root@dhcp-12-133 mdadm]# echo faulty > /sys/block/md0/md/dev-sdb1/state
> >>>>> [root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
> >>>>> -bash: echo: write error: Device or resource busy
> >>>>> [root@dhcp-12-133 mdadm]# echo idle > /sys/block/md0/md/sync_action
> >>>>> [root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
> >>>>>
> >>>> I cannot reproduce this - using linux 3.18.2.  I'd be surprised if mdadm
> >>>> version affects things.
> >>> Hi Neil
> >>>
> >>>     I'm very puzzled, because I can reproduce it 100% of the time on my machine.
> >>>
> >>>> This error (Device or resource busy) implies that rdev->raid_disk >= 0
> >>>> (tested in state_store()).
> >>>>
> >>>> ->raid_disk is set to -1 by remove_and_add_spares() provided that:
> >>>>    1/ it isn't Blocked (which is very unlikely)
> >>>>    2/ hot_remove_disk succeeds, which it will if nr_pending is zero, and
> >>>>    3/ nr_pending is zero.
> >>>     I remember trying to check those conditions. But it really is
> >>>     reason 1, the one that is very unlikely.
> >>>
> >>>     I added some code to the function array_state_show:
> >>>
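> >>>      static ssize_t
> >>>      array_state_show(struct mddev *mddev, char *page)
> >>>      {
> >>>          enum array_state st = inactive;
> >>>          struct md_rdev *rdev;
> >>>
> >>>          /* debug: report the Blocked flag of every member device */
> >>>          rcu_read_lock();
> >>>          rdev_for_each_rcu(rdev, mddev) {
> >>>                  printk(KERN_ALERT "search for %s\n",
> >>>                         rdev->bdev->bd_disk->disk_name);
> >>>                  if (test_bit(Blocked, &rdev->flags))
> >>>                          printk(KERN_ALERT "rdev is Blocked\n");
> >>>                  else
> >>>                          printk(KERN_ALERT "rdev is not Blocked\n");
> >>>          }
> >>>          rcu_read_unlock();
> >>>          /* ... rest of array_state_show() unchanged ... */
> >>>      }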
> >>>
> >>>    After running echo 1 > /sys/block/sdc/device/delete, I ran this command:
> >>>
> >>> [root@dhcp-12-133 md]# cat /sys/block/md0/md/array_state
> >>> read-auto
> >>    ^^^^^^^^^
> >>
> >> I think that is half the explanation.
> >> You must have the md_mod.start_ro parameter set to '1'.
> >>
> >>
> >>> [root@dhcp-12-133 md]# dmesg
> >>> [ 2679.559185] search for sdc
> >>> [ 2679.559189] rdev is Blocked
> >>> [ 2679.559190] search for sdb
> >>> [ 2679.559190] rdev is not Blocked
> >>>     
> >>>    So sdc is Blocked
> >> and that is the other half - thanks.
> >> (yes, I was wrong.  Sometimes it is easier than being right, but still
> >> yields results).
> >>
> >> When a device fails, it is Blocked until the metadata is updated to record
> >> the failure.  This ensures that no write can succeed without also being
> >> written to that device until we are certain that no read will try to read
> >> from that device, even after a crash/restart.
> >>
> >> Blocked is cleared after the metadata is written, but read-auto (and
> >> read-only) devices never write out their metadata.  So Blocked doesn't get
> >> cleared.
> >>
> >> When you "echo idle > .../sync_action", one of the side effects is to switch
> >> from 'read-auto' to fully active.  This allows the metadata to be written,
> >> Blocked to be cleared, and the device to be removed.
> >>
> >> If you
> >>    echo none > /sys/block/md0/md/dev-sdc/slot
> >>
> >> first, then the remove will work.
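The workaround above is just two sysfs writes, so it can also be done from a program. This is a minimal C sketch assuming the per-device directory layout used in this thread (/sys/block/md0/md/dev-sdc); md_detach_and_remove() is a hypothetical helper name. Values are written with a trailing newline so the bytes match what echo writes, and errors come back as negative errno values so a caller can tell -EBUSY apart from other failures.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write VALUE to PATH, like "echo ... > PATH" when VALUE ends in '\n'.
 * Returns 0 on success or a negative errno (for example -EBUSY). */
static int sysfs_write(const char *path, const char *value)
{
	assert(path != NULL && value != NULL);
	int fd = open(path, O_WRONLY | O_TRUNC);
	if (fd < 0)
		return -errno;

	size_t len = strlen(value);
	ssize_t n = write(fd, value, len);
	int err = 0;
	if (n < 0)
		err = -errno;
	else if ((size_t)n != len)
		err = -EIO;               /* short write: treat as failure */

	if (close(fd) < 0 && err == 0)
		err = -errno;
	return err;
}

/* Hypothetical helper: release the slot first, then remove the device.
 * dev_dir is the per-device md directory, e.g. /sys/block/md0/md/dev-sdc */
static int md_detach_and_remove(const char *dev_dir)
{
	char path[4096];
	int err;

	if ((size_t)snprintf(path, sizeof(path), "%s/slot", dev_dir) >= sizeof(path))
		return -ENAMETOOLONG;
	err = sysfs_write(path, "none\n");
	if (err)
		return err;

	if ((size_t)snprintf(path, sizeof(path), "%s/state", dev_dir) >= sizeof(path))
		return -ENAMETOOLONG;
	return sysfs_write(path, "remove\n");
}
```

Stopping at the first failed write keeps the order Neil describes: "remove" is only attempted once the slot write has succeeded.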
> >>
> >> We could possibly fix it with something like the following, but I'm not sure
> >> I like it.  I can't see anything that would guarantee the superblock gets
> >> updated before the first write if the array switches to read/write.
> >>
> >> NeilBrown
> >>
> >> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >> index 9233c71138f1..b3d1e8e5e067 100644
> >> --- a/drivers/md/md.c
> >> +++ b/drivers/md/md.c
> >> @@ -7528,7 +7528,7 @@ static int remove_and_add_spares(struct mddev *mddev,
> >>   	rdev_for_each(rdev, mddev)
> >>   		if ((this == NULL || rdev == this) &&
> >>   		    rdev->raid_disk >= 0 &&
> >> -		    !test_bit(Blocked, &rdev->flags) &&
> >> +		    (!test_bit(Blocked, &rdev->flags) || mddev->ro) &&
> >>   		    (test_bit(Faulty, &rdev->flags) ||
> >>   		     ! test_bit(In_sync, &rdev->flags)) &&
> >>   		    atomic_read(&rdev->nr_pending)==0) {
> >>
> >>
> >>
> > Hi Neil
> >
> >     I have tried the patch and it fixes the problem. I'm sorry that I can't
> > suggest a better approach; I'm not familiar with the metadata handling in
> > md. I'll try to find more time to read the md code.
> >
> Hi Neil
> 
>      I don't see the patch in linux-stable. Did you miss it?

I don't believe this bug is sufficiently serious for the patch to go to
-stable.  However, it does need to be fixed - thanks for the reminder.

I've just queued the following patch which I am happy with.  If you
could confirm that it works for you, I would appreciate that.

Thanks,
NeilBrown


From: Neil Brown <neilb@xxxxxxx>
Date: Wed, 17 Jun 2015 12:31:46 +1000
Subject: [PATCH] md: clear Blocked flag on failed devices when array is
 read-only.

The Blocked flag indicates that a device has failed but that this
fact hasn't been recorded in the metadata yet.  Writes to such
devices cannot be allowed until the metadata has been updated.

On a read-only array, the Blocked flag will never be cleared.
This prevents the device from being removed from the array.

If the metadata is being handled by the kernel
(i.e. !mddev->external), then we can be sure that if the array is
switched to writable, a metadata update will happen and will
record the failure.  So we don't need the flag set.

If metadata is externally managed, it is up to the external manager
to clear the 'blocked' flag.
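The rule in this patch can be expressed as a small function. This is a minimal C sketch, assuming hypothetical simplified structs (mddev_model, member_model) in place of struct mddev and struct md_rdev; it models only the new hunk in md_check_recovery(), not the rest of that function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct md_rdev and struct mddev. */
struct member_model {
	bool blocked;
};

struct mddev_model {
	int  ro;          /* mddev->ro: non-zero for read-only / read-auto */
	bool external;    /* metadata managed by user space */
	bool in_sync;     /* mddev->in_sync */
	struct member_model *members;
	size_t nr_members;
};

/* Mirrors the added hunk: on a read-only array with kernel-managed
 * metadata that is in_sync, clear Blocked on every member, since a
 * switch to read/write will write metadata recording the failure.
 * With external metadata the flag is left for the manager to clear. */
static void model_clear_blocked(struct mddev_model *m)
{
	assert(m != NULL);
	if (!m->ro || m->external || !m->in_sync)
		return;
	for (size_t i = 0; i < m->nr_members; i++)
		m->members[i].blocked = false;
}
```

Read/write arrays and externally managed metadata are left untouched, so the normal "record the failure, then clear Blocked" path still applies there.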

Reported-by: XiaoNi <xni@xxxxxxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 3d339e2..5a6681a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8125,6 +8125,15 @@ void md_check_recovery(struct mddev *mddev)
 		int spares = 0;
 
 		if (mddev->ro) {
+			struct md_rdev *rdev;
+			if (!mddev->external && mddev->in_sync)
+				/* 'Blocked' flag not needed as failed devices
+				 * will be recorded if array switched to read/write.
+				 * Leaving it set will prevent the device
+				 * from being removed.
+				 */
+				rdev_for_each(rdev, mddev)
+					clear_bit(Blocked, &rdev->flags);
 			/* On a read-only array we can:
 			 * - remove failed devices
 			 * - add already-in_sync devices if the array itself

