From: "NeilBrown" <neilb@xxxxxxx>
To: "Xiao Ni" <xni@xxxxxxxxxx>
Cc: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, "Bill Kuzeja" <william.kuzeja@xxxxxxxxxxx>
Sent: Monday, February 2, 2015 2:36:01 PM
Subject: Re: RAID1 removing failed disk returns EBUSY
On Thu, 29 Jan 2015 07:14:16 -0500 (EST) Xiao Ni <xni@xxxxxxxxxx> wrote:
----- Original Message -----
From: "NeilBrown" <neilb@xxxxxxx>
To: "Xiao Ni" <xni@xxxxxxxxxx>
Cc: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>,
linux-raid@xxxxxxxxxxxxxxx, "Bill Kuzeja" <william.kuzeja@xxxxxxxxxxx>
Sent: Thursday, January 29, 2015 11:52:17 AM
Subject: Re: RAID1 removing failed disk returns EBUSY
On Sun, 18 Jan 2015 21:33:50 -0500 (EST) Xiao Ni <xni@xxxxxxxxxx> wrote:
----- Original Message -----
From: "Joe Lawrence" <joe.lawrence@xxxxxxxxxxx>
To: "Xiao Ni" <xni@xxxxxxxxxx>
Cc: "NeilBrown" <neilb@xxxxxxx>, linux-raid@xxxxxxxxxxxxxxx, "Bill
Kuzeja" <william.kuzeja@xxxxxxxxxxx>
Sent: Friday, January 16, 2015 11:10:31 PM
Subject: Re: RAID1 removing failed disk returns EBUSY
On Fri, 16 Jan 2015 00:20:12 -0500
Xiao Ni <xni@xxxxxxxxxx> wrote:
Hi Joe
Thanks for reminding me. I hadn't done that. The device can now be removed
successfully after writing "idle" to sync_action.
I wrongly assumed that the patch referenced in this mail was the fix for
this problem.
So it sounds like even with 3.18 and a new mdadm, this bug still
persists?
-- Joe
--
Hi Joe
I'm a little confused now. Does the patch
45eaf45dfa4850df16bc2e8e7903d89021137f40 from linux-stable
resolve the problem?
My environment is:
[root@dhcp-12-133 mdadm]# mdadm --version
mdadm - v3.3.2-18-g93d3bd3 - 18th December 2014 (this is the newest upstream)
[root@dhcp-12-133 mdadm]# uname -r
3.18.2
My steps are:
[root@dhcp-12-133 mdadm]# lsblk
sdb 8:16 0 931.5G 0 disk
└─sdb1 8:17 0 5G 0 part
sdc 8:32 0 186.3G 0 disk
sdd 8:48 0 931.5G 0 disk
└─sdd1 8:49 0 5G 0 part
[root@dhcp-12-133 mdadm]# mdadm -CR /dev/md0 -l1 -n2 /dev/sdb1 /dev/sdd1 --assume-clean
mdadm: Note: this array has metadata at the start and
may not be suitable as a boot device. If you plan to
store '/boot' on this device please ensure that
your boot-loader understands md/v1.x metadata, or use
--metadata=0.90
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started.
Then I unplug the disk.
[root@dhcp-12-133 mdadm]# lsblk
sdc 8:32 0 186.3G 0 disk
sdd 8:48 0 931.5G 0 disk
└─sdd1 8:49 0 5G 0 part
└─md0 9:0 0 5G 0 raid1
[root@dhcp-12-133 mdadm]# echo faulty > /sys/block/md0/md/dev-sdb1/state
[root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
-bash: echo: write error: Device or resource busy
[root@dhcp-12-133 mdadm]# echo idle > /sys/block/md0/md/sync_action
[root@dhcp-12-133 mdadm]# echo remove > /sys/block/md0/md/dev-sdb1/state
I cannot reproduce this - using linux 3.18.2. I'd be surprised if mdadm
version affects things.
Hi Neil
I'm very curious, because I can reproduce it on my machine 100% of the time.
This error (Device or resource busy) implies that rdev->raid_disk is >= 0
(tested in state_store()).
->raid_disk is set to -1 by remove_and_add_spares() providing:
1/ it isn't Blocked (which is very unlikely)
2/ hot_remove_disk succeeds, which it will if nr_pending is zero, and
3/ nr_pending is zero.
I remember that I tried to check those conditions. It really is condition 1,
the one that is very unlikely.
I added some code to the function array_state_show:
array_state_show(struct mddev *mddev, char *page) {
	enum array_state st = inactive;
	struct md_rdev *rdev;
	rdev_for_each_rcu(rdev, mddev) {
		printk(KERN_ALERT "search for %s\n",
		       rdev->bdev->bd_disk->disk_name);
		if (test_bit(Blocked, &rdev->flags))
			printk(KERN_ALERT "rdev is Blocked\n");
		else
			printk(KERN_ALERT "rdev is not Blocked\n");
	}
After running echo 1 > /sys/block/sdc/device/delete, I ran this command:
[root@dhcp-12-133 md]# cat /sys/block/md0/md/array_state
read-auto
^^^^^^^^^
I think that is half the explanation.
You must have the md_mod.start_ro parameter set to '1'.
[root@dhcp-12-133 md]# dmesg
[ 2679.559185] search for sdc
[ 2679.559189] rdev is Blocked
[ 2679.559190] search for sdb
[ 2679.559190] rdev is not Blocked
So sdc is Blocked
and that is the other half - thanks.
(yes, I was wrong. Sometimes it is easier than being right, but still
yields results).
When a device fails, it is Blocked until the metadata is updated to record
the failure. This ensures that no writes succeed without writing to that
device, until we are certain that no read will try reading from that device,
even after a crash/restart.
Blocked is cleared after the metadata is written, but read-auto (and
read-only) arrays never write out their metadata, so Blocked doesn't get
cleared.
When you "echo idle > .../sync_action", one of the side effects is to switch
from 'read-auto' to fully active. This allows the metadata to be written,
Blocked to be cleared, and the device to be removed.
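A minimal sketch of how one might check for this condition from user space, assuming the sysfs layout used in the commands earlier in this thread. The function name explain_remove_busy, the SYSFS_ROOT override, and the strings it prints are hypothetical, added only so the logic can be tried against a fake directory tree; they are not part of md or mdadm.

```shell
#!/bin/sh
# Sketch only: report why "echo remove" on an md member may return EBUSY.
# SYSFS_ROOT is a hypothetical override for trying this on a fake tree;
# on a real system it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# explain_remove_busy MD DEV   (e.g. explain_remove_busy md0 sdb1)
explain_remove_busy() {
    md_dir="$SYSFS_ROOT/block/$1/md"
    array_state=$(cat "$md_dir/array_state") || return 1
    dev_state=$(cat "$md_dir/dev-$2/state") || return 1

    # The member state file is a comma-separated flag list,
    # for example "faulty,blocked".
    case ",$dev_state," in
        *,blocked,*) blocked=yes ;;
        *)           blocked=no ;;
    esac

    if [ "$blocked" = no ]; then
        echo "not-blocked"
    elif [ "$array_state" = "read-auto" ] || [ "$array_state" = "readonly" ]; then
        # Metadata is not written in these states, so Blocked cannot clear.
        echo "blocked-metadata-not-written"
    else
        echo "blocked-metadata-update-pending"
    fi
}
```

On the system described in this thread, explain_remove_busy md0 sdb1 would be expected to print blocked-metadata-not-written while the array is still read-auto.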
If you
echo none > /sys/block/md0/md/dev-sdc/slot
first, then the remove will work.
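The sequence suggested above could be scripted roughly as follows. This is a sketch under assumptions: remove_failed_member and the SYSFS_ROOT override are hypothetical names introduced here for illustration, and on a real system the writes need root and act on a live array.

```shell
#!/bin/sh
# Sketch only. SYSFS_ROOT is a hypothetical override for trying this on a
# fake tree; on a real system it defaults to /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"

# remove_failed_member MD DEV   (e.g. remove_failed_member md0 sdb1)
remove_failed_member() {
    dev_dir="$SYSFS_ROOT/block/$1/md/dev-$2"

    # Mark the member faulty first, as in the steps quoted earlier.
    echo faulty > "$dev_dir/state" || return 1

    # Detach the member from its slot before asking for removal,
    # which is the workaround described above.
    echo none > "$dev_dir/slot" || return 1

    # With the slot released, "remove" should no longer return EBUSY.
    echo remove > "$dev_dir/state"
}
```

Writing "idle" to sync_action before the remove, as done earlier in the thread, is the other route; it works by taking the array out of read-auto rather than by releasing the slot directly.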
We could possibly fix it with something like the following, but I'm not sure
I like it. I can't see any guarantee that the superblock would be updated
before the first write if the array switched to read/write.
NeilBrown
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 9233c71138f1..b3d1e8e5e067 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7528,7 +7528,7 @@ static int remove_and_add_spares(struct mddev *mddev,
 	rdev_for_each(rdev, mddev)
 		if ((this == NULL || rdev == this) &&
 		    rdev->raid_disk >= 0 &&
-		    !test_bit(Blocked, &rdev->flags) &&
+		    (!test_bit(Blocked, &rdev->flags) || mddev->ro) &&
 		    (test_bit(Faulty, &rdev->flags) ||
 		     ! test_bit(In_sync, &rdev->flags)) &&
 		    atomic_read(&rdev->nr_pending)==0) {