Hi Song,
On 29.09.2021 03:29, Song Liu wrote:
>>>>>>              set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags);
>>>>>>              set_bit(LastDev, &rdev->flags);
>>>>>>          }
>>>>>> @@ -2979,7 +2980,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
>>>>>>      int err = -EINVAL;
>>>>>>      if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
>>>>>>          md_error(rdev->mddev, rdev);
>>>>>> -        if (test_bit(Faulty, &rdev->flags))
>>>>>> +
>>>>>> +        if (!test_bit(MD_BROKEN, &rdev->mddev->flags))
>>>>> I don't think this makes much sense. EBUSY for an already failed array
>>>>> sounds weird.
>>>>> Also, shall we also set MD_BROKEN here?
>>>>>
>>>>> Actually, we just called md_error() above, so we don't need to set
>>>>> MD_BROKEN here.
>>>>> But we shouldn't return EBUSY in such cases, right?
>>>> About EBUSY:
>>>> This is how it is implemented in mdadm; we expect it in case of failure.
>>>> See my fix [2].
>>>> I agree that it can be confusing, but this is how it works.
>>>> Do you want to change it across mdadm and md? That would break
>>>> compatibility.
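
To illustrate the contract (a hypothetical userspace sketch, not mdadm
source; the sysfs path is made up for the example): mdadm --fail boils
down to writing "faulty" to the member's sysfs state attribute, and EBUSY
from that write is how it learns the kernel refused to fail the device.

    /* Hypothetical illustration, not mdadm source. The path depends on
     * the array and member names. */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/sys/block/md0/md/dev-sda1/state";
        int fd = open(path, O_WRONLY);

        if (fd < 0) {
            perror("open");
            return 1;
        }
        if (write(fd, "faulty", strlen("faulty")) < 0 && errno == EBUSY)
            fprintf(stderr, "EBUSY: kernel did not fail the device\n");
        close(fd);
        return 0;
    }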
>>>> About MD_BROKEN:
>>>> As you can see, we determine failure by checking the rdev state: if
>>>> "Faulty" is not set in flags after md_error(), then we assume that the
>>>> array has failed and EBUSY is returned to userspace.
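
For reference, the pre-patch error path in state_store() is roughly this
(paraphrased from drivers/md/md.c and trimmed, not verbatim):

    /* Paraphrased pre-patch logic in state_store(), drivers/md/md.c: */
    if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
        md_error(rdev->mddev, rdev);
        if (test_bit(Faulty, &rdev->flags))
            err = 0;        /* personality marked the member Faulty */
        else
            err = -EBUSY;   /* failing the device did not stick */
    }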
>>> This changed the behavior for raid0, no?
>>> W/o the change mdadm --fail on raid0 will get EBUSY. W/ this change,
>>> it will get 0, and the device is NOT marked as faulty, right?
>> See the commit mentioned in the description. MD_BROKEN is used for
>> raid0, so EBUSY is returned, same as without the patch.
> Hmm... I am still confused. In state_store(), md_error is a no-op for
> raid0, which will not set Faulty or MD_BROKEN. So we will get -EBUSY w/o
> the patch and 0 w/ the patch, no? It is probably not a serious issue though.
Yeah, you are right: raid0 registers no error_handler. I missed that.
Now I have reviewed raid0 again.
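
For context, md_error() returns early when the personality registers no
->error_handler, roughly like this (paraphrased from drivers/md/md.c and
trimmed):

    /* Paraphrased from md_error() in drivers/md/md.c (trimmed). raid0 and
     * linear register no ->error_handler, so for them this returns without
     * setting Faulty or MD_BROKEN. */
    void md_error(struct mddev *mddev, struct md_rdev *rdev)
    {
        if (!mddev->pers || !mddev->pers->error_handler)
            return;     /* no-op for raid0/linear */
        mddev->pers->error_handler(mddev, rdev);
        /* ... schedule recovery, notify the md thread, emit an event ... */
    }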
With my change the result for raid0 won't be deterministic; it is
correlated with IO. When a drive disappears and there is IO in flight,
the write could return -EBUSY, because raid0_make_request() may set
MD_BROKEN first. If there is no IO, then 0 will be returned. I need to
close this gap.
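
The IO-path check responsible for that race is roughly this (paraphrased
from raid0_make_request() in drivers/md/raid0.c with the patch applied;
trimmed):

    /* Paraphrased, with this patch applied: the array is only marked
     * broken when a bio actually hits the missing member, hence the
     * IO-dependent result described above. */
    if (unlikely(is_mddev_broken(tmp_dev, "raid0"))) {
        /* the helper sets MD_BROKEN in mddev->flags and logs once */
        bio_io_error(bio);
        return true;
    }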
Thanks for digging into this.
So, please tell me: are you OK with the idea of this patch? I will send
the requested details with v2, but I want to know whether I have chosen a
good way to close the raid5 issue, which is the main problem.
Thanks,
Mariusz