Re: [PATCH 1/3] raid0, linear, md: add error_handlers for raid0 and linear

Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> · Tue, 21 Dec 2021 14:56:28 +0100

Hi Xiao,
On Tue, 21 Dec 2021 09:40:50 +0800
Xiao Ni <xni@xxxxxxxxxx> wrote:

> Now for a raid0, it can't remove one member disk from raid0. It
> returns EBUSY and the raid0 still can work well. It makes sense.
> Because all member disks are busy, the admin can't remove the member
> disk and mdadm gives a proper error.

EBUSY means that the drive is busy, but it is not. The drive simply
cannot be safely removed. As I wrote in patch 2:

If "faulty" was not set then -EBUSY was returned to
userspace. As a result, mdadm expects -EBUSY when the array
becomes failed. There are some reasons not to consider this mechanism
correct:
- there can be different reasons why a drive cannot be failed.
- there are paths where -EBUSY is not reported and drive removal leads
to a failed array, without any notification to userspace.
- in the array failure case -EBUSY seems to be the wrong status. The
array is not busy, but the removal process cannot proceed safely.

For compatibility reasons I cannot remove EBUSY. I left a more detailed
explanation in patch 2.

> With this patch, it changes the situation. In raid0_error, it sets
> MD_BROKEN. In fact, it isn't broken. So is it really good to set
> MD_BROKEN here? In patch 62f7b1989c0 ("md raid0/linear: Mark array as
> 'broken'...), MD_BROKEN is introduced
> when the member disk disappears and the disk is really broken. For
> raid0/linear, the raid device can't work anymore.

It is broken. Any md_error() call should end with an appropriate action:
- drive removal (if possible)
- failing the array (if the array cannot be degraded)

We cannot trust the drive once md_error() has been called, so writes will
be blocked. IMO it is reasonable not to engage the level stack, because
one or more members cannot be trusted (even if they are still
accessible). We just allow reading old data (if still possible).

Thanks,
Mariusz