On Fri, Sep 24, 2021 at 2:09 PM Song Liu <song@xxxxxxxxxx> wrote: > > On Fri, Sep 17, 2021 at 8:35 AM Mariusz Tkaczyk > <mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote: > > > > The idea of stopping all writes if devices is failed, introduced by > > 62f7b1989c0 ("md raid0/linear: Mark array as 'broken' and fail BIOs if > > a member is gone") seems to be reasonable so use MD_BROKEN for RAID1 and > > RAID10. Support in RAID456 is added in next commit. > > If userspace (mdadm) forces md to fail the device (Failure state > > written via sysfs), then EBUSY is expected if array will become failed. > > To achieve that, check for MD_BROKEN and if is set, then return EBUSY to > > be complaint with userspace. > > For faulty state, handled via ioctl, let the error_handler to decide. > > > > Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@xxxxxxxxxxxxxxx> > > Thanks for the patch and sorry for the late reply. > > > --- > > drivers/md/md.c | 16 ++++++++++------ > > drivers/md/md.h | 4 ++-- > > drivers/md/raid1.c | 1 + > > drivers/md/raid10.c | 1 + > > 4 files changed, 14 insertions(+), 8 deletions(-) > > > > diff --git a/drivers/md/md.c b/drivers/md/md.c > > index c322841d4edc..ac20eb2ddff7 100644 > > --- a/drivers/md/md.c > > +++ b/drivers/md/md.c > > @@ -926,8 +926,9 @@ static void super_written(struct bio *bio) > > pr_err("md: %s gets error=%d\n", __func__, > > blk_status_to_errno(bio->bi_status)); > > md_error(mddev, rdev); > > - if (!test_bit(Faulty, &rdev->flags) > > - && (bio->bi_opf & MD_FAILFAST)) { > > + if (!test_bit(Faulty, &rdev->flags) && > > + !test_bit(MD_BROKEN, &mddev->flags) && > > + (bio->bi_opf & MD_FAILFAST)) { > > So with MD_BROKEN, we will not try to update the SB? > > > set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags); > > set_bit(LastDev, &rdev->flags); > > } > > @@ -2979,7 +2980,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len) > > int err = -EINVAL; > > if (cmd_match(buf, "faulty") && rdev->mddev->pers) { > > md_error(rdev->mddev, rdev); > > - if (test_bit(Faulty, &rdev->flags)) > > + > > + if (!test_bit(MD_BROKEN, &rdev->mddev->flags)) > > I don't think this makes much sense. EBUSY for already failed array > sounds weird. > Also, shall we also set MD_BROKEN here? > Actually, we just called md_error above, so we don't need to set MD_BROKEN here. But we shouldn't return EBUSY in such cases, right?