On Mon, Sep 27, 2021 at 7:54 AM Tkaczyk, Mariusz
<mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote:
>
> Hi Song,
> Thanks for the review.
>
> On 24.09.2021 23:15, Song Liu wrote:
> >>> diff --git a/drivers/md/md.c b/drivers/md/md.c
> >>> index c322841d4edc..ac20eb2ddff7 100644
> >>> --- a/drivers/md/md.c
> >>> +++ b/drivers/md/md.c
> >>> @@ -926,8 +926,9 @@ static void super_written(struct bio *bio)
> >>>                 pr_err("md: %s gets error=%d\n", __func__,
> >>>                        blk_status_to_errno(bio->bi_status));
> >>>                 md_error(mddev, rdev);
> >>> -               if (!test_bit(Faulty, &rdev->flags)
> >>> -                   && (bio->bi_opf & MD_FAILFAST)) {
> >>> +               if (!test_bit(Faulty, &rdev->flags) &&
> >>> +                   !test_bit(MD_BROKEN, &mddev->flags) &&
> >>> +                   (bio->bi_opf & MD_FAILFAST)) {
> >>
> >> So with MD_BROKEN, we will not try to update the SB?
> The array is dead; is there added value in writing the failed state?

I think there is still value in remembering this. Say we have a raid1
with two drives, A and B. If B is unplugged from the system, we would
like to update the SB on A to remember that. Otherwise, if B is plugged
back in later (maybe after a system reset), we cannot tell which one
has the latest data. Does this make sense?

> For external arrays the failed state is not written, because the drive
> is not removed from the MD device and the metadata manager cannot
> detect the array failure. This is how it was originally implemented
> (except for raid5, but I aligned it around two years ago [1]). I tried
> to make it consistent for everyone, but I failed. The second patch
> restores the kernel's ability to remove the drive for raid5, so the
> failed state will be detected (for external) again.
> So maybe I should drop this change for native. What do you think?
>
> >>
> >>>                         set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags);
> >>>                         set_bit(LastDev, &rdev->flags);
> >>>                 }
> >>> @@ -2979,7 +2980,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
> >>>         int err = -EINVAL;
> >>>         if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
> >>>                 md_error(rdev->mddev, rdev);
> >>> -               if (test_bit(Faulty, &rdev->flags))
> >>> +
> >>> +               if (!test_bit(MD_BROKEN, &rdev->mddev->flags))
> >>
> >> I don't think this makes much sense. EBUSY for an already failed
> >> array sounds weird.
> >> Also, shall we also set MD_BROKEN here?
> >>
> > Actually, we just called md_error above, so we don't need to set
> > MD_BROKEN here.
> > But we shouldn't return EBUSY in such cases, right?
>
> About EBUSY:
> This is how it is implemented in mdadm; we expect it in case of
> failure. See my fix [2].
> I agree that it can be confusing, but this is how it works today.
> Do you want to change it across both mdadm and md?
> That would break compatibility.
>
> About MD_BROKEN:
> As you can see, we determine failure by checking the rdev state: if
> "Faulty" is not set in the flags after md_error(), the code assumes
> that the array has failed, and EBUSY is returned to userspace.

This changes the behavior for raid0, no? Without the change, mdadm
--fail on raid0 gets EBUSY. With this change, it gets 0, and the
device is NOT marked as faulty, right?

Song
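
PS: To make the raid1 example concrete, here is a toy model (my sketch,
not actual md.c code or the on-disk format) of why writing the failed
state matters: md picks the freshest member by superblock event count
at assembly time, so the surviving drive has to record the failure
event before the missing one can come back.

  /* Toy model: superblock event counts decide freshness at assembly. */
  #include <stdbool.h>
  #include <stdint.h>
  #include <stdio.h>

  struct sb {
          uint64_t events;        /* bumped on every superblock write */
          bool peer_faulty;       /* "the other member went away"     */
  };

  int main(void)
  {
          struct sb a = { .events = 100 }, b = { .events = 100 };

          /* B is unplugged; the SB on A is rewritten with the failure. */
          a.events++;             /* A's copy is now newer than B's    */
          a.peer_faulty = true;   /* and it remembers that B was lost  */

          /* B is plugged back in (maybe after a system reset). */
          if (a.events != b.events)
                  printf("member %s has the latest data\n",
                         a.events > b.events ? "A" : "B");
          else
                  printf("event counts match: cannot tell who is fresh\n");
          return 0;
  }

Skip the SB update and both event counts still match, which is exactly
the "cannot tell which one has the latest data" case above.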
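
And to make the raid0 question concrete, a toy model of the old and new
"faulty" branches in state_store(), assuming (as the question implies)
that md_error() on a raid0 member sets neither Faulty nor MD_BROKEN,
since raid0 provides no ->error_handler:

  #include <stdbool.h>
  #include <stdio.h>

  #define EBUSY 16

  static bool faulty;             /* rdev Faulty flag                  */
  static bool md_broken;          /* mddev MD_BROKEN flag              */

  /* raid0 has no ->error_handler, so md_error() changes nothing. */
  static void md_error_raid0(void) { }

  static int fail_old(void)       /* pre-patch "faulty" branch         */
  {
          md_error_raid0();
          return faulty ? 0 : -EBUSY;
  }

  static int fail_new(void)       /* patched "faulty" branch           */
  {
          md_error_raid0();
          return !md_broken ? 0 : -EBUSY;
  }

  int main(void)
  {
          /* prints "old: -16, new: 0" -- the behavior change above */
          printf("old: %d, new: %d\n", fail_old(), fail_new());
          return 0;
  }

The old branch keys success off the rdev becoming Faulty, so raid0 gets
EBUSY; the new branch keys it off MD_BROKEN, which nothing sets for
raid0, so it returns 0 even though the rdev was never marked Faulty.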