On Tue, Sep 28, 2021 at 12:55 AM Tkaczyk, Mariusz
<mariusz.tkaczyk@xxxxxxxxxxxxxxx> wrote:
>
> On 28.09.2021 09:33, Tkaczyk, Mariusz wrote:
> > Hi Song,
> >
> > On 28.09.2021 00:59, Song Liu wrote:
> >>>>>> +        if (!test_bit(Faulty, &rdev->flags) &&
> >>>>>> +            !test_bit(MD_BROKEN, &mddev->flags) &&
> >>>>>> +            (bio->bi_opf & MD_FAILFAST)) {
> >>>>>
> >>>>> So with MD_BROKEN, we will not try to update the SB?
> >>> Array is dead, is there added value in writing failed state?
> >>
> >> I think there is still value to remember this. Say a raid1 with 2 drives,
> >> A and B. If B is unplugged from the system, we would like to update SB
> >> on A to remember that. Otherwise, if B is plugged back later (maybe after
> >> system reset), we can't tell which one has the latest data.
> >>
> >> Does this make sense?
> >
> > Removing one drive from a raid1 array doesn't cause raid failure.
> > So, removing B will be recorded on A.
> > Raid1 is not a good example, because to fail the array we need to remove
> > all members, so MD_BROKEN doesn't matter because
> > !test_bit(Faulty, &rdev->flags) is true.
> is false.
>
> Oh, I messed it up. There is no Faulty flag in this case; we cannot remove
> the last drive. Since the member is (physically) gone, there is no chance to
> update metadata, even if it is requested.
>
> >
> > Let's say that we have raid10 with members A, B, C, D. Member A is removed,
> > and it is recorded correctly (we are degraded now). Next, member B is
> > removed, which causes array failure.
> > W/ my patch:
> > member B failure is not saved on members C and D. Raid is failed but
> > it is not recorded in metadata.
> > w/o my patch:
> > member B failure is saved on C and D, and metadata is in failed state.
> >>>>>
> >>>>>>                  set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags);
> >>>>>>                  set_bit(LastDev, &rdev->flags);
> >>>>>>          }
> >>>>>> @@ -2979,7 +2980,8 @@ state_store(struct md_rdev *rdev, const char *buf,
> >>>>>> size_t len)
> >>>>>>          int err = -EINVAL;
> >>>>>>          if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
> >>>>>>                  md_error(rdev->mddev, rdev);
> >>>>>> -                if (test_bit(Faulty, &rdev->flags))
> >>>>>> +
> >>>>>> +                if (!test_bit(MD_BROKEN, &rdev->mddev->flags))
> >>>>>
> >>>>> I don't think this makes much sense. EBUSY for an already failed array
> >>>>> sounds weird.
> >>>>> Also, shall we also set MD_BROKEN here?
> >>>>>
> >>>> Actually, we just called md_error above, so we don't need to set MD_BROKEN
> >>>> here.
> >>>> But we shouldn't return EBUSY in such cases, right?
> >>>>
> >>> About EBUSY:
> >>> This is how it is implemented in mdadm; we are expecting it in
> >>> case of failure. See my fix[2].
> >>> I agree that it can be confusing, but this is how it is working.
> >>> Do you want to change it across mdadm and md?
> >>> This will break compatibility.
> >>>
> >>> About MD_BROKEN:
> >>> As you see, we are determining failure by checking the rdev state: if "Faulty"
> >>> is not set in flags after md_error(), then it assumes that the array is
> >>> failed and EBUSY is returned to userspace.
> >>
> >> This changed the behavior for raid0, no?
> >>
> >> W/o the change mdadm --fail on raid0 will get EBUSY. W/ this change,
> >> it will get 0, and the device is NOT marked as faulty, right?
> >>
> > See commit mentioned in description. MD_BROKEN is used for raid0,
> > so EBUSY is returned, same as w/o patch.

Hmm... I am still confused. In state_store(), md_error() is a no-op for
raid0, so it will not set Faulty or MD_BROKEN. So we will get -EBUSY
w/o the patch and 0 w/ the patch, no?
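To spell out what I mean, here is a condensed sketch of the two versions
of that branch. It is taken from the quoted diff plus my reading of the
err assignments from this discussion, not a verbatim copy of the tree:

        /* w/o the patch */
        md_error(rdev->mddev, rdev);
        if (test_bit(Faulty, &rdev->flags))
                err = 0;                /* rdev was marked Faulty */
        else
                err = -EBUSY;           /* nothing happened */

        /* w/ the patch */
        md_error(rdev->mddev, rdev);
        if (!test_bit(MD_BROKEN, &rdev->mddev->flags))
                err = 0;                /* array not marked broken */
        else
                err = -EBUSY;           /* whole array failed */

        /*
         * For raid0, md_error() does nothing, so neither Faulty nor
         * MD_BROKEN is set afterwards: the old branch returns -EBUSY,
         * the new one returns 0.
         */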
It is probably not a serious issue though.

Thanks,
Song