Re: [PATCH 1/2] md, raid1, raid10: Set MD_BROKEN for RAID1 and RAID10

Hi Song,

On 28.09.2021 00:59, Song Liu wrote:
>>>> +               if (!test_bit(Faulty, &rdev->flags) &&
>>>> +                    !test_bit(MD_BROKEN, &mddev->flags) &&
>>>> +                    (bio->bi_opf & MD_FAILFAST)) {

>>> So with MD_BROKEN, we will not try to update the SB?
>> The array is dead; is there added value in writing the failed state?

> I think there is still value to remember this. Say a raid1 with 2 drives,
> A and B. If B is unplugged from the system, we would like to update the SB
> on A to remember that. Otherwise, if B is plugged back later (maybe after
> a system reset), we won't be able to tell which one has the latest data.
>
> Does this make sense?

Removing one drive from a raid1 array doesn't cause raid failure.
So, removing B will be recorded on A.
Raid1 is not a good example because to fail the array we need to remove
all members, so MD_BROKEN doesn't matter here because
!test_bit(Faulty, &rdev->flags) is true.

Let's say that we have a raid10 with members A, B, C, D. Member A is
removed, and it is recorded correctly (we are degraded now). Next,
member B is removed, which causes array failure.
W/ my patch:
member B's failure is not saved on members C and D. The raid is failed
but this is not recorded in the metadata.
W/o my patch:
member B's failure is saved on C and D, and the metadata is in the
failed state.
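
To make the net effect concrete, here is a toy model (my own sketch, not
the kernel code: array_t, fail_member() and the failure threshold are all
made up, and the threshold is tuned only to reproduce the A/B/C/D scenario
above, since whether a real raid10 survives two losses depends on which
mirror pairs they hit):

#include <stdbool.h>
#include <stdio.h>

typedef struct {
	int  total;       /* member count */
	int  failed;      /* failed member count */
	int  min_working; /* fewer working members than this fails the array */
	bool md_broken;   /* stands in for the MD_BROKEN mddev flag */
} array_t;

/* Returns true when the member failure gets written to the
 * superblocks of the surviving members. */
static bool fail_member(array_t *a, bool patched)
{
	a->failed++;
	if (a->total - a->failed < a->min_working)
		a->md_broken = true;
	/* w/ the patch, the SB rewrite is skipped once the array is broken */
	return !(patched && a->md_broken);
}

int main(void)
{
	array_t a = { .total = 4, .failed = 0, .min_working = 3 };

	printf("A fails, recorded: %d\n", fail_member(&a, true)); /* 1: degraded */
	printf("B fails, recorded: %d\n", fail_member(&a, true)); /* 0: broken */
	return 0;
}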

>>>>                           set_bit(MD_SB_NEED_REWRITE, &mddev->sb_flags);
>>>>                           set_bit(LastDev, &rdev->flags);
>>>>                   }
>>>> @@ -2979,7 +2980,8 @@ state_store(struct md_rdev *rdev, const char *buf, size_t len)
>>>>           int err = -EINVAL;
>>>>           if (cmd_match(buf, "faulty") && rdev->mddev->pers) {
>>>>                   md_error(rdev->mddev, rdev);
>>>> -               if (test_bit(Faulty, &rdev->flags))
>>>> +
>>>> +               if (!test_bit(MD_BROKEN, &rdev->mddev->flags))

>>> I don't think this makes much sense. EBUSY for an already failed array
>>> sounds weird.
>>> Also, shall we also set MD_BROKEN here?

> Actually, we just called md_error above, so we don't need to set MD_BROKEN here.
> But we shouldn't return EBUSY in such cases, right?

About EBUSY:
This is how it is implemented in mdadm; we expect it in case of failure.
See my fix[2].
I agree that it can be confusing, but this is how it works.
Do you want to change it across both mdadm and md?
That would break compatibility.

About MD_BROKEN:
As you can see, we determine failure by checking the rdev state: if
"Faulty" is not set in the flags after md_error(), then it is assumed
that the array is failed and EBUSY is returned to userspace.
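
To illustrate that contract with a toy model (again my own sketch, not
md.c or mdadm source; model_md_error() and store_faulty() are invented
names, and the "refuse to fail the last working member" behavior is a
simplification):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

struct member { bool faulty; };
struct array  { bool broken; }; /* stands in for MD_BROKEN */

/* Simplification: md refuses to mark the last working member Faulty,
 * and the array as a whole becomes broken instead. */
static void model_md_error(struct array *a, struct member *m, bool last)
{
	if (last)
		a->broken = true;
	else
		m->faulty = true;
}

static int store_faulty(struct array *a, struct member *m, bool last,
			bool patched)
{
	model_md_error(a, m, last);
	/* old check: is Faulty set?  new check: is MD_BROKEN clear? */
	bool ok = patched ? !a->broken : m->faulty;
	return ok ? 0 : -EBUSY;
}

int main(void)
{
	struct array a = { false };
	struct member m = { false };

	/* failing the last member breaks the array: -EBUSY either way */
	printf("%d\n", store_faulty(&a, &m, /*last*/true, /*patched*/true));
	return 0;
}

Either way, the point stands: mdadm judges success by the returned code,
so the check has to keep returning EBUSY whenever the array ends up failed.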

> This changed the behavior for raid0, no?
>
> W/o the change, mdadm --fail on raid0 will get EBUSY. W/ this change,
> it will get 0 and the device is NOT marked as faulty, right?

See the commit mentioned in the description. MD_BROKEN is used for
raid0, so EBUSY is returned, same as w/o the patch.
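
Plugging raid0 into the same toy model (definitions as in the sketch
above; in raid0 losing any member breaks the array, which is what the
referenced commit encodes via MD_BROKEN):

	struct array a0 = { false };
	struct member m0 = { false };
	/* raid0: any member is effectively the "last" one */
	int err = store_faulty(&a0, &m0, /*last*/true, /*patched*/true);
	/* err == -EBUSY, so mdadm --fail keeps failing, same as w/o patch */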

Thanks,
Mariusz


