[PATCH v2] mdadm: Clear extra flags when initializing metadata

Wu Guanghao <wuguanghao3@xxxxxxxxxx> · Mon, 10 Mar 2025 19:09:36 +0800

When adding a disk to a RAID1 array, the metadata is read from the existing
member disks for sync. However, only the bad_blocks flag are copied,
the bad_blocks records are not copied, so the bad_blocks records are all zeros.
The kernel function super_1_load() detects bad_blocks flag and reads the
bad_blocks record, then sets the bad block using badblocks_set().

After the kernel commit 1726c7746("badblocks: improve badblocks_set() for
multiple ranges handling"), if the length of a bad_blocks record is 0, it will
return a failure. Therefore the device addition will fail.

So when adding a new disk, some flags cannot be sync and need to be cleared.

Signed-off-by: Wu Guanghao <wuguanghao3@xxxxxxxxxx>
---
Changelog:
v2:
    Add a testcase.
    Clear extra replace flag.
---
 super1.c                 |  4 ++++
 tests/05r1-add-badblocks | 25 +++++++++++++++++++++++++
 2 files changed, 29 insertions(+)
 create mode 100644 tests/05r1-add-badblocks

diff --git a/super1.c b/super1.c
index fe3c4c64..f4a29f4f 100644
--- a/super1.c
+++ b/super1.c
@@ -1971,6 +1971,10 @@ static int write_init_super1(struct supertype *st)
 	long bm_offset;
 	bool raid0_need_layout = false;

+	/* Clear extra flags */
+	sb->feature_map &= ~__cpu_to_le32(MD_FEATURE_BAD_BLOCKS |
+                                          MD_FEATURE_REPLACEMENT);
+
 	/* Since linux kernel v5.4, raid0 always has a layout */
 	if (has_raid0_layout(sb) && get_linux_version() >= 5004000)
 		raid0_need_layout = true;
diff --git a/tests/05r1-add-badblocks b/tests/05r1-add-badblocks
new file mode 100644
index 00000000..88b064f2
--- /dev/null
+++ b/tests/05r1-add-badblocks
@@ -0,0 +1,25 @@
+#
+# create a raid1 with a drive and set badblocks for the drive.
+# add a new drive does not cause an error.
+#
+
+# create raid1
+mdadm -CR $md0 -l1 -n2 -e1.0 $dev1 missing
+testdev $md0 1 $mdsize1a 64
+sleep 3
+
+# set badblocks for the drive
+dev1_name=$(basename $dev1)
+echo "10000 100" > /sys/block/md0/md/dev-$dev1_name/bad_blocks
+echo "write_error" > /sys/block/md0/md/dev-$dev1_name/state
+
+# maybe fail but that's ok, as it's only intended to
+# record the bad block in the metadata.
+mkfs.ext3 $md0
+
+# re-add and recovery
+mdadm $md0 -a $dev2
+check recovery
+
+mdadm -S $md0
+
-- 
2.33.0