[PATCH 6/9] mdmon: fix, close spare activation race

Dan Williams <dan.j.williams@xxxxxxxxx> · Thu, 25 Aug 2011 19:14:29 -0700

The following test fails when the md_check_recovery() event triggered by
the ro->rw transition causes remove_and_add_spares() to run while mdmon
is attempting spare activation.

Result is that the kernel races to set the slot immediately after
sysfs_add_disk() writes new_dev.  mdmon thinks the spare activation
failed and declines to send the monitor a new acitve_array.  We show
degraded after the wait because the monitor cannot notify the metadata
that all disks are in_sync.

#!/bin/bash
i=0
false
while [ $? == 1 ]
do
	i=$((i+1))
	mdadm -Ss
	mdadm -CR /dev/md0 /dev/loop[0-2] -n 3 -e imsm
	mdadm -CR /dev/md1 /dev/loop[01] missing -n 3 -l 5
	mdadm --wait /dev/md1
	mdadm -E /dev/loop2 | grep -i degraded
done
echo "failed: $i"

Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx>
---
 managemon.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/managemon.c b/managemon.c
index 6662f67..d020f82 100644
--- a/managemon.c
+++ b/managemon.c
@@ -498,7 +498,10 @@ static void manage_member(struct mdstat_ent *mdstat,
 		newa = duplicate_aa(a);
 		if (!newa)
 			goto out;
-		/* Cool, we can add a device or several. */
+		/* prevent the kernel from activating the disk(s) before we
+		 * finish adding them
+		 */
+		sysfs_set_str(&a->info, NULL, "sync_action", "frozen");
 
 		/* Add device to array and set offset/size/slot.
 		 * and open files for each newdev */

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html