On 08/08/2013 02:44 AM, NeilBrown wrote: > On Tue, 6 Aug 2013 23:38:03 +0200 mwilck@xxxxxxxx wrote: > >> When we create an array while mdmon is working on an event >> (e.g. disk failure), the meta data on disk may not be up-to-date. >> >> Patch "DDF: ddf_open_new: check device status for new subarray" >> added some checks for in the monitor for that situation - in particular, >> to handle a freshly created array with faulty disks. The remaining >> problem is that the kernel may start syncing the disks before this >> situation is detected. This patch delays recovery until mdmon finished >> checking. >> >> tests/10ddf-fail-create-race should succeed reliably with this patch >> and "DDF: ddf_open_new: check device status for new subarray". Without, >> it will fail sporadically. >> >> Signed-off-by: Martin Wilck <mwilck@xxxxxxxx> >> --- >> Create.c | 8 ++++++++ >> managemon.c | 6 ++++++ >> 2 files changed, 14 insertions(+), 0 deletions(-) >> >> diff --git a/Create.c b/Create.c >> index ac22f77..f9b7db2 100644 >> --- a/Create.c >> +++ b/Create.c >> @@ -993,6 +993,14 @@ int Create(struct supertype *st, char *mddev, >> need_mdmon = 0; >> break; >> default: >> + /* >> + * The meta data we saw on disk may not be >> + * up-to-date. The monitor will check and >> + * possibly fail. Avoid a resync happening >> + * in the kernel before that. >> + */ >> + sysfs_set_str(&info, NULL, "sync_action", >> + "frozen"); >> err = sysfs_set_str(&info, NULL, "array_state", >> "readonly"); >> break; >> diff --git a/managemon.c b/managemon.c >> index f40bbdb..5bc54da 100644 >> --- a/managemon.c >> +++ b/managemon.c >> @@ -744,6 +744,12 @@ static void manage_new(struct mdstat_ent *mdstat, >> new->container = NULL; >> free_aa(new); >> } else { >> + /* >> + * Create() set this to frozen. >> + * This relies on the kernel clear FROZEN status >> + * if an invalid value is written to sync_action. >> + */ >> + sysfs_set_str(&new->info, NULL, "sync_action", ""); >> replace_array(container, victim, new); >> if (failed) { >> new->check_degraded = 1; > > > I don't think that this patch should be necessary. > If you find it makes a difference, then I'm missing something. > > Recovery never starts while the array is readonly - the kernel doesn't allow > it. > So mdmon should be in complete control as it is the only one that is allowed > to change the array from readonly. > > Whether it quite does the right thing in this case I can't promise, but it > should be able to do the right thing without any extra help from mdadm. I had a look at the dmesg output and it appeared to me that the kernel started resync on the new array before it noticed that one of the disks were faulty. But I may have been mistaken. Unfortunately I didn't keep those logs. For now I am fine with what you are saying - you know the ways of the kernel MD driver way better then me. > And writing an empty string to 'sync_action' will not do anything useful. > You probably mean to write "idle". Hmm... 4223 static ssize_t 4224 action_store(struct mddev *mddev, const char *page, size_t len) 4225 { ... 4229 if (cmd_match(page, "frozen")) 4230 set_bit(MD_RECOVERY_FROZEN, &mddev->recovery); 4231 else 4232 clear_bit(MD_RECOVERY_FROZEN, &mddev->recovery); To me this looks as if FROZEN was cleared whenever anything else but "frozen" was written. What am I missing here? I actually meant to do exactly that because I didn't want to loose the previous status. But as I said, I am not familiar enough with the kernel md code to understand exactly what happens. > I've applied and push out the previous patches but not this one. Thanks. Martin -- Dr. Martin Wilck PRIMERGY System Software Engineer x86 Server Engineering FUJITSU Fujitsu Technology Solutions GmbH Heinz-Nixdorf-Ring 1 33106 Paderborn, Germany Phone: ++49 5251 525 2796 Fax: ++49 5251 525 2820 Email: martin.wilck@xxxxxxxxxxxxxx Internet: http://ts.fujitsu.com Company Details: http://ts.fujitsu.com/imprint -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html