On 07/30/2013 02:42 AM, NeilBrown wrote: > On Mon, 29 Jul 2013 22:42:25 +0200 Martin Wilck <mwilck@xxxxxxxx> wrote: > >> >>> My current idea to solve this is yet another separate thread just for >>> monitoring kernel state changes. Don't have it ready yet, though. >> >> Another idea would be in manage_member, after queueing the metadata >> update and waking up the monitor, to wait for the metadata to finish >> processing before actually starting the recovery (writing "recover" to >> sync_action). >> >> Martin > > I hope an extra thread won't be necessary :-) I think the general problem that mdmon may be busy writing to disk while something changes in the kernel is real. But introducing an extra thread would make things even more complex as they are now, so it might be something to avoid. > I think that manage_member is the place to fix this. However it might be > even simpler than you suggest. > > We currently have > > replace_array(container, a, newa); > sysfs_set_str(&a->info, NULL, "sync_action", "recover"); > > monitor subsequently takes that 'newa', looks at 'sync_action', see that it > is 'idle' and assume that the recover never happened. > Suppose we change it to: > > if (sysfs_set_str(&a->info, NULL, "sync_action", "recover") == 0) > newa->prev_action = newa->curr_action = recovery; > replace_array(container, a, newa); > > Then it wouldn't matter if monitor never saw the 'recovery' state as manager > explicitly told it that recovery had started. > > Could you try that? Ingenious idea :-) Unfortunately it isn't sufficient. However this PLUS waiting for the metadata upate to finish makes my test succeed reliably (10/10, in the previous failure scenario). I'll send in the current status of patches in a minute. Martin > > Thanks, > NeilBrown -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html