On Thu, 1 Aug 2013 12:32:50 +0000 "Dorau, Lukasz" <lukasz.dorau@xxxxxxxxx> wrote:

>
> There is another, more serious, problem.
> When we stop the array during initial resync (mdadm -Ss)
> and the function is_resync_complete() is entered for the last time,
> array->array.raid_disks already equals 0, because it is zero'ed by manager:
>     a->info.array.raid_disks = mdstat->raid_disks;
> at managemon.c:454.
> As a result sync_size equals 0 and is_resync_complete() incorrectly returns 1 and resync finishes...
>
> It seems to be a race condition between monitor and manager - manager changes value of array.raid_disks too fast.

Yes - that is a serious problem. Thanks for reporting it.

I think this is the correct fix.

Thanks,
NeilBrown

From e49a8a80265ab2150c96b636450f5825bcd69d4a Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@xxxxxxx>
Date: Mon, 5 Aug 2013 15:40:16 +1000
Subject: [PATCH] mdmon: don't use 'ghost' values from an inactive array.

It is possible for mdmon to see (in /proc/mdstat) an array in
'inactive' state after "mdadm -S" has written "inactive" to
"array_state". In this state values such as "raid_disks" are not
meaningful and so should be ignored by manage_member().

Reported-by: "Dorau, Lukasz" <lukasz.dorau@xxxxxxxxx>
Signed-off-by: NeilBrown <neilb@xxxxxxx>

diff --git a/managemon.c b/managemon.c
index c245655..f40bbdb 100644
--- a/managemon.c
+++ b/managemon.c
@@ -450,9 +450,11 @@ static void manage_member(struct mdstat_ent *mdstat,
 		/* Raced with something */
 		return;
 
-	// FIXME
-	a->info.array.raid_disks = mdstat->raid_disks;
-	// MORE
+	if (mdstat->active) {
+		// FIXME
+		a->info.array.raid_disks = mdstat->raid_disks;
+		// MORE
+	}
 
 	if (sysfs_get_ll(&a->info, NULL, "component_size", &component_size) >= 0)
 		a->info.component_size = component_size << 1;
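
For readers following the thread, the failure mode and the guard can be modelled in a few lines of standalone C. This is only a rough sketch, not mdadm code: the toy_* names and the struct are hypothetical, and the assumption that the amount to resync scales with raid_disks is a simplification made for illustration. It shows why a raid_disks of 0 makes the completion check pass trivially, and how skipping the update for an inactive array avoids that.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in; NOT mdadm's real struct mdinfo. */
struct toy_array {
	int raid_disks;                      /* member count as last reported */
	unsigned long long component_size;   /* per-device size, in sectors */
	unsigned long long resync_start;     /* resync checkpoint, in sectors */
};

/* Assumed model: the amount to resync scales with raid_disks. */
static unsigned long long toy_sync_size(const struct toy_array *a)
{
	return a->component_size * (unsigned long long)a->raid_disks;
}

/* Completion test: the checkpoint has reached the end of the region.
 * With sync_size == 0 this is true for any checkpoint value. */
static bool toy_is_resync_complete(const struct toy_array *a)
{
	return a->resync_start >= toy_sync_size(a);
}

/* Unguarded update, modelled on the pre-patch behaviour:
 * always copy the reported value, even a meaningless 0. */
static void toy_update_unguarded(struct toy_array *a, int reported_disks)
{
	a->raid_disks = reported_disks;
}

/* Guarded update, modelled on the patch:
 * ignore values reported while the array is inactive. */
static void toy_update_guarded(struct toy_array *a, int reported_disks,
			       bool active)
{
	if (active)
		a->raid_disks = reported_disks;
}
```

In this model, an array with 4 disks and a partial checkpoint reports "not complete"; after toy_update_unguarded(&a, 0) the same checkpoint is reported as complete, whereas toy_update_guarded(&a, 0, false) leaves raid_disks at 4 and the result unchanged.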