mdadm: Assemble.c: "force-one" update conflicts with the split-brain protection logic

Hi Neil,
I see the following issue:

1. I have a raid5 with drives a, b, c, d. Drive a fails, and then drive b
   fails, so the whole array fails.
2. The superblocks of c and d show a and b as failed (via 0xfffe in the
   dev_roles[] array).
3. Now I perform --assemble --force.
4. Since b has a higher event count than a, b's event count is bumped to
   match the event count of c and d ("force-one").
5. However, something goes wrong and the assembly is aborted.
6. Now the assembly is restarted (whether --force is given no longer
   matters).

At this point, drive b is chosen as "most_recent", since it comes
first and has the highest event count (equal to that of c and d).
However, when drives c and d are inspected, they are rejected by the
following split-brain protection code:
		if (j != most_recent &&
		    content->array.raid_disks > 0 &&
		    devices[most_recent].i.disk.raid_disk >= 0 &&
		    devmap[j * content->array.raid_disks +
			   devices[most_recent].i.disk.raid_disk] == 0) {
			if (c->verbose > -1)
				pr_err("ignoring %s as it reports %s as failed\n",
					devices[j].devname, devices[most_recent].devname);
			best[i] = -1;
			continue;
		}

because the dev_roles[] arrays of c and d show b as failed (b really
had failed while c and d were still operational).
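
To make the failure concrete, here is a minimal, self-contained model of
the selection and the check quoted above. It is only a sketch under
stated assumptions: struct dev_model, sees_active[] and the three helper
functions are hypothetical names invented for illustration and are not
mdadm's real data structures (mdadm uses devices[], devmap[] and best[]).

```c
#include <stddef.h>

/* Hypothetical, simplified model; NOT mdadm's actual structures. */
enum { RAID_DISKS = 4 };

struct dev_model {
	const char *name;
	int raid_disk;                 /* slot this device claims, or -1 */
	unsigned long long events;     /* event count in its superblock */
	char sees_active[RAID_DISKS];  /* 1 if its superblock lists the slot as active */
};

/* Pick the first device with the highest event count (ties keep the earlier one). */
int pick_most_recent(const struct dev_model *devs, size_t n)
{
	int best = -1;

	for (size_t i = 0; i < n; i++)
		if (best < 0 || devs[i].events > devs[best].events)
			best = (int)i;
	return best;
}

/* Modelled on the quoted check: reject j if it records most_recent's slot as failed. */
int is_rejected(const struct dev_model *devs, int j, int most_recent)
{
	int slot = devs[most_recent].raid_disk;

	return j != most_recent && slot >= 0 && devs[j].sees_active[slot] == 0;
}

/* Count how many devices survive the check for a given device order. */
int count_accepted(const struct dev_model *devs, size_t n)
{
	int most_recent = pick_most_recent(devs, n);
	int accepted = 0;

	for (size_t j = 0; j < n; j++)
		if (!is_rejected(devs, (int)j, most_recent))
			accepted++;
	return accepted;
}
```

In this model, with b's event count already bumped and the devices given
in the order b, c, d, only b survives. With the order c, d, b, all three
survive, because b's own superblock still lists c and d as active.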

So I was thinking that the "force-one" update should also align the
dev_roles[] arrays of all devices that it affects. More precisely, if
we decide to promote a device via the "force-one" path, we must update
dev_roles[] of all "good" devices so that the promoted device is no
longer marked 0xfffe but has a valid role. Does this make sense? What
do you think?
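
The proposal above could look roughly like the following sketch. It is
not a patch against mdadm: struct sb_model, ROLE_FAULTY and
force_one_with_roles() are hypothetical names, "good" is assumed to mean
"event count equal to the target", and writing the superblocks back to
disk is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed "failed" marker in the v1.x dev_roles[] array. */
#define ROLE_FAULTY 0xfffeu

enum { MAX_DEVS = 4 };

/* Hypothetical in-memory view of one superblock; NOT mdadm's struct. */
struct sb_model {
	uint64_t events;
	uint16_t dev_roles[MAX_DEVS];  /* indexed by device number */
};

/*
 * Promote device number `promoted` to `role`: bump its event count to
 * `target_events`, then make every superblock that is at the target
 * event count record the promoted device with that role.
 * Returns the number of superblocks updated, or -1 on a bad index.
 */
int force_one_with_roles(struct sb_model *sbs, size_t n, size_t promoted,
			 uint16_t role, uint64_t target_events)
{
	int updated = 0;

	if (promoted >= n || promoted >= MAX_DEVS)
		return -1;

	sbs[promoted].events = target_events;
	for (size_t i = 0; i < n; i++) {
		if (sbs[i].events != target_events)
			continue;  /* stale superblock: leave it alone */
		sbs[i].dev_roles[promoted] = role;
		updated++;
	}
	return updated;
}
```

After such an update, c and d would no longer report the promoted device
as failed, so a restarted assembly would not reject them on that basis.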

I also think that the split-brain protection logic that you added
should be made a little more explicit. Currently, the first device
with the highest event count is selected as "most_recent", and
split-brain protection is enforced with respect to that device. But
this logic can be affected by the order of devices passed to
"assemble". As I mentioned before, I pitched a proposal for dealing
with this in the past. Do you want me to go over it and pitch it again?

Thanks!
Alex.