On 9/2/13, NeilBrown <neilb@xxxxxxx> wrote:
> On Thu, 29 Aug 2013 11:55:09 +0200 Andreas Baer <synthetic.gods@xxxxxxxxx>
> wrote:
>
>> On 8/26/13, NeilBrown <neilb@xxxxxxx> wrote:
>> > On Thu, 22 Aug 2013 15:20:06 +0200 Andreas Baer
>> > <synthetic.gods@xxxxxxxxx>
>> > wrote:
>> >
>> >> Short description:
>> >> I've discovered a problem during re-assembly of a clean RAID. mdadm
>> >> throws one disk out because this disk apparently shows another disk as
>> >> failed. After assembly, the RAID starts to recover onto the existing
>> >> spare disk.
>> >>
>> >> In detail:
>> >> 1. RAID-6 (Superblock V0.90.00) created with mdadm V2.6.4 and with 7
>> >> active disks and 1 spare disk (disk size: 1 TB), fully synced and
>> >> clean.
>> >> 2. RAID-6 stopped and re-assembled with mdadm V3.2.5, but during that,
>> >> one disk is thrown out.
>> >>
>> >> Manual assembly command for /dev/md0, relevant partitions are
>> >> /dev/sd[b-i]1:
>> >> # mdadm --assemble --scan -vvv
>> >> mdadm: looking for devices for /dev/md0
>> >> mdadm: no RAID superblock on /dev/sdi
>> >> mdadm: no RAID superblock on /dev/sdh
>> >> mdadm: no RAID superblock on /dev/sdg
>> >> mdadm: no RAID superblock on /dev/sdf
>> >> mdadm: no RAID superblock on /dev/sde
>> >> mdadm: no RAID superblock on /dev/sdd
>> >> mdadm: no RAID superblock on /dev/sdc
>> >> mdadm: no RAID superblock on /dev/sdb
>> >> mdadm: no RAID superblock on /dev/sda1
>> >> mdadm: no RAID superblock on /dev/sda
>> >> mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 7.
>> >> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 6.
>> >> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
>> >> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
>> >> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
>> >> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
>> >> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
>> >> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
>> >> mdadm: ignoring /dev/sdb1 as it reports /dev/sdi1 as failed
>> >> mdadm: no uptodate device for slot 0 of /dev/md0
>> >> mdadm: added /dev/sdd1 to /dev/md0 as 2
>> >> mdadm: added /dev/sde1 to /dev/md0 as 3
>> >> mdadm: added /dev/sdf1 to /dev/md0 as 4
>> >> mdadm: added /dev/sdg1 to /dev/md0 as 5
>> >> mdadm: added /dev/sdh1 to /dev/md0 as 6
>> >> mdadm: added /dev/sdi1 to /dev/md0 as 7
>> >> mdadm: added /dev/sdc1 to /dev/md0 as 1
>> >> mdadm: /dev/md0 has been started with 6 drives (out of 7) and 1 spare.
>> >>
>> >> I finally made a test by modifying the mdadm V3.2.5 sources to not
>> >> write any data to any superblock and to simply exit() somewhere in the
>> >> middle of the assembly process, so that I can reproduce this behavior
>> >> without any RAID re-creation/synchronization.
>> >> So, using mdadm V2.6.4, /dev/md0 assembles without problems, and if I
>> >> switch to mdadm V3.2.5, it shows the same messages as above.
>> >>
>> >> The real problem:
>> >> I have more than a single machine receiving a similar software update,
>> >> so I need to find a solution or workaround for this problem. By the
>> >> way, from another test without an existing spare disk, there seems to
>> >> be no 'throwing out' problem when switching from V2.6.4 to V3.2.5.
>> >>
>> >> It would also be a great help if someone could explain the reason
>> >> behind the relevant code fragment for rejecting a device, e.g. why is
>> >> only the 'most_recent' device important?
>> >>
>> >> /* If this device thinks that 'most_recent' has failed, then
>> >>  * we must reject this device.
>> >>  */
>> >> if (j != most_recent &&
>> >>     content->array.raid_disks > 0 &&
>> >>     devices[most_recent].i.disk.raid_disk >= 0 &&
>> >>     devmap[j * content->array.raid_disks +
>> >>            devices[most_recent].i.disk.raid_disk] == 0) {
>> >>         if (verbose > -1)
>> >>                 fprintf(stderr, Name ": ignoring %s as it reports %s as failed\n",
>> >>                         devices[j].devname, devices[most_recent].devname);
>> >>         best[i] = -1;
>> >>         continue;
>> >> }
>> >>
>> >> I also attached some files showing details of the related
>> >> superblocks before and after assembly, as well as the RAID status
>> >> itself.
>> >
>> >
>> > Thanks for the thorough report. I think this issue has been fixed in
>> > 3.3-rc1.
>> > You can fix it for 3.2.5 by applying the following patch:
>> >
>> > diff --git a/Assemble.c b/Assemble.c
>> > index 227d66f..bc65c29 100644
>> > --- a/Assemble.c
>> > +++ b/Assemble.c
>> > @@ -849,7 +849,8 @@ int Assemble(struct supertype *st, char *mddev,
>> >                 devices[devcnt].i.disk.minor = minor(stb.st_rdev);
>> >                 if (most_recent < devcnt) {
>> >                         if (devices[devcnt].i.events
>> > -                           > devices[most_recent].i.events)
>> > +                           > devices[most_recent].i.events &&
>> > +                           devices[devcnt].i.disk.state == 6)
>> >                                 most_recent = devcnt;
>> >                 }
>> >                 if (content->array.level == LEVEL_MULTIPATH)
>> >
>> > The "most recent" device is important as we need to choose one to
>> > compare all others against. The problem is that the code in 3.2.5 can
>> > sometimes choose a spare, which isn't such a good idea.
>> >
>> > The "most recent" is also important because when a collection of
>> > devices is given to the kernel, it gives priority to some information
>> > which is on the last device passed in. So we make sure that the last
>> > device given to the kernel is the "most recent".
>> >
>> > Please let me know if the patch fixes your problem.
>> >
>> > NeilBrown
>>
>> First of all, thanks for your very helpful 'most recent disk'
>> explanation.
>>
>> Sadly, the patch didn't fix my problem: the event counters really are
>> equal on all disks (including the spare), and the first disk that is
>> checked is the spare, so no other disk ever replaces it as the 'most
>> recent disk'. I extended your patch a little to print more output, and
>> I also created my own solution, but that needs review because I'm not
>> sure it can be done like that.
>>
>> Patch 1: Your solution with more output
>> Diff: mdadm-3.2.5-noassemble-patch1.diff
>> Assembly: mdadm-3.2.5-noassemble-patch1.txt
>>
>> Patch 2: My proposed solution
>> Diff: mdadm-3.2.5-noassemble-patch2.diff
>> Assembly: mdadm-3.2.5-noassemble-patch2.txt
>
>
> Thanks for the testing and suggestions. I see what I missed now.
> Can you check whether this patch works, please?
>
> Thanks.
> NeilBrown
>
> diff --git a/Assemble.c b/Assemble.c
> index 227d66f..9131917 100644
> --- a/Assemble.c
> +++ b/Assemble.c
> @@ -215,7 +215,7 @@ int Assemble(struct supertype *st, char *mddev,
>         unsigned int okcnt, sparecnt, rebuilding_cnt;
>         unsigned int req_cnt;
>         int i;
> -       int most_recent = 0;
> +       int most_recent = -1;
>         int chosen_drive;
>         int change = 0;
>         int inargv = 0;
> @@ -847,8 +847,9 @@ int Assemble(struct supertype *st, char *mddev,
>                 devices[devcnt].i = *content;
>                 devices[devcnt].i.disk.major = major(stb.st_rdev);
>                 devices[devcnt].i.disk.minor = minor(stb.st_rdev);
> -               if (most_recent < devcnt) {
> -                       if (devices[devcnt].i.events
> +               if (devices[devcnt].i.disk_state == 6) {
> +                       if (most_recent < 0 ||
> +                           devices[devcnt].i.events
>                             > devices[most_recent].i.events)
>                                 most_recent = devcnt;
>                 }

Your patch seems to work without issues. There is only a small typo:

+               if (devices[devcnt].i.disk_state == 6) {

should be:

+               if (devices[devcnt].i.disk.state == 6) {

I have attached to this mail the patch that I am now using.

Thank you very much for your help.
Attachment:
no-spare-as-most_recent.patch
Description: Binary data