Re: Update to mdadm V3.2.5 => RAID starts to recover (reproducible)

NeilBrown <neilb@xxxxxxx> · Mon, 9 Sep 2013 12:39:48 +1000

On Thu, 5 Sep 2013 17:22:26 +0200 Andreas Baer <synthetic.gods@xxxxxxxxx>
wrote:

> On 9/2/13, NeilBrown <neilb@xxxxxxx> wrote:
> > On Thu, 29 Aug 2013 11:55:09 +0200 Andreas Baer <synthetic.gods@xxxxxxxxx>
> > wrote:
> >
> >> On 8/26/13, NeilBrown <neilb@xxxxxxx> wrote:
> >> > On Thu, 22 Aug 2013 15:20:06 +0200 Andreas Baer <synthetic.gods@xxxxxxxxx>
> >> > wrote:
> >> >
> >> >> Short description:
> >> >> I've discovered a problem during re-assembly of a clean RAID. mdadm
> >> >> throws one disk out because this disk apparently shows another disk as
> >> >> failed. After assembly, RAID starts to recover on existing spare disk.
> >> >>
> >> >> In detail:
> >> >> 1. RAID-6 (Superblock V0.90.00) created with mdadm V2.6.4 and with 7
> >> >> active disks and 1 spare disk (disk size: 1 TB), fully synced and
> >> >> clean.
> >> >> 2. RAID-6 stopped and re-assembled with mdadm V3.2.5, but during that
> >> >> one disk is thrown out.
> >> >>
> >> >> Manual assembly command for /dev/md0, relevant partitions are
> >> >> /dev/sd[b-i]1:
> >> >> # mdadm --assemble --scan -vvv
> >> >> mdadm: looking for devices for /dev/md0
> >> >> mdadm: no RAID superblock on /dev/sdi
> >> >> mdadm: no RAID superblock on /dev/sdh
> >> >> mdadm: no RAID superblock on /dev/sdg
> >> >> mdadm: no RAID superblock on /dev/sdf
> >> >> mdadm: no RAID superblock on /dev/sde
> >> >> mdadm: no RAID superblock on /dev/sdd
> >> >> mdadm: no RAID superblock on /dev/sdc
> >> >> mdadm: no RAID superblock on /dev/sdb
> >> >> mdadm: no RAID superblock on /dev/sda1
> >> >> mdadm: no RAID superblock on /dev/sda
> >> >> mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 7.
> >> >> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 6.
> >> >> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
> >> >> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
> >> >> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
> >> >> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
> >> >> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
> >> >> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
> >> >> mdadm: ignoring /dev/sdb1 as it reports /dev/sdi1 as failed
> >> >> mdadm: no uptodate device for slot 0 of /dev/md0
> >> >> mdadm: added /dev/sdd1 to /dev/md0 as 2
> >> >> mdadm: added /dev/sde1 to /dev/md0 as 3
> >> >> mdadm: added /dev/sdf1 to /dev/md0 as 4
> >> >> mdadm: added /dev/sdg1 to /dev/md0 as 5
> >> >> mdadm: added /dev/sdh1 to /dev/md0 as 6
> >> >> mdadm: added /dev/sdi1 to /dev/md0 as 7
> >> >> mdadm: added /dev/sdc1 to /dev/md0 as 1
> >> >> mdadm: /dev/md0 has been started with 6 drives (out of 7) and 1 spare.
> >> >>
> >> >> I finally made a test by modifying mdadm V3.2.5 sources to not write
> >> >> any data to any superblock and to simply exit() somewhere in the
> >> >> middle of assembly process to be able to reproduce this behavior
> >> >> without any RAID re-creation/synchronization.
> >> >> So using mdadm V2.6.4 /dev/md0 assembles without problems and if I
> >> >> switch to mdadm V3.2.5 it shows the same messages as above.
> >> >>
> >> >> The real problem:
> >> >> I have more than a single machine receiving a similar software update
> >> >> so I need to find a solution or workaround for this problem. By the
> >> >> way, from another test without an existing spare disk, there seems to
> >> >> be no 'throwing out'-problem when switching from V2.6.4 to V3.2.5.
> >> >>
> >> >> It would also be a great help if someone could explain the reason
> >> >> behind the relevant code fragment for rejecting a device, e.g. why is
> >> >> only the 'most_recent' device important?
> >> >>
> >> >> /* If this device thinks that 'most_recent' has failed, then
> >> >>   * we must reject this device.
> >> >>   */
> >> >> if (j != most_recent &&
> >> >>     content->array.raid_disks > 0 &&
> >> >>     devices[most_recent].i.disk.raid_disk >= 0 &&
> >> >>     devmap[j * content->array.raid_disks +
> >> >>            devices[most_recent].i.disk.raid_disk] == 0) {
> >> >>     if (verbose > -1)
> >> >>         fprintf(stderr, Name ": ignoring %s as it reports %s as failed\n",
> >> >>             devices[j].devname, devices[most_recent].devname);
> >> >>     best[i] = -1;
> >> >>     continue;
> >> >> }
> >> >>
> >> >> I also attached some files showing some details about related
> >> >> superblocks before and after assembly as well as about RAID status
> >> >> itself.
> >> >
> >> >
> >> > Thanks for the thorough report.  I think this issue has been fixed in
> >> > 3.3-rc1.  You can fix it for 3.2.5 by applying the following patch:
> >> >
> >> > diff --git a/Assemble.c b/Assemble.c
> >> > index 227d66f..bc65c29 100644
> >> > --- a/Assemble.c
> >> > +++ b/Assemble.c
> >> > @@ -849,7 +849,8 @@ int Assemble(struct supertype *st, char *mddev,
> >> >  		devices[devcnt].i.disk.minor = minor(stb.st_rdev);
> >> >  		if (most_recent < devcnt) {
> >> >  			if (devices[devcnt].i.events
> >> > -			    > devices[most_recent].i.events)
> >> > +			    > devices[most_recent].i.events &&
> >> > +			    devices[devcnt].i.disk.state == 6)
> >> >  				most_recent = devcnt;
> >> >  		}
> >> >  		if (content->array.level == LEVEL_MULTIPATH)
> >> >
> >> > The "most recent" device is important as we need to choose one to
> >> > compare all others against.  The problem is that the code in 3.2.5 can
> >> > sometimes choose a spare, which isn't such a good idea.
> >> >
> >> > The "most recent" is also important because when a collection of devices
> >> > is given to the kernel it will give priority to some information which is
> >> > on the last device passed in.  So we make sure that the last device given
> >> > to the kernel is the "most recent".
> >> >
> >> > Please let me know if the patch fixes your problem.
> >> >
> >> > NeilBrown
> >>
> >> First of all, thanks for your very helpful 'most recent disk'
> >> explanation.
> >>
> >> Sadly, the patch didn't fix my problem. The event counters really are
> >> equal on all disks (including the spare), and the first disk that is
> >> checked is the spare, so there is never a reason to select another disk
> >> as the 'most recent disk'. I extended your patch a little to provide
> >> more output and also wrote a solution of my own, but that needs review
> >> because I'm not sure it can be done like that.
> >>
> >> Patch 1: Your solution with more output
> >> Diff: mdadm-3.2.5-noassemble-patch1.diff
> >> Assembly: mdadm-3.2.5-noassemble-patch1.txt
> >>
> >> Patch 2: My proposed solution
> >> Diff: mdadm-3.2.5-noassemble-patch2.diff
> >> Assembly: mdadm-3.2.5-noassemble-patch2.txt
> >
> >
> > Thanks for the testing and suggestions.  I see what I missed now.
> > Can you check if this patch works please?
> >
> > Thanks.
> > NeilBrown
> >
> > diff --git a/Assemble.c b/Assemble.c
> > index 227d66f..9131917 100644
> > --- a/Assemble.c
> > +++ b/Assemble.c
> > @@ -215,7 +215,7 @@ int Assemble(struct supertype *st, char *mddev,
> >  	unsigned int okcnt, sparecnt, rebuilding_cnt;
> >  	unsigned int req_cnt;
> >  	int i;
> > -	int most_recent = 0;
> > +	int most_recent = -1;
> >  	int chosen_drive;
> >  	int change = 0;
> >  	int inargv = 0;
> > @@ -847,8 +847,9 @@ int Assemble(struct supertype *st, char *mddev,
> >  		devices[devcnt].i = *content;
> >  		devices[devcnt].i.disk.major = major(stb.st_rdev);
> >  		devices[devcnt].i.disk.minor = minor(stb.st_rdev);
> > -		if (most_recent < devcnt) {
> > -			if (devices[devcnt].i.events
> > +		if (devices[devcnt].i.disk_state == 6) {
> > +			if (most_recent < 0 ||
> > +			    devices[devcnt].i.events
> >  			    > devices[most_recent].i.events)
> >  				most_recent = devcnt;
> >  		}
> 
> Your patch seems to work without issues.
> 
> There is only a small typo:
> +		if (devices[devcnt].i.disk_state == 6) {
> should be:
> +		if (devices[devcnt].i.disk.state == 6) {
> 
> I attached the patch that I'm finally using to this mail.
> Thank you very much for your help.

Great.  Thanks for the confirmation.

This fix is in 3.3.

NeilBrown
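
For readers following the thread, here is a minimal sketch of why the first patch could not help when all event counters are equal, and what the final patch changes. It is not the mdadm source: `struct dev_info` and both function names are hypothetical stand-ins, and the bit positions are assumed to match the kernel's `md_p.h` (`MD_DISK_ACTIVE` = 1, `MD_DISK_SYNC` = 2), which is where the literal `6` in the patch would come from.

```c
#include <assert.h>

/* Assumed bit positions from the kernel's md_p.h; an active, in-sync
 * member then has state (1 << 1) | (1 << 2) == 6, as tested in the patch. */
#define MD_DISK_ACTIVE 1
#define MD_DISK_SYNC   2
#define ACTIVE_IN_SYNC ((1 << MD_DISK_ACTIVE) | (1 << MD_DISK_SYNC))

/* Hypothetical, trimmed-down stand-in for mdadm's per-device info. */
struct dev_info {
	unsigned long long events; /* superblock event counter */
	int state;                 /* disk state bits; 0 here models a spare */
};

/* 3.2.5-style selection: starts at index 0 and only moves on a strictly
 * higher event count, so with equal counters the first device scanned
 * wins, even if it is a spare. */
int pick_most_recent_old(const struct dev_info *devs, int n)
{
	int most_recent = 0;

	assert(n > 0);
	for (int devcnt = 0; devcnt < n; devcnt++) {
		if (most_recent < devcnt &&
		    devs[devcnt].events > devs[most_recent].events)
			most_recent = devcnt;
	}
	return most_recent;
}

/* Selection as in the final patch: starts at -1 and only considers
 * active, in-sync devices. Returns -1 if no such device exists. */
int pick_most_recent_fixed(const struct dev_info *devs, int n)
{
	int most_recent = -1;

	assert(n > 0);
	for (int devcnt = 0; devcnt < n; devcnt++) {
		if (devs[devcnt].state == ACTIVE_IN_SYNC) {
			if (most_recent < 0 ||
			    devs[devcnt].events > devs[most_recent].events)
				most_recent = devcnt;
		}
	}
	return most_recent;
}
```

With a spare scanned first and identical event counters everywhere, the old loop returns index 0 (the spare), while the patched loop returns the first active, in-sync device.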