Re: Update to mdadm V3.2.5 => RAID starts to recover (reproducible)

On Thu, 29 Aug 2013 11:55:09 +0200 Andreas Baer <synthetic.gods@xxxxxxxxx>
wrote:

> On 8/26/13, NeilBrown <neilb@xxxxxxx> wrote:
> > On Thu, 22 Aug 2013 15:20:06 +0200 Andreas Baer <synthetic.gods@xxxxxxxxx>
> > wrote:
> >
> >> Short description:
> >> I've discovered a problem during re-assembly of a clean RAID: mdadm
> >> throws one disk out because that disk apparently reports another disk
> >> as failed. After assembly, the RAID starts to recover onto the
> >> existing spare disk.
> >>
> >> In detail:
> >> 1. RAID-6 (Superblock V0.90.00) created with mdadm V2.6.4, with 7
> >> active disks and 1 spare disk (disk size: 1 TB), fully synced and
> >> clean (a sketch of an equivalent create command follows below).
> >> 2. RAID-6 stopped and re-assembled with mdadm V3.2.5, but during
> >> assembly one disk is thrown out.
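> >>
> >> For illustration, such an array could have been created with a command
> >> along these lines (device names as used below; the exact options are a
> >> sketch, not the original command):
> >>  # mdadm --create /dev/md0 --metadata=0.90 --level=6 \
> >>          --raid-devices=7 --spare-devices=1 /dev/sd[b-i]1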
> >>
> >> Manual assembly command for /dev/md0, relevant partitions are
> >> /dev/sd[b-i]1:
> >> # mdadm --assemble --scan -vvv
> >> mdadm: looking for devices for /dev/md0
> >> mdadm: no RAID superblock on /dev/sdi
> >> mdadm: no RAID superblock on /dev/sdh
> >> mdadm: no RAID superblock on /dev/sdg
> >> mdadm: no RAID superblock on /dev/sdf
> >> mdadm: no RAID superblock on /dev/sde
> >> mdadm: no RAID superblock on /dev/sdd
> >> mdadm: no RAID superblock on /dev/sdc
> >> mdadm: no RAID superblock on /dev/sdb
> >> mdadm: no RAID superblock on /dev/sda1
> >> mdadm: no RAID superblock on /dev/sda
> >> mdadm: /dev/sdi1 is identified as a member of /dev/md0, slot 7.
> >> mdadm: /dev/sdh1 is identified as a member of /dev/md0, slot 6.
> >> mdadm: /dev/sdg1 is identified as a member of /dev/md0, slot 5.
> >> mdadm: /dev/sdf1 is identified as a member of /dev/md0, slot 4.
> >> mdadm: /dev/sde1 is identified as a member of /dev/md0, slot 3.
> >> mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 2.
> >> mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 1.
> >> mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 0.
> >> mdadm: ignoring /dev/sdb1 as it reports /dev/sdi1 as failed
> >> mdadm: no uptodate device for slot 0 of /dev/md0
> >> mdadm: added /dev/sdd1 to /dev/md0 as 2
> >> mdadm: added /dev/sde1 to /dev/md0 as 3
> >> mdadm: added /dev/sdf1 to /dev/md0 as 4
> >> mdadm: added /dev/sdg1 to /dev/md0 as 5
> >> mdadm: added /dev/sdh1 to /dev/md0 as 6
> >> mdadm: added /dev/sdi1 to /dev/md0 as 7
> >> mdadm: added /dev/sdc1 to /dev/md0 as 1
> >> mdadm: /dev/md0 has been started with 6 drives (out of 7) and 1 spare.
> >>
> >> I finally made a test by modifying the mdadm V3.2.5 sources to not
> >> write any data to any superblock and to simply exit() somewhere in
> >> the middle of the assembly process, so that this behavior can be
> >> reproduced without any RAID re-creation/synchronization.
> >> With mdadm V2.6.4, /dev/md0 assembles without problems; if I switch
> >> to mdadm V3.2.5 it shows the same messages as above.
> >>
> >> The real problem:
> >> I have more than one machine receiving a similar software update, so
> >> I need to find a solution or workaround for this problem. By the
> >> way, in another test without an existing spare disk, there seems to
> >> be no 'throwing out' problem when switching from V2.6.4 to V3.2.5.
> >>
> >> It would also be a great help if someone could explain the reason
> >> behind the relevant code fragment for rejecting a device, e.g. why is
> >> only the 'most_recent' device important?
> >>
> >> /* If this device thinks that 'most_recent' has failed, then
> >>  * we must reject this device.
> >>  */
> >> if (j != most_recent &&
> >>     content->array.raid_disks > 0 &&
> >>     devices[most_recent].i.disk.raid_disk >= 0 &&
> >>     devmap[j * content->array.raid_disks +
> >>            devices[most_recent].i.disk.raid_disk] == 0) {
> >>     if (verbose > -1)
> >>         fprintf(stderr, Name ": ignoring %s as it reports %s as failed\n",
> >>             devices[j].devname, devices[most_recent].devname);
> >>     best[i] = -1;
> >>     continue;
> >> }
> >>
> >> I also attached some files showing details of the related superblocks
> >> before and after assembly, as well as of the RAID status itself.
> >
> >
> > Thanks for the thorough report.  I think this issue has been fixed in
> > 3.3-rc1.  You can fix it for 3.2.5 by applying the following patch:
> >
> > diff --git a/Assemble.c b/Assemble.c
> > index 227d66f..bc65c29 100644
> > --- a/Assemble.c
> > +++ b/Assemble.c
> > @@ -849,7 +849,8 @@ int Assemble(struct supertype *st, char *mddev,
> >  		devices[devcnt].i.disk.minor = minor(stb.st_rdev);
> >  		if (most_recent < devcnt) {
> >  			if (devices[devcnt].i.events
> > -			    > devices[most_recent].i.events)
> > +			    > devices[most_recent].i.events &&
> > +			    devices[devcnt].i.disk.state == 6)
> >  				most_recent = devcnt;
> >  		}
> >  		if (content->array.level == LEVEL_MULTIPATH)
> >
> > The "most recent" device is important as we need to choose one device
> > to compare all the others against.  The problem is that the code in
> > 3.2.5 can sometimes choose a spare, which isn't such a good idea.
> >
> > The "most recent" device is also important because, when a collection
> > of devices is given to the kernel, it will give priority to some
> > information which is on the last device passed in.  So we make sure
> > that the last device given to the kernel is the "most recent".
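> >
> > A rough sketch of those two ideas (not mdadm's actual code; the struct,
> > the sample values and the add_to_kernel() helper are illustrative only):
> >
> > #include <stdio.h>
> >
> > struct dev { const char *name; long long events; int active_and_in_sync; };
> >
> > /* Stand-in for whatever actually hands a device to the md driver. */
> > static void add_to_kernel(const struct dev *d) { printf("adding %s\n", d->name); }
> >
> > int main(void)
> > {
> > 	struct dev devices[] = {
> > 		{ "spare",   42, 0 },	/* a spare never qualifies */
> > 		{ "member0", 42, 1 },
> > 		{ "member1", 42, 1 },
> > 	};
> > 	int n = 3, most_recent = -1, j;
> >
> > 	/* Pick the freshest active, in-sync member as "most recent". */
> > 	for (j = 0; j < n; j++)
> > 		if (devices[j].active_and_in_sync &&
> > 		    (most_recent < 0 ||
> > 		     devices[j].events > devices[most_recent].events))
> > 			most_recent = j;
> >
> > 	/* Hand it to the kernel last, so the kernel gives priority to the
> > 	 * superblock information found on that device. */
> > 	for (j = 0; j < n; j++)
> > 		if (j != most_recent)
> > 			add_to_kernel(&devices[j]);
> > 	if (most_recent >= 0)
> > 		add_to_kernel(&devices[most_recent]);
> > 	return 0;
> > }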
> >
> > Please let me know if the patch fixes your problem.
> >
> > NeilBrown
> 
> First of all, thanks for your very helpful 'most recent disk' explanation.
> 
> Sadly, the patch didn't fix my problem: the event counters really are
> equal on all disks (including the spare), and the first disk that is
> checked is the spare disk, so there is no reason to ever set another
> disk as the 'most recent disk'.  I improved your patch a little bit by
> providing more output, and I also created my own solution, but that one
> needs review because I'm not sure whether it can be done like that.
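> 
> For illustration, the equal event counters can be checked by comparing
> the Events line of each member's superblock (device names as above):
>  # mdadm --examine /dev/sd[b-i]1 | grep -E '^/dev/|Events'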
> 
> Patch 1: Your solution with more output
> Diff: mdadm-3.2.5-noassemble-patch1.diff
> Assembly: mdadm-3.2.5-noassemble-patch1.txt
> 
> Patch 2: My proposed solution
> Diff: mdadm-3.2.5-noassemble-patch2.diff
> Assembly: mdadm-3.2.5-noassemble-patch2.txt


Thanks for the testing and suggestions.  I see what I missed now.
Can you check if this patch works please?

Thanks.
NeilBrown

diff --git a/Assemble.c b/Assemble.c
index 227d66f..9131917 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -215,7 +215,7 @@ int Assemble(struct supertype *st, char *mddev,
 	unsigned int okcnt, sparecnt, rebuilding_cnt;
 	unsigned int req_cnt;
 	int i;
-	int most_recent = 0;
+	int most_recent = -1;
 	int chosen_drive;
 	int change = 0;
 	int inargv = 0;
@@ -847,8 +847,9 @@ int Assemble(struct supertype *st, char *mddev,
 		devices[devcnt].i = *content;
 		devices[devcnt].i.disk.major = major(stb.st_rdev);
 		devices[devcnt].i.disk.minor = minor(stb.st_rdev);
-		if (most_recent < devcnt) {
-			if (devices[devcnt].i.events
+		if (devices[devcnt].i.disk.state == 6) {
+			if (most_recent < 0 ||
+			    devices[devcnt].i.events
 			    > devices[most_recent].i.events)
 				most_recent = devcnt;
 		}
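
For reference, a sketch of what the state == 6 test corresponds to,
assuming the MD_DISK_* bit values from the kernel's raid headers:

	#define MD_DISK_ACTIVE	1	/* disk is running */
	#define MD_DISK_SYNC	2	/* disk is in sync with the array */

	/* state == 6 is (1 << MD_DISK_ACTIVE) | (1 << MD_DISK_SYNC),
	 * i.e. an active, in-sync member.  A spare never has this state,
	 * so with the patch above it can no longer be chosen as
	 * most_recent. */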


