Re: reshape changing chunk size won't restart

On Tue, 21 Dec 2010 18:09:46 -0800 Andrew Burgess <aab@xxxxxxxxxxx> wrote:

> On 12/21/2010 05:08:10 PM, Neil Brown wrote:
> > On Tue, 21 Dec 2010 16:09:59 -0800 Andrew Burgess <aab@xxxxxxxxxxx>  
> > wrote:
> > 
> > > On 12/21/2010 02:16:19 PM, Neil Brown wrote:
> > >
> > > > > I started a reshape changing chunk size and after it ran
> > > > > for a while I realized the disk I used for the
> > > > > backup file was slow so I killed the mdadm
> > > >
> > > > That was a mistake.
> > >
> > > It's looking to be a bad one.
> > >
> > > > > running in the background and tried to restart
> > > > > with the new location (I moved the file just in case)
> > > > >
> > > > > mdadm /dev/md5 --grow --chunk=8 --backup-file=/my/raid/RAID_BACKUP_FILE
> > > >
> > > > As you discovered, that doesn't work.  I'd like to make it possible
> > > > to do something like that, but time is not something I have a lot of.
> > >
> > > Understand 100%
> > >
> > > > > I didn't try rebooting as the filesystem is mounted and
> > > > > the data seems ok. Didn't want to make things worse...
> > > >
> > > > It shouldn't make things worse.
> > >
> > > I had to, because umount wouldn't, and neither fuser nor lsof
> > > could find the guilty party.
> > >
> > > > You don't need to reboot, unless md5 has your root filesystem.
> > > > Just unmount, 'mdadm -S /dev/md5', and assemble:
> > > >   mdadm -A /dev/md5 --backup-file=/wherever-you-copied-the-file-to \
> > > >       /dev/sd[dfcbhljgk]1
> > > >
> > > > should do it.
> > >
> > > After rebooting something happened to sdg1:
> > >
> > > mdadm -A /dev/md5 --backup-file=/my/raid/RAID_BACKUP_FILE /dev/sd[dfcbhljgk]1
> > > mdadm: cannot open device /dev/sdg1: No such device or address
> > > mdadm: /dev/sdg1 has no superblock - assembly aborted
> > >
> > > So I tried it with sdg1 missing:
> > >
> > > mdadm -A /dev/md5 --backup-file=/my/raid/RAID_BACKUP_FILE /dev/sd[dfcbhljk]1
> > > mdadm: Failed to restore critical section for reshape, sorry.
> > >
> > > So I rebooted and power-cycled, hoping to get sdg1 back, but it was
> > > still unhappy with the superblock.
> > >
> > > I even tried letting it scan for devices:
> > >
> > > mdadm -A /dev/md5 --backup-file=/my/raid/RAID_BACKUP_FILE
> > > mdadm: WARNING /dev/sdg1 and /dev/sdg appear to have very similar superblocks.
> > >        If they are really different, please --zero the superblock on one
> > >        If they are the same or overlap, please remove one from the
> > >        DEVICE list in mdadm.conf.
> > >
> > > So repeating it with all but sdg1 specified results in:
> > >
> > > mdadm: Failed to restore critical section for reshape, sorry.
> > >
> > > Anything else I can try? We do have the sector it was on in the original
> > > email when it stopped: (2715648/1953511936)
> > 
> > 
> > The business with sdg1 is a bit odd... I would use "--examine" to check each
> > device and make sure they have good matching superblocks.  It would be a lot
> > better if you can make sure all devices get included when you start the array.
> 
> All the working devices have the same Reshape pos'n value in the superblock.
> sdg1, though:
> 
> mdadm -E /dev/sdg1
> mdadm: cannot open /dev/sdg1: No such device or address
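Checking that every member agrees on the reshape position, as Andrew did by hand, can be scripted. The sketch below is a hypothetical helper (`reshape_pos_consistent` is not part of mdadm); it assumes each member's `mdadm --examine` output has been saved to a file and that the position appears on a line of the form `Reshape pos'n : <number>`, which may differ between mdadm versions and metadata formats.

```shell
# reshape_pos_consistent FILE...
# Hypothetical helper, not part of mdadm. Each FILE holds the saved output
# of `mdadm --examine` for one member device. Prints each file's reshape
# position and returns 0 only if every file has one and all are identical.
# Assumes the line format "Reshape pos'n : <number> ...".
reshape_pos_consistent() {
    [ "$#" -gt 0 ] || return 1
    first=""
    status=0
    for f in "$@"; do
        # Extract the leading number after "Reshape pos'n :" (first match only).
        pos=$(sed -n "s/^ *Reshape pos'n *: *\([0-9][0-9]*\).*/\1/p" "$f" | head -n 1)
        if [ -z "$pos" ]; then
            echo "$f: no Reshape pos'n line" >&2
            status=1
            continue
        fi
        echo "$f: $pos"
        if [ -z "$first" ]; then
            first=$pos
        elif [ "$pos" != "$first" ]; then
            status=1
        fi
    done
    return "$status"
}

# Example (as root), using the working members named in this thread:
#   for d in /dev/sd[dfcbhljk]1; do
#       mdadm --examine "$d" > "/tmp/$(basename "$d").examine"
#   done
#   reshape_pos_consistent /tmp/*.examine
```

A member whose superblock cannot be read at all (like sdg1 here) will simply fail the `mdadm --examine` step, so its saved file has no position line and the helper reports it rather than silently skipping it.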

Maybe it has forgotten its partition table... try

  blockdev --rereadpt /dev/sdg

(Check the man page to make sure I have the right spelling.  Definitely sdg, not
sdg1.)
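A node existing in /dev does not by itself mean the kernel currently has that partition registered; opening a block-device node with nothing behind it typically fails with "No such device or address", which matches what Andrew saw. After rereading the partition table, /proc/partitions shows what the kernel actually knows about. The helper below is a hypothetical sketch (its name is illustrative) and assumes the usual /proc/partitions layout: a header line, then major, minor, #blocks and name columns.

```shell
# partition_registered NAME [FILE]
# Hypothetical helper: succeeds if NAME (e.g. sdg1) appears in the name
# column of FILE, which defaults to /proc/partitions. Assumes the usual
# layout: a header line, then "major minor #blocks name" rows.
partition_registered() {
    awk -v want="$1" 'NR > 1 && $4 == want { found = 1 } END { exit !found }' \
        "${2:-/proc/partitions}"
}

# Example (as root):
#   blockdev --rereadpt /dev/sdg
#   partition_registered sdg1 || echo "kernel still has no sdg1"
```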

> 
> even though:
> 
> ls -l /dev/sdg*
> brw-rw---- 1 root disk 8, 96 Dec 21 15:53 /dev/sdg
> brw-rw---- 1 root disk 8, 97 Dec 21 15:55 /dev/sdg1
> 
> and the partition table looks ok.
> sdg is brand new, but there are no I/O errors in the log.
> 
> > Also, try starting with '--verbose', it might give some useful information,
> > but I don't hold out a lot of hope.
> 
> Unless the old timestamp is helpful:

Yes, it is.  Makes sense too.
I never really got the timestamp logic straight in my mind.

Try

  MDADM_GROW_ALLOW_OLD=1  mdadm --verbose -A ....

and see how that goes.  This requires mdadm-3.1.2 or later, which I think you have.


NeilBrown

> 
> mdadm --verbose -A /dev/md5 --backup-file=/my/raid/RAID_BACKUP_FILE /dev/sd[dfcbhljk]1
> mdadm: looking for devices for /dev/md5
> mdadm: /dev/sdb1 is identified as a member of /dev/md5, slot 0.
> mdadm: /dev/sdc1 is identified as a member of /dev/md5, slot 1.
> mdadm: /dev/sdd1 is identified as a member of /dev/md5, slot 2.
> mdadm: /dev/sdf1 is identified as a member of /dev/md5, slot 8.
> mdadm: /dev/sdh1 is identified as a member of /dev/md5, slot 4.
> mdadm: /dev/sdj1 is identified as a member of /dev/md5, slot 3.
> mdadm: /dev/sdk1 is identified as a member of /dev/md5, slot 6.
> mdadm: /dev/sdl1 is identified as a member of /dev/md5, slot 5.
> mdadm:/dev/md5 has an active reshape - checking if critical section needs to be restored
> mdadm: too-old timestamp on backup-metadata on /my/raid/RAID_BACKUP_FILE
> mdadm: Failed to find backup of critical section
> mdadm: Failed to restore critical section for reshape, sorry.
> 
> > Finally, you will probably end up having to modify mdadm so that it ignores a
> > failure from Grow_restart.  As you had a reasonably clean shutdown rather
> > than a crash, there is a good chance that the backup file isn't actually
> > needed.
> 
> If the timestamp info above doesn't change your mind then I'll
> try that.
> 
> > The next release of mdadm will have a --invalid-backup option to --assemble
> > to tell it to just continue even though the backup file looks wrong.
> 
> Hope to send you a patch for that.
> 
> Thanks for your time!
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
