Re: RAID5 - Disk failed during re-shape

On Wed, 15 Aug 2012 18:32:43 +0200 Sam Clark <sclark_77@xxxxxxxxxxx> wrote:

> Unbelievable!  It mounted! 
> 
> With the -o noload, my array is mounted, and files are readable!
> 
> I've tested a few, and they look fine, but it's obviously hard to be sure on
> a larger scale.
> 
> In any case, I'll certainly be able to recover more data now!

.. and that's what makes it all worthwhile!

Thanks for hanging in there and letting us know the result.

NeilBrown

> 
> Thanks again Neil!
> Sam
> 
> -----Original Message-----
> From: linux-raid-owner@xxxxxxxxxxxxxxx
> [mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of NeilBrown
> Sent: 14 August 2012 23:06
> To: Sam Clark
> Cc: 'Phil Turmel'; linux-raid@xxxxxxxxxxxxxxx
> Subject: Re: RAID5 - Disk failed during re-shape
> 
> On Tue, 14 Aug 2012 15:40:50 +0200 Sam Clark <sclark_77@xxxxxxxxxxx> wrote:
> 
> > Thanks Neil,
> > 
> > Tried that and failed on the first attempt, so I tried shuffling 
> > around the dev order.. unfortunately I don't know what the order was 
> > previously, but I do recall being surprised that sdd was first on the 
> > list when I looked at it, so perhaps that's a starting point.  
> > Since there are some 120 different permutations of device order 
> > (assuming all 5 could be anywhere), I modified the script to accept
> > parameters and automated it a little further.
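> > 
> > (For the record, the loop is roughly along these lines -- simplified,
> > with placeholder chunk size and metadata version, and glossing over
> > the reshape-position handling discussed below:
> > 
> >     #!/bin/bash
> >     # Try one permutation of device order, passed as $1..$5.
> >     # NOTE: --create rewrites the superblocks; only tolerable here
> >     # because the originals are already lost.  --assume-clean
> >     # prevents a resync from scribbling on the data.
> >     mdadm --stop /dev/md_restore 2>/dev/null
> >     mdadm --create /dev/md_restore --level=5 --raid-devices=5 \
> >           --chunk=512 --metadata=1.2 --assume-clean --run \
> >           "$1" "$2" "$3" "$4" "$5"
> >     # Read-only fsck: exit code 8 means "operational error",
> >     # i.e. no recognisable filesystem at all.
> >     fsck.ext4 -n /dev/md_restore > /dev/null 2>&1
> >     echo "$1 $2 $3 $4 $5 -> fsck exit $?"
> > 
> > run once for each of the 120 orderings of sdb..sdf.)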
> > 
> > I ended up with a few 'possible successes' but none that would mount (i.e.
> > fsck actually ran and found problems with the superblocks, group 
> > descriptor checksums and Inode details, instead of failing with 
> > errorlevel 8).  The most successful so far were the ones with SDD as 
> > device 1 and SDE as device 2.. one particular combination (sdd sde sdb 
> > sdc sdf) seems to report every time "/dev/md_restore has been mounted 
> > 35 times without being checked, check forced.".. does this mean we're on
> > the right combination?
> 
> Certainly encouraging.  However it might just mean that the first device is
> correct.  I think you only need to find the filesystem superblock to be able
> to report that.
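> 
> If you just want to check for the filesystem superblock without a full
> fsck, something like this (assuming ext4) is much quicker:
> 
>     dumpe2fs -h /dev/md_restore
> 
> It only prints the superblock header, so a clean result says nothing
> about the rest of the data.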
> 
> > 
> > In any case, that one produces a lot of output (some 54MB when fsck is 
> > piped to a file) that looks bad and still fails to mount.  (I assume 
> > that "mount -r /dev/md_restore /mnt/restore" I all I need to mount 
> > with?  I also tried with "-t ext4", but that didn't seem to help either).
> 
> 54MB certainly seems like more than we were hoping for.
> Yes, that mount command should be sufficient.  You could try adding "-o
> noload".  I'm not sure exactly what it does, but from the code it looks like
> it tries to be more forgiving of certain errors.
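> 
> So the full command would be something along the lines of:
> 
>     mount -r -t ext4 -o noload /dev/md_restore /mnt/restore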
> 
> 
> > 
> > This is a summary of the errors that appear: 
> > Pass 1: Checking inodes, blocks, and sizes
> > (51 of these)
> > Inode 198574650 has an invalid extent node (blk 38369280, lblk 0) 
> > Clear? no
> > 
> > (47 of these)
> > Inode 223871986, i_blocks is 2737216, should be 0.  Fix? no
> > 
> > Pass 2: Checking directory structure
> > Pass 3: Checking directory connectivity
> > /lost+found not found.  Create? no
> > 
> > Pass 4: Checking reference counts
> > Pass 5: Checking group summary information
> > Block bitmap differences:  +(36700161--36700162) +36700164 +36700166
> > +(36700168--36700170)
> > (this goes on like this for many pages.. in fact, most of the 54 MB is here)
> > 
> > (and 492 of these)
> > Free blocks count wrong for group #3760 (24544, counted=16439).
> > Fix? no
> > 
> > Free blocks count wrong for group #3761 (0, counted=16584).
> > Fix? no
> > 
> > /dev/md_restore: ********** WARNING: Filesystem still has errors 
> > **********
> > /dev/md_restore: 107033/274718720 files (5.6% non-contiguous),
> > 976413581/1098853872 blocks
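> > 
> > (All of the above came from a read-only pass -- answering "no" to every
> > question -- i.e. roughly:
> > 
> >     fsck.ext4 -n /dev/md_restore 2>&1 | tee fsck.log
> > 
> > so nothing has been written to the array yet.)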
> > 
> > 
> > I also tried setting the reshape number to 1002152448, 1002153984,
> > 1002157056, 1002158592 and 1002160128 (+/- a couple of multiples)
> > but the output didn't seem to change much in any case.. Not sure if there 
> > are many different values worth testing there.
> 
> Probably not.
> 
> > 
> > So, unless there's something else worth trying based on the above, it 
> > looks to me like it's time to raise the white flag and start again... 
> > it's not too bad; I'll recover most of the data.
> > 
> > Many thanks for your help so far, but if I may... 1 more question...
> > Hopefully I won't lose a disk during re-shape in the future, but just 
> > in case I do, or for other unforeseen issues, what are good things to 
> > back up on a system?  Is it enough to back up /etc/mdadm/mdadm.conf 
> > and /proc/mdstat on a regular basis?  Or should I also back up the 
> > device superblocks?  Or something else?
> 
> There isn't really any need to back up anything.  Just don't use a buggy
> kernel (which unfortunately I let out into the wild and which got into
> Ubuntu).  The most useful thing if things do go wrong is the "mdadm --examine"
> output of all devices.
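> 
> Something like this, run now and then and kept somewhere off the array,
> captures all of that (the device list is just an example):
> 
>     #!/bin/bash
>     # Snapshot the RAID superblocks and current array state.
>     {
>         date
>         cat /proc/mdstat
>         for d in /dev/sd[b-f]; do
>             echo "== $d =="
>             mdadm --examine "$d"
>         done
>     } > /root/md-examine.txt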
> 
> 
> > 
> > Ok, so that's actually 4 questions... sorry :-)
> > 
> > Thanks again for all your efforts. 
> > Sam
> 
> Sorry we couldn't get your data back.
> 
> NeilBrown
