RE: RAID5 - Disk failed during re-shape

Unbelievable!  It mounted! 

With the -o noload, my array is mounted, and files are readable!

I've tested a few, and they look fine, but it's obviously hard to be sure on
a larger scale.

In any case, I'll certainly be able to recover more data now!

Thanks again Neil!
Sam

-----Original Message-----
From: linux-raid-owner@xxxxxxxxxxxxxxx
[mailto:linux-raid-owner@xxxxxxxxxxxxxxx] On Behalf Of NeilBrown
Sent: 14 August 2012 23:06
To: Sam Clark
Cc: 'Phil Turmel'; linux-raid@xxxxxxxxxxxxxxx
Subject: Re: RAID5 - Disk failed during re-shape

On Tue, 14 Aug 2012 15:40:50 +0200 Sam Clark <sclark_77@xxxxxxxxxxx> wrote:

> Thanks Neil,
> 
> Tried that and failed on the first attempt, so I tried shuffling 
> around the dev order.. unfortunately I don't know what they were 
> previously, but I do recall being surprised that sdd was first on the 
> list when I was looking at it previously, so perhaps a starting point.  
> Since there are some 120 different permutations of device order 
> (assuming all 5 could be anywhere), I modified the script to accept
> parameters and automated it a little further.
> 
> I ended up with a few 'possible successes' but none that would mount (i.e.
> fsck actually ran and found problems with the superblocks, group 
> descriptor checksums and Inode details, instead of failing with 
> errorlevel 8).  The most successful so far were the ones with sdd as 
> device 1 and sde as device 2.. one particular combination (sdd sde sdb 
> sdc sdf) seems to report every time "/dev/md_restore has been mounted 
> 35 times without being checked, check forced.".. does this mean we're on
> the right combination?

Certainly encouraging.  However it might just mean that the first device is
correct.  I think you only need to find the filesystem superblock to be able
to report that.
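For reference, the 120 figure above is simply 5! orderings of the five member devices; a quick sketch of the count, and of how much fixing the first device shrinks it:

```shell
# 5 member devices can be arranged in 5! = 120 possible orders.
echo $((5 * 4 * 3 * 2 * 1))

# If sdd is known to be device 1, only 4! = 24 orders remain to try.
echo $((4 * 3 * 2 * 1))
```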

> 
> In any case, that one produces a lot of output (some 54MB when fsck is 
> piped to a file) that looks bad and still fails to mount.  (I assume 
> that "mount -r /dev/md_restore /mnt/restore" is all I need to mount 
> with?  I also tried with "-t ext4", but that didn't seem to help either).

54MB certainly seems like more than we were hoping for.
Yes, that mount command should be sufficient.  You could try adding "-o
noload".  I'm not sure exactly what it does, but from the code it looks like
it tries to be more forgiving of some stuff.
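Putting the pieces together, the mount attempt being discussed would look like this (device and mount point names are the ones used in this thread; needs root and the assembled array):

```shell
# Read-only mount, with ext4's journal replay skipped (-o noload).
# Skipping replay can get a damaged filesystem readable when a
# normal mount fails, at the cost of possibly stale metadata.
mount -r -o noload -t ext4 /dev/md_restore /mnt/restore
```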


> 
> This is a summary of the errors that appear: 
> Pass 1: Checking inodes, blocks, and sizes
> (51 of these)
> Inode 198574650 has an invalid extent node (blk 38369280, lblk 0) 
> Clear? no
> 
> (47 of these)
> Inode 223871986, i_blocks is 2737216, should be 0.  Fix? no
> 
> Pass 2: Checking directory structure
> Pass 3: Checking directory connectivity
> /lost+found not found.  Create? no
> 
> Pass 4: Checking reference counts
> Pass 5: Checking group summary information
> Block bitmap differences:  +(36700161--36700162) +36700164 +36700166
> +(36700168--36700170)
> (this goes on like this for many pages.. in fact, most of the 54 MB is here)
> 
> (and 492 of these)
> Free blocks count wrong for group #3760 (24544, counted=16439).
> Fix? no
> 
> Free blocks count wrong for group #3761 (0, counted=16584).
> Fix? no
> 
> /dev/md_restore: ********** WARNING: Filesystem still has errors 
> **********
> /dev/md_restore: 107033/274718720 files (5.6% non-contiguous),
> 976413581/1098853872 blocks
> 
> 
> I also tried setting the reshape number to 1002152448, 1002153984,
> 1002157056, 1002158592 and 1002160128 (+/- a couple of multiples) 
> but output didn't seem to change much in any case.. Not sure if there 
> are many different values worth testing there.

Probably not.

> 
> So, unless there's something else worth trying based on the above, it 
> looks to me that it's time to raise the white flag and start again... 
> it's not too bad, I'll recover most of the data.
> 
> Many thanks for your help so far, but if I may... 1 more question...
> Hopefully I won't lose a disk during re-shape in the future, but just 
> in case I do, or for other unforeseen issues, what are good things to 
> backup on a system?  Is it enough to backup the /etc/mdadm/mdadm.conf 
> and /proc/mdstat on a regular basis?  Or should I also backup the 
> device superblocks?  Or something else?

There isn't really any need to backup anything.  Just don't use a buggy
kernel (which unfortunately I let out into the wild and got into Ubuntu).
The most useful thing if things do go wrong is the "mdadm --examine" output
of all devices.
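That suggestion can be turned into a small periodic snapshot along these lines (the device glob and output paths here are illustrative assumptions, not from the thread; needs root):

```shell
# Record "mdadm --examine" for all array members, so the original
# device order, offsets and superblock details are on file if
# recovery is ever needed.
mdadm --examine /dev/sd[b-f] > /root/mdadm-examine-$(date +%F).txt

# mdstat and the config file are cheap to keep alongside it.
cat /proc/mdstat >> /root/mdadm-examine-$(date +%F).txt
cp /etc/mdadm/mdadm.conf /root/mdadm.conf.bak
```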


> 
> Ok, so that's actually 4 questions  ... sorry :-)
> 
> Thanks again for all your efforts. 
> Sam

Sorry we couldn't get your data back.

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

