Re: Recover RAID6 with 4 disks removed

On Thu Feb 06, 2020 at 03:07:00PM +0100, Reindl Harald wrote:

> Am 06.02.20 um 14:46 schrieb Nicolas Karolak:
> > I have (had...) a RAID6 array with 8 disks and tried to remove 4 disks
> > from it, and obviously I messed up. Here are the commands I issued (I
> > do not have their output):
> 
> didn't you realize that RAID6 has redundancy to survive *exactly two*
> failing disks no matter how many disks the array has, and that the data
> and redundancy information are spread over the disks?
> 
> > mdadm --manage /dev/md1 --fail /dev/sdh
> > mdadm --manage /dev/md1 --fail /dev/sdg
> > mdadm --detail /dev/md1
> > cat /proc/mdstat
> > mdadm --manage /dev/md1 --fail /dev/sdf
> > mdadm --manage /dev/md1 --fail /dev/sde
> > mdadm --detail /dev/md1
> > cat /proc/mdstat
> > mdadm --manage /dev/md1 --remove /dev/sdh
> > mdadm --manage /dev/md1 --remove /dev/sdg
> > mdadm --manage /dev/md1 --remove /dev/sde
> > mdadm --manage /dev/md1 --remove /dev/sdf
> > mdadm --detail /dev/md1
> > cat /proc/mdstat
> > mdadm --grow /dev/md1 --raid-devices=4
> > mdadm --grow /dev/md1 --array-size 7780316160  # from here it started
> > going wrong on the system
> 
> because mdadm didn't prevent you from shooting yourself in the foot,
> likely for cases where one needs a hammer to restore from an uncommon
> state as a last resort
> 
> setting more than one disk to "fail" at the same time is asking for
> trouble no matter what
> 
> what happens when one drive starts to puke after you have removed all
> redundancy and happily started a reshape that implies heavy IO?
> 
> > I began to get "input/output" errors; `ls`, `cat` and almost every
> > other command stopped working (something like "/usr/sbin/ls not found").
> > The `mdadm` command was still working, so I did this:
> > 
> > ```
> > mdadm --manage /dev/md1 --re-add /dev/sde
> > mdadm --manage /dev/md1 --re-add /dev/sdf
> > mdadm --manage /dev/md1 --re-add /dev/sdg
> > mdadm --manage /dev/md1 --re-add /dev/sdh
> > mdadm --grow /dev/md1 --raid-devices=8
> > ```
> > 
> > The disks were re-added, but as "spares". After that I powered down
> > the server and made backups of the disks with `dd`.
> > 
> > Is there any hope to retrieve the data? If yes, then how?
> 
> unlikely - the reshape that was started will have done writes

I don't think it'll have written anything, as the array was in a failed
state. You'll have lost the metadata on the original disks, though, as
they were removed & re-added (unless you have a record of it from before
the above operations?), so that means doing a create --assume-clean and
"fsck -n" loop over all combinations until you find the correct order
(this assumes they were added at the same time and so share the same
data offset). At least you know the positions of 4 of the array members,
so that reduces the number of combinations you'll need to try.
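
In rough terms, the loop would look something like the sketch below. This is
untested and only illustrative: the chunk size, metadata version and the
assumption that the four members with intact metadata sit in slots 0-3 are
all placeholders - take the real values from `mdadm --examine` on those
members, and run the whole thing against overlay devices (see below), never
the raw disks.

```
#!/bin/bash
# Untested sketch: try every order of the four disks whose metadata was
# lost, re-create the array with --assume-clean, and check the filesystem
# read-only with "fsck -n". All device names point at overlays.

KNOWN="/dev/mapper/sda /dev/mapper/sdb /dev/mapper/sdc /dev/mapper/sdd"  # assumed slots 0-3
UNKNOWN="sde sdf sdg sdh"                                                # slots lost on re-add

for a in $UNKNOWN; do
 for b in $UNKNOWN; do
  for c in $UNKNOWN; do
   for d in $UNKNOWN; do
    # skip orders that use the same disk twice
    [ "$(printf '%s\n' $a $b $c $d | sort -u | wc -l)" -eq 4 ] || continue

    mdadm --stop /dev/md1 2>/dev/null
    # --run suppresses the "appears to be part of an array" confirmation;
    # chunk size and metadata version below are guesses.
    mdadm --create /dev/md1 --run --assume-clean --level=6 --raid-devices=8 \
          --chunk=512 --metadata=1.2 \
          $KNOWN /dev/mapper/$a /dev/mapper/$b /dev/mapper/$c /dev/mapper/$d

    # -n: report only, never repair
    if fsck -n /dev/md1 >/dev/null 2>&1; then
        echo "Plausible order: $a $b $c $d"
    fi
   done
  done
 done
done
```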

Check the wiki - there should be instructions there on using overlays
to prevent further accidental damage. There may even be scripts to help
automate the create/fsck process.
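
For reference, the overlay idea is roughly the following (again just a
sketch; the overlay file size is a guess and the wiki version is more
careful about cleanup):

```
# All writes go to a sparse copy-on-write file; the real disks are only read.
for d in sda sdb sdc sdd sde sdf sdg sdh; do
    truncate -s 10G overlay-$d                 # sparse COW file, size is a guess
    loop=$(losetup -f --show overlay-$d)
    size=$(blockdev --getsz /dev/$d)           # device size in 512-byte sectors
    dmsetup create $d --table "0 $size snapshot /dev/$d $loop P 8"
done
# then point mdadm and fsck at /dev/mapper/sda ... /dev/mapper/sdh
```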

Cheers,
    Robin


