Re: Recover from crash in RAID6 due to hardware failure

	There is a fair chance you can recover the data by recreating the array:

mdadm -S /dev/md2
mdadm -C -f -e 1.2 -n 5 -c 64K --level=6 -p left-symmetric /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 /dev/sde3
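
     One extra precaution worth considering: adding --assume-clean to the
create keeps md from starting a resync until the data has been checked,
and the result can be verified read-only before anything is written.  A
sketch (the fsck step assumes whatever filesystem is actually on md2):

mdadm -D /dev/md2          # confirm size, chunk, layout and device order match the old array
fsck -n /dev/md2           # read-only filesystem check, makes no repairs
mount -o ro /dev/md2 /mnt  # mount read-only and spot-check a few files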

On 6/8/2021 6:39 AM, Carlos Maziero wrote:
On 07/06/2021 07:27, Leslie Rhorer wrote:
On 6/6/2021 10:07 PM, Carlos Maziero wrote:

However, the disks were added as spares and the volume remained
crashed. Now I'm afraid those commands have erased metadata and made
things worse... :-(

     Yeah.  Did you at any time Examine the drives and save the output?

mdadm -E /dev/sd[a-e]3

     If so, you have a little bit better chance.
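
     For comparison purposes, the fields that matter most across the five
members are the Array UUID, the Events counters, the Device Role and the
Data Offset; something like this pulls just those out of the v1.2
metadata, assuming it is still readable:

mdadm -E /dev/sd[a-e]3 | grep -E 'Array UUID|Events|Device Role|Data Offset|Chunk Size'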

Yes, but I did it only after the failure. The output for all disks is
attached to this message.


Is there a way to reconstruct the array and to recover its data, at
least partially?

     Maybe.  Do you know exactly which physical disk was in which RAID
position?  It seems likely the grouping was the same for the corrupted
array as for the other arrays, given the drives are partitioned.

Yes, disk sda was in slot 1, and so on. I physically labelled all slots
and disks.



     First off, try:

mdadm -E /dev/sde3 > /etc/mdadm/RAIDfix

     This should give you the details of the RAID array.  From this,
you should be able to re-create the array.  I would heartily recommend
getting some new drives and copying the data to them before
proceeding.  I would get a 12T drive and copy all of the partitions to
it:

mkfs /dev/sdf  (or mkfs /dev/sdf1)
mount /dev/sdf /mnt (or mount /dev/sdf1 /mnt)
ddrescue /dev/sda3 /mnt/drivea /tmp/tmpdrivea
ddrescue /dev/sdb3 /mnt/driveb /tmp/tmpdriveb
ddrescue /dev/sdc3 /mnt/drivec /tmp/tmpdrivec
ddrescue /dev/sdd3 /mnt/drived /tmp/tmpdrived
ddrescue /dev/sde3 /mnt/drivee /tmp/tmpdrivee
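
     If any of those copies run into bad sectors, ddrescue can simply be
re-run against the same mapfile to retry only the bad areas, for example:

ddrescue -r3 /dev/sda3 /mnt/drivea /tmp/tmpdrivea

     (It may also be safer to keep the mapfiles somewhere that survives a
reboot rather than in /tmp.)  One caveat on sizing: at roughly 3T per
partition, the five images add up to about 15T, so a 12T target will not
hold all of them unless drive e is left off as described below.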

     You could skimp by getting an 8T drive, and then if drive e
doesn't fit, you could create the array without it, and you will be
pretty safe.  It's not what I would do, but if you are strapped for
cash...
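
     If you do go that route, the keyword "missing" stands in for the
member that was not copied, something along these lines (same geometry as
the command at the top):

mdadm -C -f -e 1.2 -n 5 -c 64K --level=6 -p left-symmetric /dev/md2 /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3 missing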

OK, I will try to get a spare disk for that, and another computer, since
the NAS has only 5 bays and I would need one more to do these operations.


Contents of /proc/mdstat (after the commands above):

Personalities : [raid1] [linear] [raid0] [raid10] [raid6] [raid5] [raid4]
md2 : active raid6 sda3[0](S) sdb3[1](S) sdc3[2](S) sdd3[3](S) sde3[4]
      8776632768 blocks super 1.2 level 6, 64k chunk, algorithm 2 [5/1] [____U]

md1 : active raid1 sda2[1] sdb2[2] sdc2[3] sdd2[0] sde2[4]
      2097088 blocks [5/5] [UUUUU]

md0 : active raid1 sda1[1] sdb1[2] sdc1[3] sdd1[0] sde1[4]
      2490176 blocks [5/5] [UUUUU]

     There is something odd here.  You say the disks failed, but
clearly they are in decent shape.  The first and second partitions on
all drives appear to be good.  Did the system recover the RAID1 arrays?

Apparently the failure was not in the disks, but in the NAS hardware. I
opened it a week ago to upgrade the RAM (I replaced the old 512 MB module
with a 1 GB one), and maybe the slot connecting the main board to the SATA
board developed a connectivity problem (although the NAS OS reported
nothing about it). Anyway, I had 5 disks in a RAID 6 array and the logs
showed 3 disks failing at the same time, which is quite unusual. This is
the reason I believe the disks are physically OK.
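
(A quick way to double-check this would be to look at the SMART counters
on each drive, assuming smartmontools is available on the NAS or on the
other computer, e.g.:

smartctl -a /dev/sda | grep -iE 'reallocated|pending|uncorrectable'

Zero or unchanged counts there would support the theory that the SATA
board connection, not the disks, caused the simultaneous failures.)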

Thanks for your attention!

Carlos




