Re: RAID 1 to RAID 5 failure

On 04/04/2022 16:19, Jorge Nunes wrote:
Hi everyone.
Probably this isn't the forum to post this, but I haven't been able to
get any really useful help on this elsewhere:

This is exactly the correct forum ...

I have a NAS that can take a four-disk RAID, running Armbian (Debian
Bullseye). I used it for a long time with only two disks, sda and sdd,
in RAID 1 - they are WD30EFRX drives. Now I have bought two more
WD30EFRX drives (refurbished), and my idea was to add them to end up
with a RAID 5 array. These are the steps I took:

Didn't do a backup :-(

Unmounted everything:
```
$ sudo umount /srv/dev-disk-by-uuid-d1430a9e-6461-481b-9765-86e18e517cfc

$ sudo umount -f /dev/md0
```
Stopped the array:
```
$ sudo mdadm --stop /dev/md0
```

Changed the array to a RAID 5 with only the two existing disks:
```
$ sudo mdadm --create /dev/md0 -a yes -l 5 -n 2 /dev/sda /dev/sdd
```
I made a mistake here and used the whole disks instead of the
/dev/sd[ad]1 partitions. mdadm warned me that /dev/sdd had a partition
and that it would be overwritten... I pressed 'Y' to continue... :-(
It took a long time to complete without any errors.

You made an even bigger error here - and I'm very sorry but it was probably fatal :-(

If sda and sdd were your original disks, you created a NEW array, with different geometry. This will probably have trashed your data.
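If you want to see what is actually left on the original pair, these checks are read-only and safe - they only look at the partition table and the md superblocks, they don't write anything:
```
# read-only: show whatever partition tables are still visible
$ sudo fdisk -l /dev/sda /dev/sdd

# read-only: show the md superblocks - these will now be the new
# RAID-5 ones, not your old RAID-1 metadata
$ sudo mdadm --examine /dev/sda /dev/sdd
```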

Then I added the two new disks /dev/sdb and /dev/sdc to the array:
```
$ sudo mdadm --add /dev/md0 /dev/sdb
$ sudo mdadm --add /dev/md0 /dev/sdc
```
And grew the array to use all four disks:
```
$ sudo mdadm --grow /dev/md0 --raid-disk=4
```
And if the first mistake wasn't fatal, this one probably was. The reshape physically moved data around across all four disks, so whatever was still sitting intact on the original pair has now been scattered.

During this process, a reshape was performed, like this:
```
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdc[4] sdb[3] sdd[2] sda[0]
       2930134016 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
       [==================>..]  reshape = 90.1% (2640502272/2930134016) finish=64.3min speed=75044K/sec
       bitmap: 0/22 pages [0KB], 65536KB chunk
```
```
$ sudo mdadm -D /dev/md0

/dev/md0:
            Version : 1.2
      Creation Time : Fri Mar 11 16:10:02 2022
         Raid Level : raid5
         Array Size : 2930134016 (2794.39 GiB 3000.46 GB)
      Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
       Raid Devices : 4
      Total Devices : 4
        Persistence : Superblock is persistent

      Intent Bitmap : Internal

        Update Time : Sat Mar 12 20:20:14 2022
              State : clean, reshaping
     Active Devices : 4
    Working Devices : 4
     Failed Devices : 0
      Spare Devices : 0

             Layout : left-symmetric
         Chunk Size : 512K

Consistency Policy : bitmap

     Reshape Status : 97% complete
      Delta Devices : 2, (2->4)

               Name : helios4:0  (local to host helios4)
               UUID : 8e1ac1a8:8eabc3de:c01c8976:0be5bf6c
             Events : 12037

     Number   Major   Minor   RaidDevice State
        0       8        0        0      active sync   /dev/sda
        2       8       48        1      active sync   /dev/sdd
        4       8       32        2      active sync   /dev/sdc
        3       8       16        3      active sync   /dev/sdb
```

When this looooooong process had completed without errors, I ran e2fsck:
```
$ sudo e2fsck /dev/md0
```
And... it gave this info:
```
e2fsck 1.46.2 (28-Feb-2021)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/md0

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
     e2fsck -b 8193 <device>
or
     e2fsck -b 32768 <device>
```
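If you do try the backup superblocks e2fsck suggests, do it with -n so it only reads and doesn't try to 'fix' anything - not that I'd expect it to find a valid superblock at those offsets, given what happened above:
```
# -n opens the filesystem read-only and answers 'no' to every
# question, so nothing gets written to the array
$ sudo e2fsck -n -b 32768 /dev/md0
```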
At this point I realized that I had made some mistakes during this
process... I googled the problem, and I think the disks in the array
are somehow 'reversed' in order, judging from this post:
https://forum.qnap.com/viewtopic.php?t=125534
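You can check which slot each disk thinks it occupies without writing anything - the 'Device Role' line in the --examine output records it:
```
# read-only: print the array slot each member believes it holds
$ sudo mdadm --examine /dev/sd[abcd] | grep -E '^/dev/|Device Role'
```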

So, the partition is 'gone', and when I try to assemble the array now,
I get this:
```
$ sudo mdadm --assemble --scan -v

mdadm: /dev/sdd is identified as a member of /dev/md/0, slot 1.
mdadm: /dev/sdb is identified as a member of /dev/md/0, slot 3.
mdadm: /dev/sdc is identified as a member of /dev/md/0, slot 2.
mdadm: /dev/sda is identified as a member of /dev/md/0, slot 0.
mdadm: added /dev/sdd to /dev/md/0 as 1
mdadm: added /dev/sdc to /dev/md/0 as 2
mdadm: added /dev/sdb to /dev/md/0 as 3
mdadm: added /dev/sda to /dev/md/0 as 0
mdadm: /dev/md/0 has been started with 4 drives.

$ dmesg

[143605.261894] md/raid:md0: device sda operational as raid disk 0
[143605.261909] md/raid:md0: device sdb operational as raid disk 3
[143605.261919] md/raid:md0: device sdc operational as raid disk 2
[143605.261927] md/raid:md0: device sdd operational as raid disk 1
[143605.267400] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[143605.792653] md0: detected capacity change from 0 to 17580804096

$ cat /proc/mdstat

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active (auto-read-only) raid5 sda[0] sdb[3] sdc[4] sdd[2]
       8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
       bitmap: 0/22 pages [0KB], 65536KB chunk


$ sudo mdadm -D /dev/md0

/dev/md0:
            Version : 1.2
      Creation Time : Fri Mar 11 16:10:02 2022
         Raid Level : raid5
         Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
      Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
       Raid Devices : 4
      Total Devices : 4
        Persistence : Superblock is persistent

      Intent Bitmap : Internal

        Update Time : Sat Mar 12 21:24:59 2022
              State : clean
     Active Devices : 4
    Working Devices : 4
     Failed Devices : 0
      Spare Devices : 0

             Layout : left-symmetric
         Chunk Size : 512K

Consistency Policy : bitmap

               Name : helios4:0  (local to host helios4)
               UUID : 8e1ac1a8:8eabc3de:c01c8976:0be5bf6c
             Events : 12124

     Number   Major   Minor   RaidDevice State
        0       8        0        0      active sync   /dev/sda
        2       8       48        1      active sync   /dev/sdd
        4       8       32        2      active sync   /dev/sdc
        3       8       16        3      active sync   /dev/sdb
```

The array assembles and starts, but there is no filesystem superblock, so it won't mount.
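You can confirm whether anything recognisable is left at the start of the md device with blkid, which only reads:
```
# read-only: low-level probe for any filesystem signature on the array
$ sudo blkid -p /dev/md0
```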

At this stage, I did a photorec to try to recover my valuable data
(mainly family photos):

This, I'm afraid, is probably your best bet.
```
$ sudo photorec /log /d ~/k/RAID_REC/ /dev/md0
```
I recovered a lot of them, but others are corrupted: during the
photorec recovery (sector by sector) the sector counter increases as
time passes, but then it is 'reset' to a lower value (hence my
suspicion that the disks are scrambled in the array) and it recovers
some files again (some of them identical).

No, they're not scrambled. The RAID spreads blocks across the individual disks, and you're running photorec over the md device. Try running it over the individual disks, sda, sdb, sdc and sdd - something like the sketch after the quote below. You might get a different set of pictures back.

> Best,
>
> Jorge
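The destination directories here are only examples - give each disk its own so you can tell the results apart:
```
# one photorec pass per member disk, each into its own output directory
$ sudo photorec /log /d ~/k/RAID_REC_sda/ /dev/sda
$ sudo photorec /log /d ~/k/RAID_REC_sdb/ /dev/sdb
$ sudo photorec /log /d ~/k/RAID_REC_sdc/ /dev/sdc
$ sudo photorec /log /d ~/k/RAID_REC_sdd/ /dev/sdd
```
With a 512K chunk, anything small enough to sit inside a single chunk has a fair chance of coming back whole from one disk; bigger files will still be split across the stripes.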


So, my question is: is there a chance to redo the array correctly
without losing the information inside? Is it possible to recover the
'lost' partition that existed on the RAID 1, so that I can make a
proper backup? Or is the only chance to get the disk order inside the
array right, so that photorec can recover the files correctly?

I appreciate your help.
Thanks!

I've cc'd the guys most likely to be able to help, but I think they'll give you the same answer I have, sorry.
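In the meantime it would help them if you capture the current state, read-only, and post it - something like:
```
# read-only: dump the superblock of every member and the array details
$ sudo mdadm --examine /dev/sd[abcd] > examine.txt
$ sudo mdadm --detail /dev/md0 > detail.txt
$ cat /proc/mdstat > mdstat.txt
```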

Your only hope is probably to convert it back to the two-disk RAID 5 you first created; then it is *likely* that your original mirror's data will still be in place. If you then recreate the original partition, I'm *hoping* this will give you your original mirror back in a broken state, from which you might be able to recover.

But I seriously suggest DON'T DO ANYTHING that writes to the disk until the experts chime in. You've trashed your raid, don't make it any worse.
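If you (or they) do want to experiment, the usual trick is to put copy-on-write overlays over the disks so nothing ever touches the real drives - the Linux RAID wiki describes this. Very roughly, with placeholder sizes and names:
```
# sparse file to soak up any writes aimed at sda
$ truncate -s 10G /tmp/overlay-sda
# attach it to a loop device (prints e.g. /dev/loop0)
$ sudo losetup --find --show /tmp/overlay-sda
# copy-on-write device: reads come from sda, writes land in the overlay
$ sudo dmsetup create sda_cow --table "0 $(sudo blockdev --getsz /dev/sda) snapshot /dev/sda /dev/loop0 P 8"
```
Any test assembly is then done against /dev/mapper/sda_cow (and the equivalents for the other three disks), and throwing the overlays away puts you straight back where you started.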

Wol


