RAID 1 to RAID 5 failure

Jorge Nunes <jorgebnunes@xxxxxxxxx> · Mon, 4 Apr 2022 16:19:17 +0100

Hi everyone.
This probably isn't the right forum for this, but I haven't been able to
find really useful help anywhere else:

I have a NAS running Armbian (Debian bullseye) that can hold a RAID of up
to four disks. For a long time I used it with only two, sda and sdd, in
RAID 1 - they are WD30EFRX. Now I have bought two more WD30EFRX
(refurbished), and my idea was to add them to get a RAID 5 array.
These are the steps I took:

Didn't do a backup :-(

Unmounted everything:
```
$ sudo umount /srv/dev-disk-by-uuid-d1430a9e-6461-481b-9765-86e18e517cfc

$ sudo umount -f /dev/md0
```
Stopped the array:
```
$ sudo mdadm --stop /dev/md0
```

Changed the array to RAID 5, using only the existing disks:
```
$ sudo mdadm --create /dev/md0 -a yes -l 5 -n 2 /dev/sda /dev/sdd
```
I made a mistake here and used the whole disks instead of the
/dev/sd[ad]1 partitions. mdadm warned me that /dev/sdd had a
partition and that it would be overwritten... I pressed 'Y' to continue...
:-(
It took a long time to complete, without any errors.

Then I added the two new disks /dev/sdb and /dev/sdc to the array:
```
$ sudo mdadm --add /dev/md0 /dev/sdb
$ sudo mdadm --add /dev/md0 /dev/sdc
```
And did a grow to use the four disks:
```
$ sudo mdadm --grow /dev/md0 --raid-disk=4
```
During this process a reshape was performed, like this:
```
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdc[4] sdb[3] sdd[2] sda[0]
      2930134016 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      [==================>..]  reshape = 90.1% (2640502272/2930134016) finish=64.3min speed=75044K/sec
      bitmap: 0/22 pages [0KB], 65536KB chunk
```
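As a sanity check on the reshape line above, here is a minimal sketch that recomputes the progress and the remaining time from the raw counters. The function names are my own, and it assumes the two counters and the speed are in the same K units, as mdstat prints them.

```shell
#!/bin/sh
# Figures copied from the reshape line in /proc/mdstat above.
done_k=2640502272      # position reached so far
total_k=2930134016     # total to process
speed_k=75044          # reported speed in K per second

# Hypothetical helpers, integer arithmetic only.
# Progress in tenths of a percent: done * 1000 / total.
reshape_progress_tenths() {
    echo $(( $1 * 1000 / $2 ))
}

# Remaining time in tenths of a minute: (total - done) / speed / 60.
reshape_eta_tenths_min() {
    echo $(( ($2 - $1) * 10 / $3 / 60 ))
}

p=$(reshape_progress_tenths "$done_k" "$total_k")
e=$(reshape_eta_tenths_min "$done_k" "$total_k" "$speed_k")
printf 'reshape = %d.%d%%\n' $(( p / 10 )) $(( p % 10 ))      # reshape = 90.1%
printf 'finish  = %d.%dmin\n' $(( e / 10 )) $(( e % 10 ))     # finish  = 64.3min
```

Both values agree with what mdstat reported, so the counters at least look self-consistent.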
```
$ sudo mdadm -D /dev/md0

/dev/md0:
           Version : 1.2
     Creation Time : Fri Mar 11 16:10:02 2022
        Raid Level : raid5
        Array Size : 2930134016 (2794.39 GiB 3000.46 GB)
     Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Mar 12 20:20:14 2022
             State : clean, reshaping
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

    Reshape Status : 97% complete
     Delta Devices : 2, (2->4)

              Name : helios4:0  (local to host helios4)
              UUID : 8e1ac1a8:8eabc3de:c01c8976:0be5bf6c
            Events : 12037

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       2       8       48        1      active sync   /dev/sdd
       4       8       32        2      active sync   /dev/sdc
       3       8       16        3      active sync   /dev/sdb
```

When this looooooong process completed without errors, I ran e2fsck:
```
$ sudo e2fsck /dev/md0
```
And... it gave this info:
```
e2fsck 1.46.2 (28-Feb-2021)
ext2fs_open2: Bad magic number in super-block
e2fsck: Superblock invalid, trying backup blocks...
e2fsck: Bad magic number in super-block while trying to open /dev/md0

The superblock could not be read or does not describe a valid ext2/ext3/ext4
filesystem.  If the device is valid and it really contains an ext2/ext3/ext4
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
    e2fsck -b 8193 <device>
or
    e2fsck -b 32768 <device>
```
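For context on the block numbers e2fsck suggests, here is a small sketch of where backup superblocks normally sit. It assumes the sparse_super feature and the default of 8 * block-size blocks per group, and the function name is made up for illustration. It only prints numbers and does not touch any device.

```shell
#!/bin/sh
# backup_superblocks BLOCK_SIZE
# Hypothetical helper: prints the first few backup superblock locations,
# assuming sparse_super (backups in groups 1 and powers of 3, 5 and 7)
# and 8 * BLOCK_SIZE blocks per group.
backup_superblocks() {
    block_size=$1
    blocks_per_group=$(( block_size * 8 ))
    # With 1 KiB blocks the first data block is 1, otherwise it is 0.
    if [ "$block_size" -eq 1024 ]; then first=1; else first=0; fi
    out=""
    for group in 1 3 5 7 9 25 27 49; do
        out="$out $(( group * blocks_per_group + first ))"
    done
    echo "${out# }"
}

backup_superblocks 1024   # starts with 8193
backup_superblocks 4096   # starts with 32768
```

The first value of each list matches the two alternatives in the e2fsck message, which correspond to 1 KiB and 4 KiB filesystem blocks.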
At this point I realized that I had made some mistakes during this process...
I googled the problem, and I think the disks in the array are somehow
in 'reversed' order, judging from this post:
https://forum.qnap.com/viewtopic.php?t=125534

So, the partition is 'gone', and when I try to assemble the array now,
I get this:
```
$ sudo mdadm --assemble --scan -v

mdadm: /dev/sdd is identified as a member of /dev/md/0, slot 1.
mdadm: /dev/sdb is identified as a member of /dev/md/0, slot 3.
mdadm: /dev/sdc is identified as a member of /dev/md/0, slot 2.
mdadm: /dev/sda is identified as a member of /dev/md/0, slot 0.
mdadm: added /dev/sdd to /dev/md/0 as 1
mdadm: added /dev/sdc to /dev/md/0 as 2
mdadm: added /dev/sdb to /dev/md/0 as 3
mdadm: added /dev/sda to /dev/md/0 as 0
mdadm: /dev/md/0 has been started with 4 drives.

$ dmesg

[143605.261894] md/raid:md0: device sda operational as raid disk 0
[143605.261909] md/raid:md0: device sdb operational as raid disk 3
[143605.261919] md/raid:md0: device sdc operational as raid disk 2
[143605.261927] md/raid:md0: device sdd operational as raid disk 1
[143605.267400] md/raid:md0: raid level 5 active with 4 out of 4 devices, algorithm 2
[143605.792653] md0: detected capacity change from 0 to 17580804096

$ cat /proc/mdstat

Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md0 : active (auto-read-only) raid5 sda[0] sdb[3] sdc[4] sdd[2]
      8790402048 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]
      bitmap: 0/22 pages [0KB], 65536KB chunk

$ sudo mdadm -D /dev/md0

/dev/md0:
           Version : 1.2
     Creation Time : Fri Mar 11 16:10:02 2022
        Raid Level : raid5
        Array Size : 8790402048 (8383.18 GiB 9001.37 GB)
     Used Dev Size : 2930134016 (2794.39 GiB 3000.46 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Mar 12 21:24:59 2022
             State : clean
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric
        Chunk Size : 512K

Consistency Policy : bitmap

              Name : helios4:0  (local to host helios4)
              UUID : 8e1ac1a8:8eabc3de:c01c8976:0be5bf6c
            Events : 12124

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       2       8       48        1      active sync   /dev/sdd
       4       8       32        2      active sync   /dev/sdc
       3       8       16        3      active sync   /dev/sdb
```
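The sizes reported above can be cross-checked with a little arithmetic. This is only a sketch: the function name is my own, and it assumes the usual RAID 5 rule that one device's worth of space goes to parity, with mdadm reporting sizes in KiB.

```shell
#!/bin/sh
# raid5_capacity_kib DEVICES DEVICE_SIZE_KIB
# Hypothetical helper: usable RAID 5 capacity is (devices - 1) * device size.
raid5_capacity_kib() {
    echo $(( ($1 - 1) * $2 ))
}

used_dev_kib=2930134016   # "Used Dev Size" from mdadm -D

raid5_capacity_kib 2 "$used_dev_kib"   # 2930134016, the size during the reshape
raid5_capacity_kib 4 "$used_dev_kib"   # 8790402048, the "Array Size" after it

# The same capacity counted in 512-byte units.
echo $(( $(raid5_capacity_kib 4 "$used_dev_kib") * 2 ))   # 17580804096
```

The last figure equals the capacity in the dmesg line, which would be consistent with the kernel reporting it in 512-byte sectors.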

The array assembles, but no filesystem superblock is found on it.

At this stage, I ran photorec to try to recover my valuable data
(mainly family photos):
```
$ sudo photorec /log /d ~/k/RAID_REC/ /dev/md0
```
I recovered a lot of them, but others are corrupted. During the
photorec recovery (sector by sector), the sector count increases as
time passes, but then the counter is 'reset' to a lower value and it
recovers some files again (some are identical). This is why I suspect
that the disks are scrambled in the array.

So, my question is: is there a chance to redo the array correctly
without losing the information inside? Is it possible to recover the
'lost' partition that existed on the RAID 1, so that I can make a proper
backup? Or is the only option to get the disks into the correct order
inside the array, so that photorec can recover the files correctly?

I appreciate your help.
Thanks!

Best,

Jorge