Help recovering a RAID5, what seems to be a strange state

Tl;Dr version:
I restored partition tables, initially with different end sectors.
Started the RAIDs, to ill effect. Restored the correct partition tables
and things seemed OK, though degraded, until they weren't.

The current state is 3 devices with the same event count, but the array
is "dirty" and cannot start both degraded *and* dirty. I know the array
initially ran with sd[abd]4; I then added the "missing" sdc4, at which
point it did something strange while attempting to resync.

sdc4 is now a "spare" but cannot be added after an attempted
incremental assembly and run with the other 3. Either way, after trying
to run the array, the table from 'mdadm -D' looks similar to this:

    Number   Major   Minor   RaidDevice State
       -       0        0        0      removed
       -       0        0        1      removed
       -       0        0        2      removed
       -       0        0        3      removed

       -       8       52        2      sync   /dev/sdd4
       -       8       36        -      spare   /dev/sdc4
       -       8       20        0      sync   /dev/sdb4
       -       8        4        1      sync   /dev/sda4

Long story version follows

I have 4 drives partitioned into several arrays of different RAID
types; partition 4 on each drive is part of a RAID5 across all 4
drives. For some reason my GPT partition tables were all wiped, and I
suspect benchmarking with fio (though I only ever gave it an LVM volume
to operate on). I boot SystemRescueCD and testdisk finds the original
partitions, so I tell it to restore them. So far, so good. I start
assembling some arrays; others don't work yet. LVM starts showing the
contents it finds in the arrays assembled so far (this is all still
within SystemRescueCD).
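
For reference, the assembly attempts inside SystemRescueCD were nothing
exotic, essentially along these lines (the md device name below is a
placeholder, I don't recall the exact numbers):

    mdadm --assemble --scan        # auto-assemble whatever can be found
    cat /proc/mdstat               # check which arrays came up
    mdadm -E /dev/sda4             # examine a member of one that didn't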

Investigating the unassembled arrays, dmesg complains that the array
size has changed. I find a suggestion to use "-U devicesize". I believe
this was my first mistake. The arrays assemble, but LVM hangs
indefinitely at this point.
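
If it matters, the devicesize update was roughly the following (again,
the md name is a placeholder):

    mdadm --stop /dev/md4
    mdadm --assemble --update=devicesize /dev/md4 /dev/sd[abcd]4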

I desperately search for any information I have on the partitions and
arrays, and I find a spreadsheet on my laptop with meticulous partition
details. It turns out some of the partitions end with a gap before the
next partition begins. Whatever. I fix the partition tables to match.
This time, all the arrays assemble and LVM is happy!! YES.
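
For completeness, here is an sgdisk-style sketch of the kind of fix I
applied to each drive; the sector numbers below are placeholders, not
my real values:

    sgdisk -d 4 /dev/sda                            # delete the wrong partition 4 entry
    sgdisk -n 4:1234567:7654321 -t 4:fd00 /dev/sda  # recreate it with the exact sectors
    partprobe /dev/sda                              # re-read the partition table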

However, each array has one missing partition member, and it's not the
same disk in each case. That's strange, but my server is running: I'm
able to boot it normally and homeassistant is back up. I then re-add
each missing partition to its array (I believe this was my second
mistake). I go to bed while it reconstructs.
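
The re-adds themselves were just the usual thing, one per array (md
numbers and member names are placeholders):

    mdadm /dev/md4 --add /dev/sdc4
    cat /proc/mdstat               # watch the reconstruction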

In the morning, the array it was reconstructing is back to pending, the
RAID5 array in question is inactive, and it's reconstructing something
else. I remove each partition I previously added to each array
(although the array in question won't even let me do this). I stop the
array in question and zero the superblock of the partition I wanted to
remove, and I zero the superblocks on every other partition I removed.
I then re-add each partition to its array and let them resync. I now
have 3 of the 5 arrays fully operational, with one more resync in
progress.

But the array in question is still kinda hosed, and here's where it
gets strange. Rather than explain everything, here's the status from
the member devices (mdadm -E) and the array (mdadm -D):
https://pastebin.com/Gyj8d7Z7

Note the table at the end of mdadm -D (end of the paste). It shows four
devices as "removed", then a gap, then 3 devices as 'sync'. If I
incrementally add the drives it shows a "normal" table, until I try
--run, at which point it shows the odd table. If I --incremental 3
drives (not the 'spare') and then run, it shows the pasted table; if I
then try to add the fourth (the spare), dmesg says "ADD_NEW_DISK not
supported". If I add 3 drives including the 'spare', the behavior is
otherwise the same, but adding the fourth drive complains that it can
only be added as a spare and that I must use force-spare to add it (I
suspect that would be my 3rd mistake if I did it).
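
Concretely, the sequence I keep testing looks like this (md127 is just
what the array shows up as here; treat it as a placeholder):

    mdadm --stop /dev/md127
    mdadm --incremental /dev/sda4
    mdadm --incremental /dev/sdb4
    mdadm --incremental /dev/sdd4
    mdadm --run /dev/md127             # this is when -D flips to the odd table
    mdadm /dev/md127 --add /dev/sdc4   # dmesg: "ADD_NEW_DISK not supported"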

I think I can force-run this array with sd[abd]4, but the normal
commands give errors when I try to do so. What's also strange is that
devices sd[abd]4 all have the same event count, yet trying to start the
array results in "cannot start dirty degraded array".
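
The "normal commands" I mean are things like the following, and both
paths end in the dirty/degraded error (md127 again a placeholder):

    mdadm --run /dev/md127
    mdadm --stop /dev/md127
    mdadm --assemble --run --force /dev/md127 /dev/sd[abd]4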


