Have you tried to do a resync or repair of the RAID? I've written a bit about that here:

https://wiki.karlsbakk.net/index.php/Roy's_notes#Resync

I'd suggest 'repair', since that tends to fix things.

PS: If you don't have a backup, make one first. NEVER believe a RAID is a backup, please ;)

Kind regards,

roy
--
Roy Sigurd Karlsbakk
(+47) 98013356
http://blogg.karlsbakk.net/
GPG Public key: http://karlsbakk.net/roysigurdkarlsbakk.pubkey.txt
--
Hið góða skaltu í stein höggva, hið illa í snjó rita.
(Carve the good in stone; write the bad in snow.)

----- Original Message -----
> From: "Von Fugal" <von@xxxxxxxxx>
> To: "Linux Raid" <linux-raid@xxxxxxxxxxxxxxx>
> Sent: Monday, 4 July, 2022 19:41:27
> Subject: Re: Help recovering a RAID5, what seems to be a strange state
>
> I did get the array to reassemble. It's still strange to me having all
> devices removed but then listed again. Incremental adds always
> resulted in the bad state; what finally assembled the array was
> "mdadm -A --force /dev/md51", run with the array stopped and
> without any incremental adds.
>
> It's still doing recovery, but it looks good. I may follow up on this
> thread again if it goes south.
>
> Cheers!
>
> On Sun, Jul 3, 2022 at 3:57 PM Von Fugal <von@xxxxxxxxx> wrote:
>>
>> Tl;dr version:
>> I initially restored the partition tables with different end sectors,
>> started the RAIDs to ill effect, then restored the correct partition
>> tables; things seemed OK but degraded, until they weren't.
>>
>> Current state: 3 devices with the same event count, but the array is
>> "dirty", and it cannot start both degraded *and* dirty. I know the
>> array initially ran with sd[abd]4; when I added the "missing" sdc4,
>> it did something strange while attempting to resync.
>>
>> sdc4 is now a "spare" and cannot be added after an attempted
>> incremental run with the other 3.
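[Editorial note: the "same event count" check discussed in this thread can be scripted. A minimal sketch, assuming the usual `mdadm -E` output format with an "Events :" line; the sample string below stands in for real device output, which needs root to read:]

```shell
#!/bin/sh
# Sketch: before forcing an assembly, confirm the member superblocks
# agree on the event count. mdadm -E prints a line like
# "         Events : 249133" for each member.

events_of() {
    # pull the number out of an "Events : N" line on stdin
    awk -F: '/^ *Events/ { gsub(/ /, "", $2); print $2 }'
}

# Real use (as root):  mdadm -E /dev/sda4 | events_of
sample='         Events : 249133'
printf '%s\n' "$sample" | events_of    # prints 249133
```

Members whose counts differ only slightly are usually safe candidates for `--assemble --force`; a large gap means that member is badly stale.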
>> Either way, after trying to run the array, the table from 'mdadm -D'
>> looks similar to this:
>>
>>    Number   Major   Minor   RaidDevice   State
>>       -       0       0         0        removed
>>       -       0       0         1        removed
>>       -       0       0         2        removed
>>       -       0       0         3        removed
>>
>>       -       8      52         2        sync    /dev/sdd4
>>       -       8      36         -        spare   /dev/sdc4
>>       -       8      20         0        sync    /dev/sdb4
>>       -       8       4         1        sync    /dev/sda4
>>
>> Long story version follows.
>>
>> I have 4 drives partitioned into different RAID types; partition 4 is
>> a RAID5 across all 4 drives. For some reason my GPT partition tables
>> were all wiped, and I suspect benchmarking with fio (though I only
>> ever gave it an LVM volume to operate on). I boot SystemRescueCd, and
>> testdisk finds the original partitions, so I tell it to restore them.
>> So far, so good. I start assembling some arrays; others don't work
>> yet. LVM is starting to show contents it finds in the arrays
>> assembled so far (this is still within SystemRescueCd).
>>
>> Investigating the unassembled arrays, dmesg complains that the array
>> size changed. I find a suggestion to use "-U devicesize"; I believe
>> this was my first mistake. The arrays assemble, but LVM hangs
>> indefinitely at this point.
>>
>> I desperately search for any info I have on the partitions and
>> arrays, and I find a spreadsheet on my laptop with meticulous
>> partition detail. I find that some of the partition ends leave a gap
>> before the next partition begins. Whatever. I fix the partition
>> tables. This time, all the arrays assemble and LVM is happy! YES.
>>
>> However, each array has one missing partition member, and it's not
>> the same disk in each. That's strange. Still, my server is running;
>> I'm able to boot it normally, and Home Assistant is back up. I then
>> re-add each missing partition to each array (I believe this was my
>> second mistake). I go to bed while it reconstructs.
>>
>> In the morning, the array it was reconstructing is back to pending,
>> the RAID5 array in question is inactive, and it's reconstructing
>> something else. I remove each partition that I previously added to
>> each array (although the array in question doesn't even let me do
>> this). I stop the array in question and zero the superblock of the
>> partition I wanted to remove. I zero the superblocks on each other
>> removed partition. I then re-add each partition to each array and
>> let them resync. I now have 3 out of 5 fully operational, with one
>> more resync in progress.
>>
>> But my array in question is still kind of hosed. Here's where it's
>> strange. Rather than explain everything, here's the status from the
>> devices (mdadm -E) and the array (mdadm -D):
>> https://pastebin.com/Gyj8d7Z7
>>
>> Note the table at the end of mdadm -D (end of the paste). It shows
>> four devices "removed", then a gap, then 3 devices as 'sync'. If I
>> incrementally add the drives, it shows a "normal" table. But when I
>> try --run, it shows the odd table. If I add 3 drives with
>> --incremental (not the 'spare') and then run, it shows the pasted
>> table. If I try to add the fourth (the spare), dmesg says
>> "ADD_NEW_DISK not supported". If I add 3 drives including the
>> 'spare', the behavior is otherwise the same, but adding the fourth
>> drive complains that it can only be added as a spare, and that I
>> must use force-spare to add it (I suspect this would be my 3rd
>> mistake if I did it).
>>
>> I think I can force-run this array with sd[abd]4, but the normal
>> commands give errors when trying to do so. What's also strange is
>> that devices sd[abd]4 all have the same event count, yet trying to
>> start the array results in "cannot start dirty degraded array".
>
>
>
> --
> You keep up the good fight just as long as you feel you need to.
>   -- Ken Danagger
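[Editorial note: for readers hitting the same "cannot start dirty degraded array" error, the sequence that eventually worked in this thread (stop, then force-assemble with the three in-sync members, no incremental adds) can be sketched as a dry run. Device and array names are taken from the thread; every command is printed rather than executed, since `--force` rewrites superblocks, and you should image the disks first:]

```shell
#!/bin/sh
# Dry-run sketch of the recovery described above. Drop the 'run'
# wrapper (and run as root) to execute for real.

run() { printf '+ %s\n' "$*"; }

run mdadm --stop /dev/md51
run mdadm --assemble --force /dev/md51 /dev/sda4 /dev/sdb4 /dev/sdd4
# Once the array is up and resynced clean, re-add the former spare:
run mdadm --add /dev/md51 /dev/sdc4
```

The key detail from the follow-up is that `--assemble --force` was issued against a fully stopped array, naming only the three members with matching event counts; the fourth device goes back in afterwards as a new member and gets rebuilt.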