Re: Issues restoring a degraded array

From the feedback from HTH, I realized I had never checked the system
logs. In there, I saw:

md: sdi does not have a valid v1.2 superblock, not importing

After googling that, I found an answer on the Unix & Linux Stack Exchange:
https://unix.stackexchange.com/questions/324313/md-raid5-no-valid-superblock-but-mdadm-examine-says-everything-is-fine
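
(For anyone else who hits that message: the size mismatch that
--update=devicesize corrects should be visible in the member's
metadata. Something like the following, using my sdi as the example
device, shows the size fields mdadm has recorded -- I'm going from
memory on the exact labels, so double-check them:

  mdadm --examine /dev/sdi | grep -i 'dev size'

Compare the "Avail Dev Size" / "Used Dev Size" lines against a
known-good member; if the recorded size no longer matches the actual
device, that is the sort of thing --update=devicesize is meant to fix.)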

Combining that with the suggestions from Peter, I added
--update=devicesize to the assemble command and my array is back
together... no corruption or loss. It's rebuilding as we speak.
Amazing. Here is the command I ran in case it helps anyone else:
mdadm --assemble --force --verbose --update=devicesize /dev/md0 /dev/sda[abcdefghiml]

And just to be clear, it did not work without --update=devicesize;
I tried the assemble without that flag initially.
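
If you want to keep an eye on the rebuild, the standard checks apply
(nothing specific to my setup, just the usual commands):

  cat /proc/mdstat
  mdadm --detail /dev/md0

The first shows the resync progress; the second shows per-device
state and the rebuild percentage.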

Thanks for your help. I'm feeling great now.

Lane

On Tue, Nov 7, 2023 at 2:26 PM Peter Grandi <pg@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> > I have a 14 drive RAID5 array with 1 spare.
>
> Very brave!
>
> > Each drive is a 2TB SSD. One of the drives failed. I replaced
> > it, and while it was rebuilding, one of the original drives
> > experienced some read errors and seems to have been marked
> > bad. I have since cloned that drive (first using dd and then
> > ddrescue), and it cloned without any read errors.
>
> So one drive is mostly missing and one drive (the cloned one) is
> behind on event count.
>
> > But now when I run the 'mdadm --assemble --scan' command, I get:
> > mdadm: failed to add /dev/sdi to /dev/md/0: Invalid argument
> > mdadm: /dev/md/0 assembled from 12 drives and 1 spare - not enough to
> > start the array while not clean - consider --force
> > mdadm: No arrays found in config file or automatically
>
> The MD RAID wiki has a similar suggestion:
>
>   https://raid.wiki.kernel.org/index.php/Assemble_Run
>
>   "The problem with replacing a dying drive with an incomplete
>   ddrescue copy, is that the raid has no way of knowing which
>   blocks failed to copy, and no way of reconstructing them even
>   if it did. In other words, random blocks will return garbage
>   (probably in the form of a block of nulls) in response to a
>   read request.
>
>   Either way, now forcibly assemble the array using the drives
>   with the highest event count, and the drive that failed most
>   recently, to bring the array up in degraded mode.
>
>     mdadm --force --assemble /dev/mdN /dev/sd[XYZ]1"
>
>
> Note that the suggestion does not use '--scan'.
>
>   "If you are lucky, the missing writes are unimportant. If you
>   are happy with the health of your drives, now add a new drive
>   to restore redundancy.
>
>     mdadm /dev/mdN --add /dev/sdW1
>
>   and do a filesystem check fsck to try and find the inevitable
>   corruption."



