Re: 4-disk RAID6 (non-standard layout) normalise hung, now all disks spare

Phil Turmel <philip@xxxxxxxxxx> · Fri, 25 Jun 2021 09:59:38 -0400

Good morning Jason,

Good report.  Comments inline.

On 6/25/21 8:08 AM, Jason Flood wrote:
I started with a 4x4TB disk RAID5 array and, over a few years changed all
the drives to 8TB (WD Red - I hadn't seen the warnings before now, but it
looks like these ones are OK). I then successfully migrated it to RAID6, but
it then had a non-standard layout, so I ran:
	sudo mdadm --grow /dev/md0 --raid-devices=4 --backup-file=/root/raid5backup --layout=normalize

Ugh.  You don't have to use a backup file unless mdadm tells you to.
Now you are stuck with it.

After a few days it reached 99% complete, but then the "hours remaining"
counter started counting up. After a few days I had to power the system down
before I could get a backup of the non-critical data (Couldn't get hold of
enough storage quickly enough, but it wouldn't be catastrophic to lose it),
and now the four drives are in standby, with the array thinking it is RAID0.
Running:
	sudo mdadm --assemble /dev/md0 /dev/sd[bcde]
responds with:
	mdadm: /dev/md0 assembled from 4 drives - not enough to start the array while not clean - consider --force.

You have to specify the backup file on assembly if a reshape using one 
was interrupted.

It appears to be similar to https://marc.info/?t=155492912100004&r=1&w=2,
but before trying --force I was considering using overlay files as I'm not
sure of the risk of damage. The set-up process that is documented in the
"Recovering a damaged RAID" Wiki article is excellent, however the latter
part of the process isn't clear to me. If successful, are the overlay files
written to the disk like a virtual machine snapshot, or is the process
stopped, the overlays removed and the process repeated, knowing that it now
has a low risk of damage?

Using --force is very low risk on assembly.  I would try it (without 
overlays, and with backup file specified) before you do anything else. 
Odds of success are high.
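
A minimal sketch of that first attempt, assuming the device names and
backup path quoted in the original report. The wrapper function and the
RUN variable are hypothetical, added only so the command can be reviewed
before anything touches the disks. If /dev/md0 is still listed as
inactive in /proc/mdstat, it will probably need "mdadm --stop /dev/md0"
first (an assumption, not something stated in the report).

```shell
#!/bin/sh
# Sketch only: prints the command unless RUN=1 is set (run as root then).
# Device names and backup path are copied from the original report.
MD=/dev/md0
BACKUP=/root/raid5backup
MEMBERS="/dev/sdb /dev/sdc /dev/sdd /dev/sde"

force_assemble() {
    # --force: assemble even if some member metadata looks out of date
    # --backup-file: required because the interrupted reshape used one
    set -- mdadm --assemble --force "$MD" --backup-file="$BACKUP" $MEMBERS
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

force_assemble
```

Check /proc/mdstat and "mdadm --detail /dev/md0" afterwards to see
whether the reshape has resumed.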

Also try the flags to treat the backup file as garbage if its contents 
don't match what mdadm expects.
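
The option documented in mdadm(8) for this is --invalid-backup, which is
passed together with --backup-file. A hedged sketch under the same
assumptions as the report (device names and backup path); the wrapper
function and RUN variable are hypothetical. Per the man page, data that
was being rearranged at the moment of the interruption may be lost with
this option, so it belongs after the plain --force attempt, not before.

```shell
#!/bin/sh
# Sketch only: prints the command unless RUN=1 is set (run as root then).
MD=/dev/md0
BACKUP=/root/raid5backup
MEMBERS="/dev/sdb /dev/sdc /dev/sdd /dev/sde"

assemble_ignoring_backup() {
    # --invalid-backup: tell mdadm the backup file contents cannot be trusted
    set -- mdadm --assemble --force --invalid-backup "$MD" \
        --backup-file="$BACKUP" $MEMBERS
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

assemble_ignoring_backup
```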

Report back here after the above.

System details follow. Thanks for any help.

[details trimmed]

Your report of the details was excellent.  Thanks for helping us help you.

Phil