Thanks for that, Phil - I think I'm starting to piece it all together now. I was going from a 4-disk RAID5 to a 4-disk RAID6, so from my reading the backup file was recommended. The non-standard layout meant that the array had over 20TB usable, but standardising the layout reduced that to 16TB. In that case the reshape starts at the end, so the critical section (and so the backup file) may have been in progress at the 99%-complete point when it failed, hence the need to specify the backup file for the assemble command.

I ran:

sudo mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcde] --backup-file=/root/raid5backup

mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 3.
mdadm: Marking array /dev/md0 as 'clean'
mdadm: /dev/md0 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on /root/raid5backup
mdadm: added /dev/sdc to /dev/md0 as 1
mdadm: added /dev/sdd to /dev/md0 as 2
mdadm: added /dev/sde to /dev/md0 as 3
mdadm: no uptodate device for slot 4 of /dev/md0
mdadm: added /dev/sdb to /dev/md0 as 0
mdadm: Need to backup 3072K of critical section..
mdadm: /dev/md0 has been started with 4 drives (out of 5).
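One thing I noticed in the numbers: mdadm says it needs to back up 3072K of critical section. Assuming it also writes a 4K block of its own metadata into the file (that part is my guess - I haven't checked the backup-file format), the file size would work out to:

```shell
# 3072K of critical section plus an assumed 4K of mdadm metadata (my assumption)
echo $(( 3072 * 1024 + 4096 ))   # 3149824 bytes
```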
=============================================================
sudo mdadm --detail /dev/md0
/dev/md0:
           Version : 1.2
     Creation Time : Thu Jul 13 01:11:22 2017
        Raid Level : raid6
        Array Size : 15627793408 (14903.83 GiB 16002.86 GB)
     Used Dev Size : 7813896704 (7451.91 GiB 8001.43 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Jun 26 19:40:16 2021
             State : clean, reshaping
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric-6
        Chunk Size : 512K

Consistency Policy : bitmap

    Reshape Status : 99% complete
     Delta Devices : -1, (5->4)
        New Layout : left-symmetric

              Name : Universe:0
              UUID : 3eee8746:8a3bf425:afb9b538:daa61b29
            Events : 184255

    Number   Major   Minor   RaidDevice State
       6       8       16        0      active sync   /dev/sdb
       7       8       32        1      active sync   /dev/sdc
       5       8       48        2      active sync   /dev/sdd
       4       8       64        3      active sync   /dev/sde
=============================================================
cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdb[6] sde[4] sdd[5] sdc[7]
      15627793408 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUUU]
      [===================>.]  reshape = 99.7% (7794393600/7813896704) finish=52211434.6min speed=0K/sec
      bitmap: 14/30 pages [56KB], 131072KB chunk
=============================================================
The drive mounts and the files are all intact, but it's still sitting at 99% complete, with 52 million minutes to finish and counting up. The "No backup metadata" message made me suspicious that it is stuck because it can't write to /root/raid5backup (looking at it now, I should have put the file somewhere more sensible since I'm using sudo, but I used the same path in the RAID5-to-RAID6 process and it was happy).
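To tell whether it's really stuck (rather than just reporting a silly ETA), I've been assuming I can compare the "(done/total)" counter from /proc/mdstat across samples taken a few minutes apart - if that number never changes, the reshape isn't moving at all. Something like the following, shown here against a saved copy of the line above rather than the live file:

```shell
# Pull the "(done/total)" reshape counter out of an mdstat progress line.
# On the live system this would just be: grep -o '([0-9]*/[0-9]*)' /proc/mdstat
line='[===================>.]  reshape = 99.7% (7794393600/7813896704) finish=52211434.6min speed=0K/sec'
printf '%s\n' "$line" | grep -o '([0-9]*/[0-9]*)'
```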
It does seem to have modified the file, though:

stat raid5backup
  File: raid5backup
  Size: 3149824    Blocks: 6152    IO Block: 4096    regular file
Device: 802h/2050d    Inode: 1572897    Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-06-26 19:39:16.739983712 +1000
Modify: 2021-06-26 19:40:16.778498938 +1000
Change: 2021-06-26 19:40:16.778498938 +1000
 Birth: -
=============================================================
But I believe those times are from when I first ran the assemble command - it's 20:30 now. I couldn't find a flag to conditionally treat the backup file as garbage - just the --invalid-backup "I know it's garbage" option. Given that the assembly isn't complaining about needing to restore the critical section, is my next step something like:

sudo mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcde] --backup-file=raidbackup --invalid-backup

Thanks again, Phil. I haven't been using Linux seriously for very long, so this has been a steep learning curve for me.

Jason

=======================================================================================================================================
-----Original Message-----
From: Phil Turmel <philip@xxxxxxxxxx>
Sent: Saturday, 26 June 2021 00:00
To: Jason Flood <3mu5555@xxxxxxxxx>; linux-raid@xxxxxxxxxxxxxxx
Subject: Re: 4-disk RAID6 (non-standard layout) normalise hung, now all disks spare

Good morning Jason,

Good report.  Comments inline.

On 6/25/21 8:08 AM, Jason Flood wrote:
> I started with a 4x4TB disk RAID5 array and, over a few years, changed
> all the drives to 8TB (WD Red - I hadn't seen the warnings before now,
> but it looks like these ones are OK). I then successfully migrated it
> to RAID6, but it then had a non-standard layout, so I ran:
> sudo mdadm --grow /dev/md0 --raid-devices=4
> --backup-file=/root/raid5backup --layout=normalize

Ugh.  You don't have to use a backup file unless mdadm tells you to.  Now you are stuck with it.
> After a few days it reached 99% complete, but then the "hours remaining"
> counter started counting up. After a few days I had to power the
> system down before I could get a backup of the non-critical data
> (couldn't get hold of enough storage quickly enough, but it wouldn't
> be catastrophic to lose it), and now the four drives are in standby,
> with the array thinking it is RAID0.
> Running:
> sudo mdadm --assemble /dev/md0 /dev/sd[bcde]
> responds with:
> mdadm: /dev/md0 assembled from 4 drives - not enough to start the
> array while not clean - consider --force.

You have to specify the backup file on assembly if a reshape using one was interrupted.

> It appears to be similar to
> https://marc.info/?t=155492912100004&r=1&w=2,
> but before trying --force I was considering using overlay files, as I'm
> not sure of the risk of damage. The set-up process documented in the
> "Recovering a damaged RAID" wiki article is excellent, but the
> latter part of the process isn't clear to me. If successful, are the
> overlay files written to the disk like a virtual machine snapshot, or
> is the process stopped, the overlays removed and the process repeated,
> knowing that it now has a low risk of damage?

Using --force is very low risk on assembly.  I would try it (without overlays, and with the backup file specified) before you do anything else.  Odds of success are high.

Also try the flags to treat the backup file as garbage if its contents don't match what mdadm expects.

Report back here after the above.

> System details follow. Thanks for any help.

[details trimmed]

Your report of the details was excellent.  Thanks for helping us help you.

Phil
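For reference, the overlay set-up in the wiki article discussed above boils down to roughly the following per member device. This is only a sketch - /dev/sdX, the overlay path and its size are placeholders - but the idea is that all writes are diverted into a sparse copy-on-write file, and removing the device-mapper mapping discards them, so the underlying disk is never modified:

```shell
# Overlay one array member with a device-mapper snapshot (run as root).
dev=/dev/sdX                                  # placeholder member device
truncate -s 1G /tmp/overlay-sdX               # sparse file to absorb writes
loop=$(losetup -f --show /tmp/overlay-sdX)    # attach it as a loop device
# snapshot target: <origin> <COW device> P(ersistent) <chunksize in sectors>
dmsetup create sdX-overlay \
    --table "0 $(blockdev --getsz $dev) snapshot $dev $loop P 8"
# ...then assemble using /dev/mapper/sdX-overlay instead of $dev.
# To discard the experiment: dmsetup remove sdX-overlay && losetup -d $loop
```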