RE: 4-disk RAID6 (non-standard layout) normalise hung, now all disks spare

Thanks for that, Phil - I think I'm starting to piece it all together now. I was going from a 4-disk RAID5 to a 4-disk RAID6, so from my reading the backup file was recommended at that point. The non-standard layout meant the array had over 20TB usable, but standardising the layout reduced that to 16TB. Since the usable space is shrinking, the reshape starts at the end, so the critical section (and therefore the backup file) may have been in progress at the 99% complete point when it hung - hence the need to specify the backup file on the assemble command.
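
(If it would help, I can also send the per-member view of the reshape. As I understand it, something along these lines should show where each member's superblock thinks the reshape is up to, plus the old and new layouts - the grep pattern is just my guess at the relevant fields:

	sudo mdadm --examine /dev/sd[bcde] | grep -iE 'reshape|layout|delta'

I haven't included that output here to keep this message short, but can post it in full.)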

I ran "sudo mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcde] --backup-file=/root/raid5backup":

mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb is identified as a member of /dev/md0, slot 0.
mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sde is identified as a member of /dev/md0, slot 3.
mdadm: Marking array /dev/md0 as 'clean'
mdadm: /dev/md0 has an active reshape - checking if critical section needs to be restored
mdadm: No backup metadata on /root/raid5backup
mdadm: added /dev/sdc to /dev/md0 as 1
mdadm: added /dev/sdd to /dev/md0 as 2
mdadm: added /dev/sde to /dev/md0 as 3
mdadm: no uptodate device for slot 4 of /dev/md0
mdadm: added /dev/sdb to /dev/md0 as 0
mdadm: Need to backup 3072K of critical section..
mdadm: /dev/md0 has been started with 4 drives (out of 5).

=============================================================
sudo mdadm --detail /dev/md0

/dev/md0:
           Version : 1.2
     Creation Time : Thu Jul 13 01:11:22 2017
        Raid Level : raid6
        Array Size : 15627793408 (14903.83 GiB 16002.86 GB)
     Used Dev Size : 7813896704 (7451.91 GiB 8001.43 GB)
      Raid Devices : 4
     Total Devices : 4
       Persistence : Superblock is persistent

     Intent Bitmap : Internal

       Update Time : Sat Jun 26 19:40:16 2021
             State : clean, reshaping
    Active Devices : 4
   Working Devices : 4
    Failed Devices : 0
     Spare Devices : 0

            Layout : left-symmetric-6
        Chunk Size : 512K

Consistency Policy : bitmap

    Reshape Status : 99% complete
     Delta Devices : -1, (5->4)
        New Layout : left-symmetric

              Name : Universe:0
              UUID : 3eee8746:8a3bf425:afb9b538:daa61b29
            Events : 184255

    Number   Major   Minor   RaidDevice State
       6       8       16        0      active sync   /dev/sdb
       7       8       32        1      active sync   /dev/sdc
       5       8       48        2      active sync   /dev/sdd
       4       8       64        3      active sync   /dev/sde

=============================================================

cat /proc/mdstat

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid6 sdb[6] sde[4] sdd[5] sdc[7]
      15627793408 blocks super 1.2 level 6, 512k chunk, algorithm 18 [4/3] [UUUU]
      [===================>.]  reshape = 99.7% (7794393600/7813896704) finish=52211434.6min speed=0K/sec
      bitmap: 14/30 pages [56KB], 131072KB chunk
=============================================================

The array mounts and the files are all intact, but the reshape is still sitting at 99% complete with 52 million minutes to finish and counting up. The "No backup metadata on /root/raid5backup" line made me suspect it is stuck because mdadm can't write to /root/raid5backup (looking at it now I should have put the file somewhere more sensible, since I'm running everything through sudo, but the same path worked fine during the RAID5 to RAID6 migration). It does seem to have modified the file, though:

stat raid5backup

  File: raid5backup
  Size: 3149824         Blocks: 6152       IO Block: 4096   regular file
Device: 802h/2050d      Inode: 1572897     Links: 1
Access: (0600/-rw-------)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2021-06-26 19:39:16.739983712 +1000
Modify: 2021-06-26 19:40:16.778498938 +1000
Change: 2021-06-26 19:40:16.778498938 +1000
 Birth: -
=============================================================
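
(I haven't yet checked whether anything still has the backup file open or is actively writing to it. If it's worth doing, I assume something like the following would show any mdadm process holding the file - the bracketed first letter in the ps grep is just to stop grep matching itself:

	sudo lsof /root/raid5backup
	ps -ef | grep '[m]dadm'

Happy to post that output too.)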

I believe the stat times above are from when I first ran the assemble command - it's 20:30 now. I couldn't find a flag that conditionally treats the backup file as garbage, only the --invalid-backup "I know it's garbage" option. Given that the assemble isn't complaining about needing to restore the critical section, is my next step something like:

	sudo mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcde] --backup-file=/root/raid5backup --invalid-backup
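
For completeness, the full sequence I have in mind (assuming I need to unmount the filesystem and stop the array before it can be re-assembled) is roughly:

	sudo umount /dev/md0
	sudo mdadm --stop /dev/md0
	sudo mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcde] --backup-file=/root/raid5backup --invalid-backup

Please tell me if stopping the array mid-reshape like that is a bad idea.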

Thanks again, Phil. I haven't been using Linux seriously for very long, so this has been a steep learning curve for me.

Jason
=======================================================================================================================================

-----Original Message-----
From: Phil Turmel <philip@xxxxxxxxxx> 
Sent: Saturday, 26 June 2021 00:00
To: Jason Flood <3mu5555@xxxxxxxxx>; linux-raid@xxxxxxxxxxxxxxx
Subject: Re: 4-disk RAID6 (non-standard layout) normalise hung, now all disks spare

Good morning Jason,

Good report.  Comments inline.

On 6/25/21 8:08 AM, Jason Flood wrote:
> I started with a 4x4TB disk RAID5 array and, over a few years changed 
> all the drives to 8TB (WD Red - I hadn't seen the warnings before now, 
> but it looks like these ones are OK). I then successfully migrated it 
> to RAID6, but it then had a non-standard layout, so I ran:
> 	sudo mdadm --grow /dev/md0 --raid-devices=4 
> --backup-file=/root/raid5backup --layout=normalize

Ugh.  You don't have to use a backup file unless mdadm tells you to.
Now you are stuck with it.

> After a few days it reached 99% complete, but then the "hours remaining"
> counter started counting up. After a few days I had to power the 
> system down before I could get a backup of the non-critical data 
> (Couldn't get hold of enough storage quickly enough, but it wouldn't 
> be catastrophic to lose it), and now the four drives are in standby, with the array thinking it is RAID0.
> Running:
> 	sudo mdadm --assemble /dev/md0 /dev/sd[bcde] responds with:
> 	mdadm: /dev/md0 assembled from 4 drives - not enough to start the 
> array while not clean - consider --force.

You have to specify the backup file on assembly if a reshape using one was interrupted.

> It appears to be similar to 
> https://marc.info/?t=155492912100004&r=1&w=2,
> but before trying --force I was considering using overlay files as I'm 
> not sure of the risk of damage. The set-up process that is documented in the "
> Recovering a damaged RAID" Wiki article is excellent, however the 
> latter part of the process isn't clear to me. If successful, are the 
> overlay files written to the disk like a virtual machine snapshot, or 
> is the process stopped, the overlays removed and the process repeated, 
> knowing that it now has a low risk of damage?

Using --force is very low risk on assembly.  I would try it (without overlays, and with backup file specified) before you do anything else. 
Odds of success are high.

Also try the flags to treat the backup file as garbage if its contents don't match what mdadm expects.

Report back here after the above.

> System details follow. Thanks for any help.

[details trimmed]

Your report of the details was excellent.  Thanks for helping us help you.


Phil




