Re: 4-disk RAID6 (non-standard layout) normalise hung, now all disks spare

Good morning Jason, Wol,

On 6/26/21 9:13 AM, antlists wrote:
On 26/06/2021 12:09, Jason Flood wrote:
     Reshape Status : 99% complete
      Delta Devices : -1, (5->4)
         New Layout : left-symmetric

               Name : Universe:0
               UUID : 3eee8746:8a3bf425:afb9b538:daa61b29
             Events : 184255

     Number   Major   Minor   RaidDevice State
        6       8       16        0      active sync   /dev/sdb
        7       8       32        1      active sync   /dev/sdc
        5       8       48        2      active sync   /dev/sdd
        4       8       64        3      active sync   /dev/sde

Phil will know much more about this than me, but I did notice that the system thinks there should be FIVE raid drives. Is that an mdadm bug?

Not a bug, but a reshape from a degraded array with a reduction in space.

That would explain the failure to assemble - it thinks there's a drive missing. And while I don't think we've had data-eating trouble, reshaping a parity raid has caused quite a lot of grief for people over the years ...

I've never tried it starting from a degraded array. Might be a corner case bug not yet exposed.

However, you're running a recent Ubuntu and mdadm - that should all have been fixed by now.

Indeed.

Cheers,
Wol

On 6/26/21 7:09 AM, Jason Flood wrote:
> Thanks for that, Phil - I think I'm starting to piece it all together now. I was going from a 4-disk RAID5 to a 4-disk RAID6, so from my reading the backup file was recommended. The non-standard layout meant that the array had over 20TB usable, but standardising the layout reduced that to 16TB. In that case the reshape starts at the end, so the critical section may have been in progress (and the backup file in use) at the 99% complete point when it failed, hence the need to specify the backup file on the assemble command.
>
> I ran "sudo mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcde] --backup-file=/root/raid5backup":
>
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdb is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
> mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
> mdadm: /dev/sde is identified as a member of /dev/md0, slot 3.
> mdadm: Marking array /dev/md0 as 'clean'
> mdadm: /dev/md0 has an active reshape - checking if critical section needs to be restored
> mdadm: No backup metadata on /root/raid5backup
> mdadm: added /dev/sdc to /dev/md0 as 1
> mdadm: added /dev/sdd to /dev/md0 as 2
> mdadm: added /dev/sde to /dev/md0 as 3
> mdadm: no uptodate device for slot 4 of /dev/md0
> mdadm: added /dev/sdb to /dev/md0 as 0
> mdadm: Need to backup 3072K of critical section..
> mdadm: /dev/md0 has been started with 4 drives (out of 5).
>

So --force was sufficient to assemble.  But you are still stuck at 99%.
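
A quick way to confirm it is genuinely stalled rather than just crawling is to watch /proc/mdstat and see whether the reshape counter advances - a rough sketch, adjust the interval to taste:

  watch -n 60 cat /proc/mdstat

If the block counts don't move over several minutes, the reshape really is stuck.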

Look at the output of ps to see if mdmon is still running (that is the background process that actually reshapes stripe by stripe). If not, look in your logs for clues as to why it died.
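
Something along these lines should catch both the monitoring process and any recent kernel messages - just a sketch, assuming a systemd-based Ubuntu, and the process names can vary with the mdadm version:

  ps aux | grep -E 'mdadm|mdmon|reshape' | grep -v grep
  journalctl -k | grep -iE 'md0|raid' | tail -n 50

Plain dmesg works just as well if journalctl isn't handy.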

If you can't find anything significant, the next step would be to backup the currently functioning array to another system/drive collection and start from scratch. I wouldn't trust anything else with the information available.
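
If you go that route, a read-only mount plus rsync to the other system is the simplest approach. A sketch only - this assumes the filesystem sits directly on /dev/md0, and the mount point and destination are placeholders:

  mkdir -p /mnt/md0
  mount -o ro /dev/md0 /mnt/md0
  rsync -aHAX --info=progress2 /mnt/md0/ otherhost:/backup/md0/

Verify the copy before doing anything destructive to the original array.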

Phil

ps. Convention on kernel.org mailing lists is to NOT top-post, and to trim unnecessary context.


