Good morning Jason, Wol,
On 6/26/21 9:13 AM, antlists wrote:
> On 26/06/2021 12:09, Jason Flood wrote:
>>   Reshape Status : 99% complete
>>    Delta Devices : -1, (5->4)
>>       New Layout : left-symmetric
>>
>>             Name : Universe:0
>>             UUID : 3eee8746:8a3bf425:afb9b538:daa61b29
>>           Events : 184255
>>
>>   Number   Major   Minor   RaidDevice   State
>>      6       8       16        0        active sync   /dev/sdb
>>      7       8       32        1        active sync   /dev/sdc
>>      5       8       48        2        active sync   /dev/sdd
>>      4       8       64        3        active sync   /dev/sde
>
> Phil will know much more about this than me, but I did notice that the
> system thinks there should be FIVE raid drives. Is that an mdadm bug?

Not a bug, but a reshape from a degraded array with a reduction in space.
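
As a rough sanity check on the space reduction, here is a small Python
sketch of the arithmetic. It assumes 8 TB member disks, which is a guess
(the disk size is not stated in this thread) that happens to fit the
"over 20TB" and "16TB" figures. The names usable_tb and DISK_TB are
illustrative only. As I read it, the intermediate array has five slots
with P and Q parity, so three devices' worth of data; the standard
four-device RAID6 leaves two.

```python
def usable_tb(raid_devices: int, parity_devices: int, disk_tb: float) -> float:
    """Usable capacity: data-bearing devices per stripe times per-device size."""
    return (raid_devices - parity_devices) * disk_tb


# Assumption: 8 TB per member disk. This value is not given in the thread.
DISK_TB = 8.0

# Intermediate (non-standard) RAID6: 5 slots, one missing, P and Q parity.
before = usable_tb(raid_devices=5, parity_devices=2, disk_tb=DISK_TB)

# Target (standard left-symmetric) RAID6: 4 slots, P and Q parity.
after = usable_tb(raid_devices=4, parity_devices=2, disk_tb=DISK_TB)

print(f"before reshape: {before} TB, after reshape: {after} TB")
```

Under that assumed disk size, the reshape gives up one device's worth of
data capacity, which matches the "Delta Devices : -1, (5->4)" line.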

> That would explain the failure to assemble - it thinks there's a drive
> missing. And while I don't think we've had data-eating trouble,
> reshaping a parity raid has caused quite a lot of grief for people over
> the years ...

I've never tried it starting from a degraded array. Might be a corner
case bug not yet exposed.

> However, you're running a recent Ubuntu and mdadm - that should all have
> been fixed by now.

Indeed.

> Cheers,
> Wol

On 6/26/21 7:09 AM, Jason Flood wrote:
> Thanks for that, Phil - I think I'm starting to piece it all together
> now. I was going from a 4-disk RAID5 to 4-disk RAID6, so from my reading
> the backup file was recommended. The non-standard layout meant that the
> array had over 20TB usable, but standardising the layout reduced that to
> 16TB. In that case the reshape starts at the end so the critical section
> (and so the backup file) may have been in progress at the 99% complete
> point when it failed, hence the need to specify the backup file for the
> assemble command.
>
> I ran "sudo mdadm --assemble --verbose --force /dev/md0 /dev/sd[bcde]
> --backup-file=/root/raid5backup":
>
> mdadm: looking for devices for /dev/md0
> mdadm: /dev/sdb is identified as a member of /dev/md0, slot 0.
> mdadm: /dev/sdc is identified as a member of /dev/md0, slot 1.
> mdadm: /dev/sdd is identified as a member of /dev/md0, slot 2.
> mdadm: /dev/sde is identified as a member of /dev/md0, slot 3.
> mdadm: Marking array /dev/md0 as 'clean'
> mdadm: /dev/md0 has an active reshape - checking if critical section needs to be restored
> mdadm: No backup metadata on /root/raid5backup
> mdadm: added /dev/sdc to /dev/md0 as 1
> mdadm: added /dev/sdd to /dev/md0 as 2
> mdadm: added /dev/sde to /dev/md0 as 3
> mdadm: no uptodate device for slot 4 of /dev/md0
> mdadm: added /dev/sdb to /dev/md0 as 0
> mdadm: Need to backup 3072K of critical section..
> mdadm: /dev/md0 has been started with 4 drives (out of 5).
>
So force was sufficient to assemble. But you are still stuck at 99%.
Look at the output of ps to see if mdmon is still running (that is the
background process that actually reshapes stripe by stripe). If not,
look in your logs for clues as to why it died.
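
A minimal Python sketch of that check, for anyone who prefers a script
to eyeballing ps. Everything here is an assumption on my part rather
than anything mdadm ships: the helper names, the PATTERNS list, and the
idea that the reshape helper may show up as an mdadm process or as a
kernel thread such as [md0_reshape] rather than mdmon. It also pulls the
reshape percentage for md0 out of /proc/mdstat when that file exists.

```python
import re
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional

# Assumption: name fragments worth looking for in the process list.
# "_reshape" is meant to catch a kernel thread such as [md0_reshape].
PATTERNS = ("mdadm", "mdmon", "_reshape")


def reshape_related(ps_lines: List[str]) -> List[str]:
    """Keep only the ps lines that mention one of PATTERNS."""
    return [ln for ln in ps_lines if any(p in ln for p in PATTERNS)]


def reshape_percent(mdstat_text: str, array: str = "md0") -> Optional[float]:
    """Return the 'reshape = NN.N%' figure for `array`, or None if absent."""
    in_block = False
    for line in mdstat_text.splitlines():
        if re.match(r"^\S+\s*:", line):
            # An unindented "name :" line starts a new block in mdstat.
            in_block = line.split()[0] == array
        elif in_block:
            m = re.search(r"reshape\s*=\s*([\d.]+)%", line)
            if m:
                return float(m.group(1))
    return None


if __name__ == "__main__":
    # Only call ps if it is actually installed on this system.
    if shutil.which("ps"):
        ps = subprocess.run(["ps", "-eo", "pid,args"],
                            capture_output=True, text=True, check=False)
        for ln in reshape_related(ps.stdout.splitlines()):
            print(ln)
    mdstat = Path("/proc/mdstat")
    if mdstat.exists():
        print("md0 reshape %:", reshape_percent(mdstat.read_text()))
```

If nothing relevant shows up in the process list and the percentage does
not move between two runs a few minutes apart, that points at a stalled
reshape, and the logs are the next place to look.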
If you can't find anything significant, the next step would be to back up
the currently functioning array to another system/drive collection and
start from scratch. I wouldn't trust anything else with the information
available.
Phil
P.S. Convention on kernel.org mailing lists is to NOT top-post, and to
trim unnecessary context.