On Sun, Jan 17 2021, Nathan Brown wrote:

> It was this part of the original post
>
>> The raid didn't automatically assemble so I did
>> `mdadm --assemble` but really screwed up and put the 5 new disks in a
>> different array
>
> Basically, I did an `mdadm --assemble /dev/md1 <new disks for md0>`

That command wouldn't have the effect you describe (and is visible in
the --examine output - thanks).  Maybe you mean "--add" ???

> instead of `mdadm --assemble /dev/md0 <new disks for md0>`. Further
> complicated by the fact the md1 was missing a disk, so I let 1 of the
> 5 disks become a full member md1 since I didn't catch my error in time
> and enough recovery on md1 had occurred to wipe out any data transfer
> from the reshape on md0. The other 4 became hot spares. This wiped the
> super block on those 5 new disks, the super blocks no longer contain
> the correct information showing the original reshape attempt on md0.
>
> I have yet to dive into the code but it seems likely that I can
> manually reconstruct the appropriate super blocks for these 4 disks
> that still contain valid data as a result of the reshape with a worst
> case of ~1/5th data loss.

There will be fs-metadata loss as well as data loss, and that is the
real killer.

Yes, the data is probably still on those "spare" devices.  Probably just
the md-metadata is lost.  The data that was on sdo1 is now lost, but
RAID6 protects you from losing one device, so that doesn't matter.

To reconstruct the correct metadata, the easiest approach is probably to
copy the superblock from the best drive in md0 and use a binary-editor
to change the 'Device Role' field to an appropriate number for each
different device.  Maybe your kernel logs will have enough info to
confirm which device was in each role.

One approach to copying the metadata is to use
   mdadm --dump=/tmp/md0 /dev/md0
which should create sparse files in /tmp/md0 with the metadata from each
device.  Then binary-edit those files, and rename them.
Then use
   mdadm --restore=/tmp/md0 /dev/md0
to copy the metadata back.  Maybe.

Then use "mdadm --examine --super=1.2" to check that the superblock
looks OK and to find out what the "expected" checksum is.  Then edit the
superblock again to set the checksum.

Then try assembling the array with
   mdadm --assemble --freeze-reshape --readonly ....
which should minimize the damage that can be done if something isn't
right.

Then try "fsck -n" on the filesystem to see if it looks OK.

Good luck

NeilBrown
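
For the checksum step in the procedure above, here is a minimal sketch
of computing the value directly, for anyone who would rather not rely
only on reading it back from --examine.  It is not taken from this
thread.  It assumes the v1.x superblock layout from the kernel's md_p.h
header (sb_csum at byte 216, max_dev at byte 220, dev_roles starting at
byte 256) and that a 1.2 superblock sits 4096 bytes into the device or
dump file.  The function names are hypothetical.  Treat the offsets as
assumptions and cross-check the result against the "expected" checksum
that mdadm --examine reports before writing anything back.

```python
import struct

# Assumed offsets within a v1.x md superblock (from the kernel's md_p.h);
# verify them against your kernel/mdadm sources before relying on them.
SB_CSUM_OFFSET = 216   # 32-bit little-endian checksum field
MAX_DEV_OFFSET = 220   # 32-bit little-endian count of dev_roles entries
SB_FIXED_SIZE = 256    # fixed part; 16-bit dev_roles entries follow


def read_superblock(path, offset=4096, length=4096):
    """Read raw superblock bytes; offset 4096 is the assumed 1.2 location."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


def md_v1_checksum(sb):
    """Sum the superblock as little-endian 32-bit words with sb_csum zeroed."""
    (max_dev,) = struct.unpack_from("<I", sb, MAX_DEV_OFFSET)
    size = SB_FIXED_SIZE + 2 * max_dev
    if len(sb) < size:
        raise ValueError("buffer shorter than superblock plus dev_roles")

    buf = bytearray(sb[:size])
    struct.pack_into("<I", buf, SB_CSUM_OFFSET, 0)  # checksum field counts as 0

    total = 0
    pos = 0
    while size - pos >= 4:
        total += struct.unpack_from("<I", buf, pos)[0]
        pos += 4
    if size - pos == 2:  # odd number of 16-bit roles leaves a trailing half word
        total += struct.unpack_from("<H", buf, pos)[0]

    # Fold the carry bits back into the low 32 bits.
    return ((total & 0xFFFFFFFF) + (total >> 32)) & 0xFFFFFFFF
```

Pass it the bytes returned by read_superblock() for one of the dumped
files and compare the result with what --examine prints.  If the two
disagree, trust mdadm and assume the offsets above are wrong for your
metadata version.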