Re: paranoid md raid1 -> Btrfs migration tools?

Daniel Pocock <daniel@xxxxxxxxxx> · Mon, 28 Sep 2020 15:07:39 +0200

On 28/09/2020 09:31, Dominique Martinet wrote:
> Roberto Ragusa wrote on Mon, Sep 28, 2020:
>>> I could imagine using kpartx to script a solution to (1) above, skipping
>>> over the md headers.  Some kind of shim may be needed to fool the kernel
>>> to see a different UUID for each source volume so they can be mounted
>>> simultaneously without md.
>>
>> The kernel can do it, on a fully operational array.
>>
>> cat /sys/block/md0/md/sync_action
>> echo check > /sys/block/md0/md/sync_action
>> cat /sys/block/md0/md/sync_action
>>
>> then
>>
>> cat /proc/mdstat
>> cat /sys/block/md0/md/mismatch_cnt
> 
> There are two problems to that:
>  - you won't ever know what file or even block was mismatched
> (I've just read that, despite 'check' being a check, if it encounters
> a mismatch it will correct either copy? I read that just now here:
> https://serverfault.com/a/854123
> But I'm pretty sure that wasn't the case in the past, at least not when
> there is not enough parity to automatically guess a 'correct' answer, so
> I'm not sure about that one)
> There are patches to print the mismatched sector in dmesg e.g. this
> question has one :
> https://unix.stackexchange.com/questions/266432/md-raid5-translate-md-internal-sector-numbers-to-offsets
> But it's still a pain to use and figure which files are impacted
> (disclosure: I wrote that answer)
> 
>  - some raid array vendors don't initially sync the array, on the basis
> that the filesystem should never access data it didn't write first, so
> during the monthly scrub you get zillions of mismatches every single
> time... Just to save a day at start of operation :(
> Obviously won't be a problem for everyone, but this is known to happen.
> 
> 
> 
> So ultimately it's all good if your mismatch_cnt stays at 0 but in case
> of problem you're in for a longer ride.
> 
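
For the sysfs check quoted above, here is a minimal sketch of how the
result could be read back from a script.  The function name
md_check_status and its optional directory argument are my own invention
for illustration; only the sync_action and mismatch_cnt files come from
the commands Roberto listed.  I have not run this against a real array.

```shell
# Hypothetical helper: report whether an md check has finished and
# whether it found mismatches.  $1 is the md sysfs directory
# (defaults to /sys/block/md0/md).
md_check_status() {
    mddir=${1:-/sys/block/md0/md}
    action=$(cat "$mddir/sync_action")
    if [ "$action" != "idle" ]; then
        # A check, resync or repair is still in progress.
        echo "still running ($action)"
        return 2
    fi
    count=$(cat "$mddir/mismatch_cnt")
    echo "mismatch_cnt=$count"
    # Succeed only when no mismatches were counted.
    [ "$count" -eq 0 ]
}
```

It returns 0 for a clean result, 1 when mismatch_cnt is non-zero and 2
while the check is still running.  As Dominique says, a non-zero count
still doesn't tell you which files are affected.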

Here is a high-level overview; don't try this at home.  I haven't tested
this (yet), so it might need tweaking

1. use rsync to copy from the MD filesystem to the Btrfs filesystem

2. you can use

      mdadm -E /dev/sda1
      mdadm -E /dev/sdb1

to find the "Data Offset" within each of the MD disks

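3. you can use loopback (losetup) to create mappings to those offsets.
losetup -o takes the offset in bytes, while mdadm -E reports the Data
Offset in 512-byte sectors, so multiply it by 512

     losetup -o OFFSET /dev/loop0 /dev/sda1
     losetup -o OFFSET /dev/loop1 /dev/sdb1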

4. you can mount the loop device, but please remember to mount it
read-only or you might have trouble assembling it as a mirror in future.
Even read-only might not be safe: ext4 can still replay its journal on a
read-only mount, which the noload option prevents

     mount -o ro,noload /dev/loop0 /mnt/sda1_non_raid

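5. you can now use

     rsync -a --checksum --dry-run --itemize-changes /mnt/sda1_non_raid/ /mnt/btrfs_new/

to see if every file on the sda1 side of the mirror matches what was
copied to Btrfs.  Without --checksum rsync only compares size and
modification time, and without -a (or -r) it skips directories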

6. repeat steps 4 and 5 for the sdb1 volume.  As both volumes hold the
same ext4 filesystem you can't mount them at the same time, so you have
to unmount sda1 before you mount sdb1

7. if any files don't match, you then need to find a way to work out
which copy is good and which is bad.  If the file format has a checksum
then it might be quite easy to automatically select the good copy.  Not
really sure how to help if the filesystem metadata is corrupt anywhere
but at least this is a start.
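
For steps 2 and 3, a rough sketch of pulling the offset out of the mdadm
output.  It assumes the superblock prints a line of the form
"Data Offset : N sectors" (what I see with 1.x metadata); the function
name md_data_offset_bytes is made up for this example.

```shell
# Hypothetical helper: read `mdadm -E` output on stdin and print the
# data offset in bytes (mdadm reports it in 512-byte sectors).
md_data_offset_bytes() {
    awk -F: '/Data Offset/ {
        split($2, f, " ")              # f[1] = sector count
        printf "%.0f\n", f[1] * 512    # sectors -> bytes for losetup -o
        exit
    }'
}

# Intended use (untested, needs root):
#   off=$(mdadm -E /dev/sda1 | md_data_offset_bytes)
#   losetup --read-only -o "$off" /dev/loop0 /dev/sda1
```

If no Data Offset line is present the function prints nothing, so a
script should check for an empty result before calling losetup.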

It would be interesting to know if this could be scripted and combined
with further checks to ensure a good copy of data is not lost when
decommissioning an md RAID
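
As a starting point for such a script, here is a sketch of the
comparison in steps 5 to 7 done with find and cmp instead of rsync.  The
function name list_mismatches and the MISSING/DIFFERS labels are my own
choices, and it assumes file names contain no newlines.  It only
compares file contents, not ownership, permissions or timestamps.

```shell
# Hypothetical helper: for every regular file under $1, report whether
# it is missing from $2 or differs byte-for-byte from the copy in $2.
list_mismatches() {
    src=$1 dst=$2
    # The cd happens in a subshell, so relative $src/$dst stay valid below.
    ( cd "$src" && find . -type f ) | while IFS= read -r rel; do
        if [ ! -f "$dst/$rel" ]; then
            printf 'MISSING %s\n' "$rel"
        elif ! cmp -s "$src/$rel" "$dst/$rel"; then
            printf 'DIFFERS %s\n' "$rel"
        fi
    done
}

# Intended use, once per mirror half:
#   list_mismatches /mnt/sda1_non_raid /mnt/btrfs_new > sda1.txt
```

Running it once for each half and comparing the two reports would show
which files differ on only one side of the mirror.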

Regards,

Daniel
_______________________________________________
devel mailing list -- devel@xxxxxxxxxxxxxxxxxxxxxxx
List Archives: https://lists.fedoraproject.org/archives/list/devel@xxxxxxxxxxxxxxxxxxxxxxx