Re: Re[13]: Linux Raid + BTRFS: rookie mistake ... dd bs=1M

On Sat, Mar 9, 2019 at 6:04 PM <no_spam@xxxxxxxxxxxx> wrote:
>
> The file search is still on-going... It's now taken twice as long to
> search for the signature as it took to clone/backup the data.

I'm guessing hexdump is the bottleneck. Read speed can be discovered
with iotop, sar, or iostat. If it's abysmal, like it's going to take
more than another day, I'd stop it because we can actually get a
template of this 1MiB from /dev/md2 as a sanity check.
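
For example, something like this shows per-device read throughput
while the search runs (iotop would give the per-process view):

# iostat -dmx 5
## -d devices only, -m report MB/s, -x extended stats, 5 second interval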

The other thing is, the Btrfs super block shows a file system that's
bigger than /dev/md3 - so it's plausible the file being searched for
is on /dev/md2.

> << # dd if=/dev/copyofmd3 skip=17581481438 count=33 2>/dev/null |
> hexdump -C >>
>
> Yielded no useful data. The data is zero.
> I checked the skip number from what you wrote; 4 times.

OK so there's a decent chance the primary GPT discovered at the start
of /dev/md3 is bogus. A GPT on /dev/md3 is also inconsistent with the
/etc/lvm metadata.


> <<Is this from the start of the drive? It definitely suggests some
> proprietary way of assembling the storage stack.>>
>
> Yes, it was from the start of the drive - not partition. IE
> dd if=/dev/sdb count=2048
>

OK so a summary at this point:
- the exact mistake is unknown
- a subsequent repair attempt made opaque changes masking the above,
so both the original mistake and the repairs are impossible to reverse
- we don't have a complete understanding of the storage stack or its
present state
- any non-reversible change carries a real risk of making things worse
- Btrfs super blocks discovered so far suggest a 10.6T filesystem with
a single logical device made from an LV
- LVM metadata discovered so far suggest a single LV that's 10.6T
- the LV is made from /dev/md2 and probably /dev/md3 (metadata
references md2 by node, and references md3 by its size)
- /dev/md4 has a Btrfs super with an fs UUID that matches supers found
elsewhere, making /dev/md4's purpose a mystery
- the /dev/md3 backup on the 10T drive is a ~8T portion of the LV, and
is non-viable by itself

"Sparrow" is now my nickname for the md member number 2 drive from bay
3 that was "missing" at the time of the accident. Backups of the first
1MiB from LBA 0, and the first 1MiB from partition 6, from "sparrow" -
and I can't think of any reason to keep this drive set aside.
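
A minimal sketch of those two backups, with /dev/sdX standing in for
whatever node "sparrow" gets (output file names are just suggestions):

# dd if=/dev/sdX of=sparrow-lba0-1MiB.bin bs=1M count=1
# dd if=/dev/sdX6 of=sparrow-part6-1MiB.bin bs=1M count=1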

Best practice proposal:
1. Move all four NAS drives to the test PC;
2. Boot LiveOS with parameters `raid=noautodetect rd.lvm=0 rd.md=0`
## These two steps make sure no RAID auto-assembly happens and LVM
isn't activated. Mainly this is a safety step.
3. Follow this
https://raid.wiki.kernel.org/index.php/Recovering_a_failed_software_RAID#Making_the_harddisks_read-only_using_an_overlay_file
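
For reference, the overlay setup on that page amounts to roughly the
following per drive, with /dev/sdX and the overlay path as
placeholders (the wiki provides a script that loops over all members):

# blockdev --setro /dev/sdX
# truncate -s 4T /overlays/sdX.ovl
## sparse file; only blocks that actually get written consume space
# losetup -f --show /overlays/sdX.ovl
## suppose that prints /dev/loop0
# echo "0 $(blockdev --getsz /dev/sdX) snapshot /dev/sdX /dev/loop0 P 8" | dmsetup create sdX-overlay
## all writes now land in the overlay file; the member drive itself is never modified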

What's the catch? Well, it's a bit tedious to set that up. And also if
a resync happens (likely), the overlay file for "sparrow" might get
big. If there isn't a write-intent bitmap for /dev/md3, it's likely
the sync will need nearly 4TB of space. So the overlay method may be
unworkable, unless the 10T drive is repurposed for this task.
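
Whether /dev/md3 has a write-intent bitmap should show up in the
member metadata, e.g. (member device and partition are placeholders):

# mdadm --examine /dev/sdXN | grep -i bitmap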

Alternative proposal:

From the available metadata, the arrays are all clean. Keep them in
the NAS, start up with all four drives together, allow the resync to
happen, and check `cat /proc/mdstat` to make sure it's proceeding
normally and completes. It doesn't need to finish syncing before
moving on to the next step, but it does need to be in normal
operation. There is an obvious risk in proceeding to restore LVM
metadata to /dev/md3 from the /etc/lvm backup without using the
overlay method described in step 3 above.
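
A convenient way to keep an eye on the resync while it runs:

# watch -n 60 cat /proc/mdstat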

Pick either the best practice or the alternative above. And then the
high-level version of the next steps is as follows:

- Check /dev/md2 metadata, in theory it's a decent template for
/dev/md3 since both are LVM members of the same VG.
# dd if=/dev/md2 count=2048 2>/dev/null | hexdump -C

- Restore LVM metadata on /dev/md3 - we'll need to move to the LVM
list for this for advice and sanity checking.
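
For what it's worth, the usual shape of that restore is below, but
nothing should be run until the LVM list has reviewed it; the VG name
and PV UUID are placeholders to be taken from the /etc/lvm backup file:

# pvcreate --uuid <pv-uuid-from-backup> --restorefile /etc/lvm/backup/<vgname> /dev/md3
# vgcfgrestore -f /etc/lvm/backup/<vgname> <vgname>
## then `vgchange -ay <vgname>` to activate, only after the above checks out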

- If the LVM LV is successfully restored, we'll then need to do a
non-invasive (i.e. do not repair) Btrfs super block verification,
before attempting to mount.
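
Both of these are read-only and should be safe for that verification
(the LV path is a placeholder for wherever the restored LV shows up):

# btrfs inspect-internal dump-super -fa /dev/<vgname>/<lvname>
# btrfs check --readonly /dev/<vgname>/<lvname>
## dump-super prints all super block copies; check --readonly walks
## the metadata without repairing anything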


-- 
Chris Murphy


