Re: Re[6]: Linux Raid + BTRFS: rookie mistake ... dd bs=1M

On Thu, Mar 7, 2019 at 9:33 PM <no_spam@xxxxxxxxxxxx> wrote:


>
>> # parted /dev/md3 u s p
>
> Model: Linux Software RAID Array (md)
> Disk /dev/md3: 17581481472s
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
> Disk Flags:
>
> Number  Start  End  Size  File system  Name  Flags



Did this paste incorrectly? So there's an empty GPT on /dev/md3 - WTF?

OK let's just skip that for now.

> <<And also the same for /dev/sda (without any numbers).>>
>
> Model: WDC WD4002FFWX-68TZ4 (scsi)
> Disk /dev/sda: 7814037168s
> Sector size (logical/physical): 512B/512B
> Partition Table: gpt
> Disk Flags:
>
> Number  Start        End          Size         File system     Name  Flags
>   1      2048s        4982527s     4980480s     ext4                  raid
>   2      4982528s     9176831s     4194304s     linux-swap(v1)        raid
>   5      9453280s     1953318239s  1943864960s                        raid
>   6      1953334336s  7813830239s  5860495904s                        raid

OK so what do you get for

# mdadm -E /dev/sda1

That could be your root fs with mdadm metadata v1.0 or 0.90, which is
why it shows up as ext4 and also gets a raid flag.
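
For a quick cross-check, wipefs run with no wipe options only *lists*
the signatures it finds on a device, along with their offsets; a
sketch (assuming sda1 really is the member in question):

$ sudo wipefs /dev/sda1

With end-of-device mdadm metadata (0.90 or 1.0) you'd expect to see
both a linux_raid_member signature and the ext4 signature reported,
which would line up with what parted is showing.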




> This data was present on the NAS w/ drives installed.
> I have the contents of /etc/lvm zipped up... but did not attach it
> here because of two concerns:
> 1) the List will probably strip it. (smart)
> 2) Concerned the data may contain some sensitive data. Unlikely; but
> wanted to make sure before I broadcasted it to everyone on the 'net.


You don't have to post this metadata anywhere, yet anyway, but those
files are plain text so you can look through them for sensitive
information. They will contain device information like size and node,
a random UUID (derived from /dev/urandom, not from anything
identifiable on that system), and probably that system's hostname;
this is all information that's in the mdadm -E and -D results you've
already posted. Nothing more sensitive than that. But yeah, look at
the contents to be sure.





>
>
> <<a.) figure out how this thing assembles itself at boot time in order
> to reveal the root to get at /etc/lvm; or b.) put the three drives in
> the NAS and boot it. a) is tedious without a cheat sheet from
> Synology. >>
> Weirdly, there is something black-science-y going on with the way
> Synology sets up these systems. Upon putting the drives in the NAS...
> I got a /dev/md0 which becomes the root. Wonder why it didn't show up
> when on the test PC. I speculatively executed and dumped some data for
> you at the end of this email thread.

It could be metadata 0.9, which is kernel autodetect, and maybe that's
not going to start up automatically unless it's a boot volume - I'm
not as much of an mdadm expert as others on this list, so that itself
doesn't really surprise me or seem like dark art. The goofy
partitioning is what's got my goat at the moment; it's awfully
obfuscating, which is super irritating. But whatever.


> <<Put the three NAS drives that are in the PC back into the NAS and boot
> (degraded), and collect the information we really want:
> # blkid>>
>
> This returned no data. I suspect blkid is blocked or not "completely"
> implemented on the Synology "os".

The usual reason blkid returns no information is that the user you're
logged in as is not root. You might need to do:

$ sudo blkid

And then type in admin credentials. Or alternatively you can do:

$ sudo -i

And admin credentials, which makes you "like root" and you'll see a

#

Which indicates you're effectively the root user, and now you don't
need `sudo` with blkid.


>
> <<# mount>>
> Trimmed out the extraneous info.
>
> /dev/md0 on / type ext4 (rw,relatime,journal_checksum,barrier,data=ordered)

Yeah bingo. So that's the real root file system. OK so we'd have had
to manually assemble that first partition to spin up /dev/md0, then
mount it, in order to get access to /etc/lvm from the test PC. But
you've put it in the NAS and you have /etc/lvm, so it's fine.
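
For the record, if you ever do need to do it from the test PC, the
manual assembly would look roughly like this (the sdX1 names here are
placeholders for whichever members are actually present):

$ sudo mdadm --assemble --run /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1
$ sudo mount -o ro /dev/md0 /mnt

--run lets a degraded array start with only three of the four members,
and mounting read-only keeps it from touching anything.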

While you have the chance:

$ sudo tar -acf /tmp/lvm.tar.gz /etc/lvm

Copy /tmp/lvm.tar.gz to a USB stick; or alternatively you can scp it
off the NAS, or however else you're comfortable, e.g.

$ scp /tmp/lvm.tar.gz username@laptopIP:~/

That file will get deleted at the next reboot so you can just leave it
alone or delete it if you want. But you want it off the system so you
can inspect it offline, and you will probably need to share a portion
of it with the LVM list if we get to that point.


> <<# grep -r md3 /etc/lvm>>
> /etc/lvm/archive/vg1000_00000-2024839799.vg:description = "Created
> *before* executing '/sbin/vgcreate --physicalextentsize 4m /dev/vg1000
> /dev/md2 /dev/md3'"
> /etc/lvm/archive/vg1000_00000-2024839799.vg:                    device = "/dev/md3"     # Hint only
> /etc/lvm/archive/vg1000_00003-229433250.vg:                     device = "/dev/md3"     # Hint only
> /etc/lvm/archive/vg1000_00004-577325499.vg:                     device = "/dev/md3"     # Hint only
> /etc/lvm/archive/vg1000_00002-1423835597.vg:description = "Created
> *before* executing '/sbin/pvresize /dev/md3'"
> /etc/lvm/archive/vg1000_00002-1423835597.vg:                    device = "/dev/md3"     # Hint only
> /etc/lvm/archive/vg1000_00001-537833588.vg:                     device = "/dev/md3"     # Hint only
> /etc/lvm/backup/vg1000:                 device = "/dev/md3"     # Hint only

OK so for sure /dev/md3 is an LVM member, which we suspected, but now
that's proof. So now we probably have something in the first 1MiB to
look for: the LVM2 "magic" - although it's tedious to search for that
magic across 8TB of data. If that signature turns up in the middle of
nowhere on /dev/md3, it's virtually certain to be your backup file of
the first 1MiB of /dev/md3. In which case it's just a matter of
getting the alignments right, sanity checking it, and writing it back
over the zeroed 1MiB at the start of /dev/md3.

The worst that happens, is, it's wrong and LVM doesn't activate on /dev/md3.

So now you just need a hint for how to search for the LVM2 magic on
either /dev/md3 from the NAS, or alternatively you can search for it
on the 10TB drive you backed /dev/md3 up onto, on the test PC.

How did you back up /dev/md3 to that 10T drive, by the way? What was
the command?




>
>
> <<# cat /etc/fstab>>
>
> none /proc proc defaults 0 0
> /dev/root / ext4 defaults 1 1
> /dev/vg1000/lv /volume1 btrfs  0 0

OK so I'm gonna guess that /dev/root is a label symlink to /dev/md0.
You could do an `ls -l /dev/` and look through that whole list for
root and see if it points to /dev/md0.

OH OK and super, the LVM LV is in fact Btrfs. So we might have a
chance that there's nothing special in the zeroed first 1MiB other
than the Btrfs main super block. That's relatively easy to
reconstruct. So we have two possible ways of fixing this thing. For
sure, if we can find the 1MiB file and restore it, that's the most
reliable; but it's time-consuming to find, and I don't have a search
command in mind offhand.
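
For completeness, the Btrfs route would lean on the backup super
blocks (copies live at 64MiB and 256GiB into the filesystem, well past
anything that got zeroed). Just a sketch, and only once the LV is
visible again - the first command is read-only, the second one writes,
so don't run it until we're actually at that point:

$ sudo btrfs inspect-internal dump-super -s 1 /dev/vg1000/lv
$ sudo btrfs rescue super-recover -v /dev/vg1000/lv

Anyway, that's the fallback; restoring the original 1MiB is still
plan A.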

The LVM2 magic is defined as:

offset 0x00000218 (LVM2_member): 4c 56 4d 32 20 30 30 31

So basically we need a command that will search for that. Something like

$ sudo dd if=($searchdev) 2>/dev/null | hexdump -C | grep -P
"\x4c\x56\x4d\x32\x20\x30\x30\x31"

($searchdev) might be /dev/md3 on the NAS; or maybe /dev/sdX if it's
the 10T backup on your test PC, might be slightly faster on the 10T
because no RAID parity reconstruction is needed.

Hello? Anyone got a better idea? Obviously I want to get the offset
within the device being searched, not just a confirmation that the
search string is found, hence the hexdump command. But dumping the
entire drive through hexdump -C and then filtering? *shrug* That seems
inefficient, and I figure someone else has got to have a better idea.
Still, I just did a quick test and it does work.
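
One possible refinement - offered as a sketch, I haven't run it
against a full 8TB device: GNU grep can search the raw device directly
and print byte offsets for the matches, which skips the hexdump step
entirely. The caveat is grep buffers line by line, so a long
newline-free stretch of the device could eat a lot of RAM:

$ sudo grep --text --byte-offset --only-matching "LVM2 001" /dev/md3

Each hit prints as offset:LVM2 001, and the offset is counted from the
start of whatever device you point it at, which is exactly the number
we want.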

John, either way this is a safe, read-only search, but it might have
to go through 8TB to find what we're after. And you'll have to save
the entire output from it (copy/paste to a text file is fine). Those
offsets are where we'll have to do yet another search+extract of the
1MiB we want, and sanity check it. But if my command is wrong, it's
8TB searched for nothing, so maybe wait and see if anyone chimes in.
FWIW, this list is often dead on the weekend :D So after tomorrow
afternoon, probably radio silence. Good time to actually do the search
though.
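
And noting the extract step for later, once there's an offset in hand
(OFFSET below is a placeholder for whatever the search reports; the
magic sits 0x218 bytes into the PV copy, so we back up by that much to
get to the start of the 1MiB). Still read-only:

$ sudo dd if=/dev/md3 bs=1M iflag=skip_bytes,count_bytes \
    skip=$(( OFFSET - 0x218 )) count=$(( 1024*1024 )) \
    of=/tmp/candidate-1MiB.bin

Then sanity check that file with hexdump -C before anything gets
written back anywhere.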




>
>
> <<Part 2:
> Put the "missing" md member number 2/bay 3 drive into the PC, booting
> from Live media as you have been.
>
> # mdadm -E /dev/sdX6 >>
>
> /dev/sda6:
>            Magic : a92b4efc
>          Version : 1.2
>      Feature Map : 0x0
>       Array UUID : 340a678e:167ca3d9:c185d6c8:a1d66183
>             Name : Zittware-NAS916:3
>    Creation Time : Thu May 25 01:26:52 2017
>       Raid Level : raid5
>     Raid Devices : 4
>
>   Avail Dev Size : 5860493856 (2794.50 GiB 3000.57 GB)
>       Array Size : 8790740736 (8383.50 GiB 9001.72 GB)
>    Used Dev Size : 5860493824 (2794.50 GiB 3000.57 GB)
>      Data Offset : 2048 sectors
>     Super Offset : 8 sectors

OK so that's as expected: data begins 1MiB into /dev/sda6. Command to
read 1MiB of that:

$ sudo dd if=/dev/sda6 skip=2048 count=2048 of=/tmp/sda6missing1M.bin

And now you can copy that /tmp/ .bin file to USB stick or scp it off
onto some other computer. To look at the contents:

$ sudo dd if=/tmp/sda6missing1M.bin 2>/dev/null | hexdump -C

I can't really guess what's in it, because it's 64K fragments of 1/4
of the data we want, and 1/4 of that is parity garbage that doesn't
mean anything without the other fragments. So - not a lot to go on,
but it might contain a signature...

Anyway you can set this aside because whatever that can tell us is
both tedious and a long shot. But at least you have it.


>     Unused Space : before=1968 sectors, after=32 sectors
>            State : clean
>      Device UUID : 62201ad0:0158f31a:ac35b379:7f13a583
>
>      Update Time : Sat Mar  2 01:09:20 2019
>         Checksum : 348b1754 - correct
>           Events : 16134


This has the same event count as the other drives though. If this
drive was pulled before you did the accidental wiping, and it's the
correct drive, it should have a lower event count than the other three
drives. Something is confused now...

I'm gonna ignore it for now and move on though.




>
>           Layout : left-symmetric
>       Chunk Size : 64K
>
>     Device Role : Active device 2
>     Array State : AAAA ('A' == active, '.' == missing, 'R' == replacing)
>
> << # dd if=/dev/sdX6 skip=2048 bs=1M count=1 of=/safepathtofile-sdX6. >>


Yeah, sorry, wrong command - use the one above. The mistake with this
command is that "skip=2048" is interpreted in units of the block size,
and the block size is "bs=1M", so this actually skipped 2048 MiB into
the drive - and what we need is only 1MiB in. You can toss this file;
I haven't downloaded it anyway. When bs= is not defined, it defaults
to 512 bytes, and 2048 512-byte sectors is 1MiB.
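
To make the units concrete, these two read-only invocations land in
the same place, 1MiB in, and read the same 1MiB (sdX6 being whatever
device you're actually reading from):

$ sudo dd if=/dev/sdX6 bs=512 skip=2048 count=2048 of=/tmp/first1M.bin
$ sudo dd if=/dev/sdX6 bs=1M skip=1 count=1 of=/tmp/first1M.bin

Same data either way; skip and count just scale with bs.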

Common mistake, anyway. Another one is confusing skip and seek: skip
is for input, seek is for output. That can be another source of data
loss, writing out with a skip when it should have been a seek, or vice
versa. So yeah...




> Other data I collected...
> cat /proc/mdstat showed some real interesting data on the NAS.
> Basically 4 /dev/mdX volumes. I logged the output of each /dev/mdX
> volume using mdadm -E /dev/md[0123] iirc.
>
> /dev/md0:
>          Version : 0.90
>    Creation Time : Wed May 24 20:12:04 2017
>       Raid Level : raid1
>       Array Size : 2490176 (2.37 GiB 2.55 GB)
>    Used Dev Size : 2490176 (2.37 GiB 2.55 GB)
>     Raid Devices : 4

OK bingo. Nice. As expected this is the real root file system. Not a
surprise. And nice to know but we have everything else we need at this
point.



-- 
Chris Murphy


