Personal insight - Re: Can't mount partitions after "mdadm --zero-superblock"

Eduard Rozenberg <eduardr@xxxxxxxxx> · Thu, 9 Apr 2015 19:53:36 -0700

I finally figured out why I panicked and why things eventually worked 
thanks to this very helpful post:

http://unix.stackexchange.com/questions/64889/how-to-mount-recover-data-on-a-disk-that-was-part-of-a-mdadm-raid-1-on-another-m

In particular this paragraph:

"Linux mdraid has several metadata formats. Formats 0.9 and 1.0 put the metadata
at the end of the containing device, and the payload (the filesystem) starts at
the beginning of the device and can be accessed directly without going through
the raid layer. Formats 1.1 and 1.2 put the metadata at or near the
beginning of the containing device, so the payload is at an offset."

I knew about the various metadata versions but I’d been working under the false 
assumption (based on my 0.9/1.0 metadata format mdadm experience) that
I would be able to directly mount an ext4 partition that had been previously part of
a RAID1 mdadm device.

According to the paragraph above this is clearly no longer true with metadata
v1.1/1.2, and mounting a former raid1 partition requires either knowing the
payload's byte offset (in decimal) to mount it directly, or creating a new md
device with the partition in question (without of course formatting the new md
device). Certainly not as convenient
as it used to be but I'm sure there were good reasons to change the location of
the metadata to the front of each partition.

Perhaps the mdadm wiki could include information to this effect, that mounting
formerly raid1 mdadm partitions with metadata 1.1/1.2 requires these extra
steps and not to panic :).

Regards,
—Ed

> On Apr 6, 2015, at 20:31, Eduard Rozenberg <eduardr@xxxxxxxxx> wrote:
> 
> Hello Neil,
> 
> Success! This is stronger voodoo magic than I’ve ever 
> had to perform so really didn’t have the faith to continue
> without the extra encouragement :). My initial mistake
> had been to use the octal value with losetup instead of
> getting the decimal value.
> 
> Documenting my steps below for anybody else who
> might come here later.
> 
> In the examples below, we are using partition "/dev/sdac2".
> Replace this with the appropriate partition you’re recovering.
> 
> Step 1: find the decimal value for the start of the partition
> ----------------------------------------------------------------------------
> 
> Note:
> ext4 filesystems have a "magic" hex value of "ef53" in
> their superblock, which sits near the start of the
> filesystem. Note that "ef53" may show more than once as
> you read further into the partition (backup superblocks,
> or ordinary data). We are interested in the location of
> the very first occurrence of "ef53". ext2 and ext3 use
> the same magic value; other filesystems (xfs, btrfs, etc)
> have different ones, so "ef53" does not apply there.
> 
> The "od" hex viewer command to search for "ef53":
> 
> 	od -x /dev/sdac2 | awk '$6 == "ef53"'
> 
> The results will look something like:
> 
> 4002060 f3fd 5521 0004 0025 ef53 0001 0001 0000
> 1004000060 64be 4ec9 0000 0025 ef53 0000 0001 0000
> 1042630400 17f8 a7dd bb6e ee40 ef53 000d 3cfb 9e22
> 
> We are only interested in the first line. So we now have
> the octal address of the od output line that holds the
> ef53 magic value: it’s the first long number on the
> line: "4002060" (octal value!)
> 
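Scanning a whole partition with od can take a long time when only the first match matters. A minimal sketch that bounds the scan and stops at the first hit follows. The function name "first_magic_line" and the 64 MiB default limit are illustrative assumptions, and the "ef53" word order assumes a little-endian host such as x86.

```shell
#!/bin/sh
# Print the od line holding the first "ef53" word found in the first
# LIMIT bytes of DEV, then stop instead of scanning the whole device.
# Assumption: little-endian host, where od -x shows the on-disk bytes
# 53 ef as the word "ef53".
first_magic_line() {
    dev=$1
    limit=${2:-67108864}   # hypothetical default: first 64 MiB
    # Field 1 is the octal address; field 6 is the fifth 16-bit word.
    od -x -N "$limit" "$dev" | awk '$6 == "ef53" { print; exit }'
}
```

For example, "first_magic_line /dev/sdac2" should print only the first matching line. A match is just a candidate, since ordinary data can contain the same two bytes, and the limit may need raising if nothing is found.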
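> According to Neil’s instructions we then have to subtract
> the octal value "0002060" from this number we found.
> (0002060 octal is 1072 bytes: the superblock sits 1024
> bytes into the filesystem, and the od line holding the
> magic starts 48 bytes into the superblock.)
> We then have to convert the octal result into decimal.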
> 
> Luckily an online calculator makes this easy:
> 
> http://www.csgnetwork.com/octaddsubcalc.html
> 
> "Enter a octal value" - Enter "4002060" here
> "Enter Second Octal Value" - Enter "0002060" here
> 
> Then take the value from the line:
> "Calculated Decimal Subtraction" - 1048576
> 
> This is the decimal value for the start of our partition.
> 
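The subtraction and the octal-to-decimal conversion can also be done in the shell, without an online calculator. This is a small sketch using the address from the example output above; the variable names are illustrative. It relies on POSIX shell arithmetic reading a number with a leading 0 as octal.

```shell
#!/bin/sh
# Address printed by od on the first matching line (octal).
od_addr=4002060

# A leading 0 makes the shell read both numbers as octal;
# the result of $(( ... )) is printed in decimal.
start=$(( 0$od_addr - 02060 ))
echo "$start"
```

This prints 1048576 for the example address. Alternatively, "od -A d -x" prints its addresses in decimal, in which case the value to subtract is 1072.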
> 
> Step 2: use the decimal start value to mount partition
> ------------------------------------------------------
> 
> First create a loop device loop0 as a handle to the
> partition. We tell losetup where the start of the partition is:
> 
> 	losetup -o 1048576 /dev/loop0 /dev/sdac2
> 
> Next, try to mount loop0 read-only (hopefully it will work!)
> 
> 	mount -o ro /dev/loop0 /mnt
> 
> If the partition is unclean and needs to be fscked (unmount it first if the mount succeeded):
> 
> 	fsck.ext4 /dev/loop0
> 
> 
> Thanks again Neil! Maybe a few years from now I’ll
> understand why this worked when nothing else did
> (Linux tools still have some way to go before being
> intelligent enough to do this kind of recovery).
> 
> Regards,
> —Ed
> 
> 
>> On Apr 6, 2015, at 17:52, NeilBrown <neilb@xxxxxxx> wrote:
>> 
>> On Mon, 6 Apr 2015 16:45:58 -0700 Eduard Rozenberg <eduardr@xxxxxxxxx> wrote:
>> 
>>> Hello folks,
>>> 
>>> I previously had the following setup:
>>> 
>>> sda & sdb partitioned w/ GPT, 7 partitions each (usr, opt, var etc...)
>>> 7 raid1’s with 2 devices for each pair of partitions (/dev/sda1 & /dev/sdab1, etc)
>>> They’d been created under Slackware 13.37.
>>> 
>>> I was trying to clean out mdadm from those partitions but keep the data so I ran 
>>> "mdadm --zero-superblock" on each of those previously RAID1 mdadm 1.2 ext4
>>> partitions.
>> 
>> The "1.2" metadata is stored 4k from the start of the device.  The actual
>> data is some megabytes further in.  I don't suppose you still have the output
>> of "mdadm --examine" from before you destroyed the superblocks??
>> 
>>> 
>>> As a result I am now currently unable to mount any partition after the first one on either
>>> disk. The first partition does mount. The partition table is visible and looks fine in gdisk.
>>> 
>>> mount -t ext4 /dev/sdac2 /mnt
>>> mount: wrong fs type, bad option, bad superblock on /dev/sdac2,
>>> missing codepage or helper program, or other error
>>> In some cases useful info is found in syslog - try
>>> dmesg | tail or so
>>> 
>>> I did try superblock recovery with each backup superblock that ext4 normally creates,
>>> but none of the superblock locations worked.
>>> 
>>> For example:
>>> 
>>> fsck.ext4 -b 4096000 /dev/sdac2 
>>> e2fsck 1.42.8 (20-Jun-2013)
>>> /sbin/e2fsck: Invalid argument while trying to open /dev/sdac2
>>> 
>>> The superblock could not be read or does not describe a correct ext2
>>> filesystem. If the device is valid and it really contains an ext2
>>> filesystem (and not swap or ufs or something else), then the superblock
>>> is corrupt, and you might try running e2fsck with an alternate superblock:
>>> e2fsck -b 8193 <device>
>>> 
>>> 
>>> Would be grateful for any advice on anything else I can try.
>> 
>> You need to find where the filesystem actually starts, then you need to
>> create some way to access it as a block device, then it should "just work".
>> 
>> An ext4 filesystem superblock has 0xef53 at an offset of 0x38, and the
>> superblock is typically 1K from the start of the partition.
>> 
>> So you could:
>>  od -x /dev/sdac2 | awk '$6 == "ef53"'
>> 
>> Then subtract 0002060 (octal) from the leading number, and that might be the
>> start of the partition.
>> 
>> Then
>> losetup -o "start in decimal" /dev/loop0 /dev/sdac2
>> 
>> and try 'fsck' on /dev/loop0
>> 
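Before attaching a loop device, a candidate start offset can be sanity-checked by reading the two magic bytes directly. A minimal sketch, assuming the layout described above (superblock 1K into the filesystem, magic at 0x38 within it); the function name "has_ext_magic" is hypothetical.

```shell
#!/bin/sh
# Succeed if DEV holds the ext2/3/4 superblock magic where a filesystem
# starting at byte START would keep it: 1024 (superblock) + 56 (0x38).
has_ext_magic() {
    dev=$1
    start=$2
    # Read exactly two bytes and print them as hex in on-disk order.
    magic=$(dd if="$dev" bs=1 skip=$((start + 1080)) count=2 2>/dev/null |
            od -An -tx1 | tr -d ' \n')
    # On disk the 16-bit value 0xef53 is stored low byte first: 53 ef.
    [ "$magic" = "53ef" ]
}
```

Usage, with the decimal start found for the example partition:

	has_ext_magic /dev/sdac2 1048576 && echo "looks like an ext filesystem"

Reading bytes individually avoids the word-order question that "od -x" raises, so the check does not depend on the host's endianness.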
>> Good luck.
>> 
>> NeilBrown
>> 
>> 
>> 
>>> 
>>> Regards,
>>> —Ed--
>>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> 
> 
