Re: Recovering from the kernel bug, Neil?

Oliver Schinagl <oliver+list@xxxxxxxxxxx> · Sun, 09 Sep 2012 22:22:19 +0200

Since I have had no reply as of yet, I wonder: if I arbitrarily changed
the data at offset 0x1100 to something that _might_ be right, could I
horribly break something?

oliver

On 08/19/12 15:56, Oliver Schinagl wrote:
Hi list,

I've once again started to try to repair my broken array. I've tried
most things suggested by Neil before (create array in place whilst
keeping data etc etc) only breaking it more (having too new a version of
mdadm).

So instead, I made a dd of: sda4 and sdb4; sda5 and sdb5, both working
raid10 arrays, f2 and o2 layouts. I then compared that to an image of
sdb6. Granted, I only used 256mb worth of data.

Using https://raid.wiki.kernel.org/index.php/RAID_superblock_formats I
compared my broken sdb6 array to the two working and active arrays.

I haven't completely finished comparing, since the wiki falls short at
the end, which I think is the more important bit concerning my situation.

Some info about sdb6:

/dev/sdb6:
Magic : a92b4efc
Version : 1.2
Feature Map : 0x0
Array UUID : cde37e2e:309beb19:3461f3f3:1ea70694
Name : valexia:opt (local to host valexia)
Creation Time : Sun Aug 28 17:46:27 2011
Raid Level : -unknown-
Raid Devices : 0

Avail Dev Size : 456165376 (217.52 GiB 233.56 GB)
Data Offset : 2048 sectors
Super Offset : 8 sectors
State : active
Device UUID : 7b47e9ab:ea4b27ce:50e12587:9c572944

Update Time : Mon May 28 20:53:42 2012
Checksum : 32e1e116 - correct
Events : 1

Device Role : spare
Array State : ('A' == active, '.' == missing)

Now my questions regarding trying to repair this array are the following:

At offset 0x10A0, (metaversion 1.2 accounts for the 0x1000 extra) I
found on the wiki:

"This is shown as "Array Slot" by the mdadm v2.x "--examine" command

Note: This is a 32-bit unsigned integer, but the Device-Roles
(Positions-in-Array) Area indexes these values using only 16-bit
unsigned integers, and reserves the values 0xFFFF as spare and 0xFFFE as
faulty, so only 65,534 devices per array are possible."

sda4 and sdb4 list this as 02 00 00 00 and 01 00 00 00. Sounds sensible,
although I would have expected 0x0 and 0x1, but I'm sure there's some
sensible explanation. sda5 and sdb5 however are slightly different, 03
00 00 00 and 02 00 00 00. It appears that, by some coincidence, the 'b'
parts have a higher number than the 'a' parts. So a 02 00 00 00 on sdb6
(the broken array) should be okay.

Then next, is 'resync_offset' at 0x10D0. I think all devices list it as
FF FF FF FF, but the broken device has it at 00 00 00 00. Any impact on
this one?

Then of course there's the 0x10D8 checksum. mdadm currently says it
matches, but once I start editing things it probably won't match
anymore. Any way around that?
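
One way around a stale checksum would be to recompute it after editing. The
following is only a sketch, under the assumption that the v1.x checksum is
calculated the way the kernel's calc_sb_1_csum() in drivers/md/md.c does it:
sum the superblock as little-endian 32-bit words with the sb_csum field
zeroed, over 256 + 2 * max_dev bytes, then fold the upper 32 bits of the sum
into the lower 32. The max_dev offset (0xDC) is taken from the kernel struct
layout and is not on the wiki excerpt quoted here; the function name is made
up for illustration. Offsets are relative to the superblock start, so add
0x1000 for 1.2 metadata.

```python
import struct

SB_CSUM_OFF = 0xD8  # sb_csum, 32-bit LE (0x10D8 on disk with 1.2 metadata)
MAX_DEV_OFF = 0xDC  # max_dev, 32-bit LE (assumption: kernel struct layout)

def calc_sb1_csum(sb: bytes) -> int:
    """Recompute the checksum of a v1.x superblock held in `sb`.

    Assumed algorithm: 64-bit sum of LE 32-bit words with sb_csum zeroed,
    then fold the high 32 bits into the low 32 bits.
    """
    max_dev = struct.unpack_from("<I", sb, MAX_DEV_OFF)[0]
    size = 256 + 2 * max_dev  # header plus one 16-bit role per device slot
    if len(sb) < size:
        raise ValueError("buffer shorter than the superblock it describes")
    buf = bytearray(sb[:size])
    buf[SB_CSUM_OFF:SB_CSUM_OFF + 4] = b"\x00\x00\x00\x00"
    nwords = size // 4
    total = 0
    for (word,) in struct.iter_unpack("<I", bytes(buf[:nwords * 4])):
        total += word
    if size % 4 == 2:  # odd number of roles leaves one trailing 16-bit value
        total += struct.unpack_from("<H", buf, nwords * 4)[0]
    return ((total & 0xFFFFFFFF) + (total >> 32)) & 0xFFFFFFFF
```

On a dd copy (never the real partition), the superblock could be read by
seeking to 0x1000 and reading 4096 bytes, and the result written back as
struct.pack("<I", csum) at 0x10D8 of that copy. If the function does not
reproduce the checksum mdadm already reports as correct on an unedited
image, the assumption about the algorithm is wrong and it should not be used.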

Then offset 0x1100 is slightly different for each array. Array sd?5
looks like: FE FF FE FF 01 00 00 00
Array sd?4 looks similar enough, FE FF 01 00 00 00 FE FF

Does this correspond to the 01, 02 and 03 value pairs for 0x10A0?

The broken array reads FE FF FE FF FE FF FE, which probably is wrong?
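
A minimal sketch of one possible reading, assuming the 32-bit value at 0x10A0
is an index into the table of 16-bit little-endian roles at 0x1100 (with
0xFFFF as spare and 0xFFFE as faulty, as the wiki note says). It only uses the
bytes quoted above; the function names are made up for illustration.

```python
import struct

ROLE_SPARE = 0xFFFF   # reserved value for spare, per the wiki note
ROLE_FAULTY = 0xFFFE  # reserved value for faulty, per the wiki note

def decode_roles(raw: bytes) -> list[int]:
    """Split the roles area into little-endian 16-bit values."""
    usable = len(raw) - len(raw) % 2  # ignore a trailing odd byte
    return [v for (v,) in struct.iter_unpack("<H", raw[:usable])]

def role_of(slot_bytes: bytes, roles: list[int]) -> str:
    """Look up the role for the 32-bit LE slot number found at 0x10A0."""
    slot = struct.unpack("<I", slot_bytes)[0]
    if slot >= len(roles):
        return "out of range"
    role = roles[slot]
    if role == ROLE_SPARE:
        return "spare"
    if role == ROLE_FAULTY:
        return "faulty"
    return f"raid disk {role}"

sd4 = decode_roles(bytes.fromhex("FE FF 01 00 00 00 FE FF"))
sd5 = decode_roles(bytes.fromhex("FE FF FE FF 01 00 00 00"))
sd6 = decode_roles(bytes.fromhex("FE FF FE FF FE FF FE"))  # as quoted, 7 bytes

print(role_of(bytes.fromhex("02 00 00 00"), sd4))  # sda4 -> raid disk 0
print(role_of(bytes.fromhex("01 00 00 00"), sd4))  # sdb4 -> raid disk 1
print(role_of(bytes.fromhex("03 00 00 00"), sd5))  # sda5 -> raid disk 0
print(role_of(bytes.fromhex("02 00 00 00"), sd5))  # sdb5 -> raid disk 1
print(role_of(bytes.fromhex("02 00 00 00"), sd6))  # sdb6 -> faulty
```

Under this reading the slot numbers on the working arrays do map to raid
disks 0 and 1, while every slot visible in the broken table decodes as faulty.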

As for determining whether the first data block is offset, or 'real', I
compared dataoffsets 0x100000 - 0x100520-ish and noticed something that
looks like s_volume_name and s_last_mounted of ext4. Thus this should be
the 'real' first block. Since sdb6 has something that looks a lot like
what's on sdb5, 20 80 00 00 20 80 01 00 20 80 02 etc etc at 0x100000
this should be the first offset block, correct?
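
The offset arithmetic here can be checked with a short sketch. It assumes
512-byte sectors and the standard ext2/3/4 layout (superblock 1024 bytes into
the filesystem, s_magic 0xEF53 at 0x38, s_volume_name at 0x78, s_last_mounted
at 0x88). The helper names are made up for illustration, and it only tells
whether an ext superblock sits at the start of the data area, nothing about
the raid10 layout itself.

```python
import struct

SECTOR = 512          # assumed sector size
EXT_SB_OFFSET = 1024  # ext superblock starts 1 KiB into the filesystem
EXT_MAGIC = 0xEF53    # s_magic value for ext2/3/4

def data_start(data_offset_sectors: int) -> int:
    """Byte offset of the data area for a given 'Data Offset' in sectors."""
    return data_offset_sectors * SECTOR

def ext_info(image: bytes, data_offset_sectors: int):
    """Return (volume_name, last_mounted) if an ext superblock sits at the
    start of the data area of `image`, else None."""
    sb = data_start(data_offset_sectors) + EXT_SB_OFFSET
    if len(image) < sb + 0xC8:
        return None
    (magic,) = struct.unpack_from("<H", image, sb + 0x38)
    if magic != EXT_MAGIC:
        return None
    name = image[sb + 0x78:sb + 0x88].split(b"\0")[0].decode("ascii", "replace")
    mnt = image[sb + 0x88:sb + 0xC8].split(b"\0")[0].decode("ascii", "replace")
    return name, mnt

print(hex(data_start(2048)))  # 0x100000
```

With a Data Offset of 2048 sectors the volume name and last-mounted path
would land at 0x100478 and 0x100488, which is inside the 0x100000 to 0x100520
range compared above.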

Assuming I can somehow force mdadm to recognize my disk as part of an
array, and no longer as a spare, how does mdadm know which of the two
parts it is? 'real' or offset? I haven't bumped into anything that would tell
mdadm that bit of information. The data seems to all be still very much
available, so I still have hope. I did try making a copy of the entire
partition, and re-create the array as missing /dev/loop0 (with loop0
being the dd-ed copy) but that didn't work.

Finally, would it even be possible to use data from sdb6 to 'restore'
the first 127mb on sda6, which the wrong version of mdadm destroyed by
reserving 128mb instead of the usual 1mb?

Sorry for the long mail, I tried to be complete :)

Oliver
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
