Re: raid5 (/dev/md0) w. luks broken - can't decrypt

Stygge <styggere@xxxxxxxxx> · Sun, 23 Jan 2011 22:32:05 +0100

On Sun, Jan 23, 2011 at 2:44 PM, Arno Wagner <arno@xxxxxxxxxxx> wrote:
> On Sat, Jan 22, 2011 at 07:41:18PM +0100, Stygge wrote:
>> On Fri, 21 Jan 2011 at 13:46, Arno Wagner wrote:
>> >
:
:
>> # mdadm -D /dev/md0
>> /dev/md0:
>>         Version : 0.90
>>   Creation Time : Wed Jan 19 22:16:47 2011
>
> That does look bad, unless you really created the original array
> on Wednesday last week. Looks more like the array was re-created
> from the original disks, but not the original superblock.

*ouch*

> I suspect this could lead to a different parity disk or different
> disk order in the array. It _is_ possible that I am reading this
> wrong, and the creation time gets updated on whatever recovery
> was done.
>
> Question is how did the array recover? It clearly did not do
> so by itself. Which distribution is this?

No, it didn't reassemble itself until the last boot. I first did a manual
stop and assemble, just to see if it was some sort of "burp" in the
driver.

The distro is CentOS 5.5 (Final), kernel 2.6.18-194.26.1.el5.centos.plus, x86_64

>> # cat /sys/block/md0/md/mismatch_cnt
>> 72
>
> Wups, 72 mismatches on an array that was last synced a few
> days ago? Maybe this is actually one or more dying disks.
> Anything other than a zero result is bad.

Uh-oh - now I'm getting really worried!
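
For anyone following along, here is a minimal sketch of reading that
counter from a script instead of with cat. The function names and the
sysfs_root parameter are made up for illustration; only the sysfs path
itself comes from the command quoted above. As far as I understand it,
the value is only refreshed by a check or repair pass, so it describes
the last such pass and not necessarily the current state of the disks.

```python
from pathlib import Path


def read_mismatch_cnt(md_name="md0", sysfs_root="/sys/block"):
    """Return the integer stored in <sysfs_root>/<md_name>/md/mismatch_cnt.

    sysfs_root is a hypothetical parameter so the function can be
    pointed at a copy of the file for testing.
    """
    path = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(path.read_text().strip())


def array_looks_consistent(count):
    """Only a count of zero means the last check found no mismatches."""
    return count == 0
```

On the box itself this would be called as read_mismatch_cnt("md0").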

>
>> > If the RAID did not assemble again properly,
>> > manual intervention and assembly may be necessary
>> > in order to unlock and save the data.
>>
>> mdadm --assemble --scan works fine and sets up the raid just fine.
>
> Yes. But after 2 disks were kicked from a RAID5 it should
> not do that, unless you force it to. And if you force it,
> and it guesses wrong about which disk was kicked first,
> it will mix the state as of when the first disk was kicked with the
> state as of when the second disk was kicked, and overwrite the disk
> kicked second in the process.
>
> As far as I can see, this would result in a mixed disk state.
> Everything written between the first and second kick would be
> corrupt. However, the key-slot would not be, unless it was changed
> in between. It might just have a wrong disk order. Again, if it
> did assemble from the existing superblocks, the order will be
> right.
>
>> I *really* hope that I'm not completely scr*w*d :-(
>
> Impossible to tell at this time.
>
> Ok, next steps:
>
> 1. Post or send me a long SMART status for each disk
>   (smartctl -a /dev/<disk>), these 72 inconsistencies
>   are not good and need to be looked at.
>
> 2. Can you give me the header backup (a bit more than 1MB)?
>   I cannot break your security that way, but looking at the
>   borders of the header and keyslot may tell me whether
>   the disk order was mixed up. This may allow reshuffling
>   of the disks to the correct order. (If that is the problem.)
>
>   The procedure would be to produce one or more reshuffled
>   headers and give them back to you to see whether any allows
>   an unlock. If so, the next step would be to reshuffle the whole
>   array (which would still be inconsistent).
>
> 3. You could also give me the first 260kB of each disk, then
>   I can check whether any of the 72 inconsistencies is in the
>   key-slot area. Again, this does not allow me to break your
>   security. And again, this would allow creating an alternate
>   header that could work and allow you to unlock.

I'll send these files off-list.
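
A rough sketch of a sanity check on those per-disk dumps: look at which
one starts with the LUKS1 magic bytes. It assumes the LUKS container
sits directly on /dev/md0 and that the 0.90 superblock lives at the end
of each member, so the array data starts at offset 0 of each disk. The
function names are hypothetical. More than one disk can match, because
the parity chunk is the XOR of the data chunks and repeats the magic
wherever the other chunks happen to be zero.

```python
LUKS_MAGIC = b"LUKS\xba\xbe"  # first 6 bytes of a LUKS1 header


def read_head(path, nbytes=260 * 1024):
    """Read the first nbytes of a disk or image file (260 kB by default)."""
    with open(path, "rb") as f:
        return f.read(nbytes)


def starts_with_luks_magic(buf):
    """True if the buffer begins with the LUKS1 magic."""
    return buf[:len(LUKS_MAGIC)] == LUKS_MAGIC


def disks_with_luks_magic(paths):
    """Return the paths whose first bytes carry the LUKS1 magic."""
    return [p for p in paths
            if starts_with_luks_magic(read_head(p, len(LUKS_MAGIC)))]
```

It only reads, so it is safe to run against the raw members or against
image copies of them.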

> These are my best guesses. You can do all that yourself as
> well, refer to the FAQ for details of the on-disk LUKS
> structure, and Wikipedia for RAID5 if you plan to. There
> should also be information on how to interpret SMART data on
> the web, although I have some long-term experience in that
> area.

I'll take you up on that offer as I'm quite the newbie with this sort
of thing :-)
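
To convince myself of the "mixed state" problem, here is a toy sketch
of RAID5 parity on one stripe with two data chunks. It is not how md
lays data out on disk, and all names and chunk contents are made up. It
only shows that a chunk rebuilt from a stale disk plus newer parity
comes out wrong.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def parity(data_chunks):
    """RAID5 parity of a stripe is the XOR of its data chunks."""
    return xor_blocks(data_chunks)


def rebuild_missing(surviving_chunks):
    """Rebuild the one missing chunk from all the others (data + parity)."""
    return xor_blocks(surviving_chunks)


# Healthy stripe: two data chunks plus parity.
d0_old, d1 = b"AAAA", b"BBBB"
p_old = parity([d0_old, d1])

# Disk 0 is kicked. The array runs degraded, and a write to the logical
# chunk d0 can only be recorded in the parity on the remaining disks.
d0_new = b"CCCC"
p_new = parity([d0_new, d1])

# Correct degraded read: d0 is recovered from d1 and the new parity.
good = rebuild_missing([d1, p_new])

# Wrong assembly: stale disk 0 is trusted and d1 is rebuilt from it.
bad = rebuild_missing([d0_old, p_new])
```

Here good equals d0_new, while bad no longer equals d1, which is the
kind of overwrite Arno describes for the disk kicked second.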

/S
_______________________________________________
dm-crypt mailing list
dm-crypt@xxxxxxxx
http://www.saout.de/mailman/listinfo/dm-crypt