Re: Passphrase stops working.

Thanks for responding; responses inline below.

On Wed, Jul 18, 2012 at 3:12 PM, Arno Wagner <arno@xxxxxxxxxxx> wrote:
Hi,

On Wed, Jul 18, 2012 at 02:34:58PM -0700, Two Spirit wrote:
> Hello, I just wanted to get back to you.

Thanks, always appreciated.

> After wasting a huge amount of time trying to restore my data and having
> it lost a second and third time, I have reproduced the problem and can
> confirm that it exists.
>
> I'm still testing (I can't believe it) and checking some theories to
> isolate whether it is LUKS or mdadm that is causing the corruption to
> LUKS. Testing with small test files does not reproduce the problem;
> at full scale it does, and with 4TB drives a rebuild takes quite a bit
> of time. The problem is that the first disk of a 4-disk mdadm raid
> loses its partition table (originally an msdos partition, type 0xfd
> Linux raid autodetect, created on ubuntu-8.04 with mdadm metadata-0.9).
> This "corruption/bug/feature" is why LUKS no longer works. The raid5
> has been grown through almost every iteration of new hard drive
> capacity without fail, except when going from 2TB to 4TB. With the
> LUKS container already open (cryptsetup luksOpen), the raid5 works and
> shows no sign of corruption, so as long as the HA server stays up, the
> sysadmin has no clue there is a problem, since "cryptsetup luksOpen"
> is only run rarely or on reboot. Disks 2, 3, and 4 seem OK, to the
> limit of my knowledge.
>
> I have since found out that metadata-0.9 only supports drives up to
> 2TB. I suspect the corruption appears when growing a live raid past
> the 2TB boundary. The raid operates at the 2TB drive capacity (the
> lowest common drive size) but has 4TB drives in the mix while the
> raid is being grown.

I would suspect there is a "wrap around" somewhere in the process
when the RAID gets a tiny bit larger than 2GB. That would write
right over the LUKS header. This should _not_ be happening (i.e.
it is a bug), but it would be plausible. This happens when sector
numbers are restricted to 2^22 by a logical "and" with (2^22)-1.
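Assuming the intended boundary is 2TiB (2^32 sectors of 512 bytes) rather than 2GB, the wrap-around described above can be sketched with some shell arithmetic. This is purely illustrative, not the actual md code path:

```shell
# Illustration only: if a driver kept sector numbers in 32 bits, a write
# aimed just past the 2 TiB boundary (2^32 sectors * 512 bytes) would
# wrap around and land at the start of the device -- i.e. on top of the
# partition table and LUKS header.
LIMIT=$(( 2**32 ))                    # sectors addressable in 32 bits
TARGET=$(( LIMIT + 16 ))              # a sector just past the boundary
WRAPPED=$(( TARGET & (LIMIT - 1) ))   # effect of truncating to 32 bits
echo "intended sector: $TARGET, actually written: $WRAPPED"
```

This would explain why only the first disk's start gets clobbered while the array itself keeps working.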
 
I assume you meant 2TB, not GB (and 2^32 rather than 2^22: 2^32 sectors of 512 bytes is 2TiB). That is plausible, but I'm a nobody, so I can't fix it even if that is the cause. Hopefully someone from this community will follow up with the right person.
 

Can you check whether the start of the other RAID drives is also
overwritten? The header itself is small enough to sit on only
one disk, but the keyslot area should be distributed across all of
them.
I'm not quite sure what you mean by "start of the raid". Of the 4-disk raid5, disks 2, 3, and 4 seem OK; disk 1 is missing its msdos partition table, and therefore also the 0xfd Linux raid-autodetect partition that contains the 256-byte mdadm metadata-0.9 superblock.

This is all while the raid is running and LUKS is already open, so there is no other indication that anything is going wrong. As far as I can tell, from the raid's point of view nothing seems wrong, and I was able to rebuild the array. Finding this corruption was more of a fluke, because the only thing I really noticed was that I couldn't get my LUKS open. It was strange (not a bug, a feature?) that mdadm reported no errors about disk 1's corrupt state. But I was running ubuntu-8.04, so error reporting may not have been as good back then.
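One way to see whether the very start of each disk (or of the assembled array) has been overwritten is to dump the first bytes and look for the expected signatures. The snippet below demonstrates the check on a scratch file so it is safe to run; on the real system you would point it at /dev/sda, /dev/md0, etc. (device names assumed):

```shell
# Demo on a scratch file standing in for a disk. A real check would read
# the first sector of each /dev/sdX (MBR signature 0x55aa at offset 510)
# and of /dev/md0 (LUKS magic "LUKS\xba\xbe" at offset 0).
IMG=$(mktemp)
printf 'LUKS\xba\xbe' > "$IMG"        # fake a LUKS header magic for the demo
MAGIC=$(dd if="$IMG" bs=6 count=1 2>/dev/null)
if [ "$MAGIC" = "$(printf 'LUKS\xba\xbe')" ]; then
    RESULT="LUKS magic present"
else
    RESULT="LUKS magic missing - header start overwritten?"
fi
echo "$RESULT"
rm -f "$IMG"
```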
 

> I am not focusing my testing right now on isolating the problem, but
> converting to mdadm metadata-1.2 to see if the problem goes away.

If it does, it would be a good idea to send a bug report to the
raid mailing list as well (no idea which one that is exactly
though).
Once I find and subscribe to that mailing list and understand the problem better, I probably will. Ultimately I hope someone from this community will get this bug recorded somewhere useful.
 

> I ran across other OS-imposed file system maximum limitations (which
> evidently were arbitrary), so I'm upgrading to ubuntu-12.04, which
> removes them. At best I can then confirm that the problem does not
> exist with a new mix of mdadm/cryptsetup/gpt using parted (the new
> sizes are also past the fdisk/msdos partition maximums). But I
> suspect that other people who are just upgrading to higher capacities
> might run across the same problem I did.

Indeed.

> To address your first statement, I'm starting to think too it is not a LUKS
> generated problem, but the problem does corrupt LUKS.

Yes. Sounds like LUKS is only the detector here. Unfortunately,
encryption with metadata always makes things more fragile.

Still, please report any additional findings.

Grüsse,
Arno




>
> On Mon, Jul 9, 2012 at 12:10 AM, Arno Wagner <arno@xxxxxxxxxxx> wrote:
>
> > First, this does not sound like a LUKS problem, but something
> > else.
> >
> > Second, a second passphrase is basically worthless as "backup".
> > As described in the FAQ, what you need is a LUKS header backup.
> >
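(As an aside, the header backup the FAQ recommends is a one-liner; the backup file name here is only an example:)

```shell
# Back up the LUKS header of the array device from this thread:
cryptsetup luksHeaderBackup /dev/md0 \
    --header-backup-file /root/md0-luks-header.img

# And if the header ever gets damaged, restore it with:
#   cryptsetup luksHeaderRestore /dev/md0 \
#       --header-backup-file /root/md0-luks-header.img
```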
> > Now, as you describe, this happened only after a while.
> > This indicates there is some connection to the data on the
> > partition.
> >
> > One possibility is that your data is not in the LUKS
> > container, but overlaid with it. This would mean a) your
> > data is not encrypted and b) at some point you overwrote
> > the keyslot area with data, breaking the LUKS header. Another
> > possibility is that you placed the RAID superblock in
> > the keyslot area, but that should only kill one passphrase.
> >
> > The only other idea I have at this time is that you got the
> > partition borders wrong when going to GPT, and that somehow
> > caused other data to end up in the LUKS keyslot area.
> >
> > It could be something else entirely, of course.
> >
> > Please give all commands you use, including raid creation
> > and mounting, and a full partition table dump. You can also
> > email me the LUKS header (see FAQ), and I can look for
> > corruption. This does not compromise your security if you
> > do not use this header again, or if the keyslots are not
> > recoverable (or if you trust me to destroy the header after
> > I look at it).
> >
> > Arno
> >
> >
> >
> > On Sun, Jul 08, 2012 at 09:41:19PM -0700, Two Spirit wrote:
> > > I created a 4-drive RAID5 setup using mdadm, upgrading from 2TB drives
> > > to the new Hitachi 7200RPM 4TB drives. I can initially open my LUKS
> > > partition, but later can no longer access it.
> > >
> > > I can no longer access my LUKS partition even though I have the right
> > > passphrases. It was working, and then at some unknown point in time I
> > > lose access to LUKS. I've used the same procedure for upgrading from
> > > 500G to 1TB to 1.5TB to 2TB. After the first time this happened, a
> > > week ago, I thought maybe there was some corruption, so I added a 2nd
> > > key as a backup. After the second time the LUKS became inaccessible,
> > > none of the keys worked.
> > >
> > > I put LUKS on it using
> > >
> > > cryptsetup -c aes -s 256 -y luksFormat /dev/md0
> > >
> > > # cryptsetup luksOpen /dev/md0 md0_crypt
> > > Enter LUKS passphrase:
> > > Enter LUKS passphrase:
> > > Enter LUKS passphrase:
> > > Command failed: No key available with this passphrase.
> > >
> > > The first time this happened, while I was upgrading to 4TB drives, I
> > > thought it was a fluke, and ultimately had to recover from backups. I
> > > went and used luksAddKey to add a 2nd key as a backup. It happened
> > > again, and I tried both passphrases; neither worked. The only thing
> > > I'm doing differently this time around is that I've upgraded to 4TB
> > > drives, which use GPT instead of fdisk.
> > >
> > > The last time I had to even reboot the box was over 2 years ago.
> > >
> > > I'm using ubuntu-8.04-server with kernel 2.6.24-29 and upgraded to
> > > 2.6.24-31, but that didn't fix the problem.
> >
> > > _______________________________________________
> > > dm-crypt mailing list
> > > dm-crypt@xxxxxxxx
> > > http://www.saout.de/mailman/listinfo/dm-crypt
> >
> >
> > --
> > Arno Wagner,    Dr. sc. techn., Dipl. Inform.,   Email: arno@xxxxxxxxxxx
> > GnuPG:  ID: 1E25338F  FP: 0C30 5782 9D93 F785 E79C  0296 797F 6B50 1E25
> > 338F
> > ----
> > One of the painful things about our time is that those who feel certainty
> > are stupid, and those with any imagination and understanding are filled
> > with doubt and indecision. -- Bertrand Russell


