Re: Corruptions in the dm-crypt layer 2.6.22


At Sat, 03 Nov 2007 16:10:54 +0100,
Milan Broz <mbroz@xxxxxxxxxx> wrote:

> Clemens Fruhwirth wrote:
> > Yesterday I renamed the LVM volume (vgrename), and after that the LVM
> > header was corrupted. 
>
> So you are talking about some race between writing LUKS header in combination
> with writing LVM metadata area  ?

No, I'm talking about corruption that results from directly accessing
dm-crypt backed device nodes, e.g. by using LVM tools on
/dev/mapper/your-encrypted-volume, or by running cryptsetup on
/dev/your-block-device (cryptsetup sets up the mapping for itself).

> *Not* corruption of data on volume, just header ?

There is no corruption on my file systems, but we all know that file
systems access block devices in a totally different way.

> Could you please send me steps what you did before corruption detection ?

Here is the difference:

ghanima ~ # pvcreate -Z y /dev/mapper/vgbackup2
  Physical volume "/dev/mapper/vgbackup2" successfully created
ghanima ~ # pvdisplay /dev/mapper/vgbackup2
  No physical volume label read from /dev/mapper/vgbackup2
  Failed to read physical volume "/dev/mapper/vgbackup2"

Now on the underlying device:

ghanima ~ # pvcreate /dev/sdc2
  Physical volume "/dev/sdc2" successfully created
ghanima ~ # pvdisplay /dev/sdc2
  --- NEW Physical volume ---
  PV Name               /dev/sdc2
  VG Name               
  PV Size               465.71 GB
  Allocatable           NO
  PE Size (KByte)       0
  Total PE              0
  Free PE               0
  Allocated PE          0
  PV UUID               hgu8k5-22di-VlwP-8WDK-uZsO-WAbe-zYAuwB

This has become even more reproducible: not a single pvcreate
succeeds. luksFormat also fails, in that a subsequent luksOpen cannot
open the volume. Perhaps this only happens in conjunction with the SATA
subsystem? Watch the little demo below, which uses /dev/loop4 as an
intermediate LUKS device and then block-copies it to the SATA device.

ghanima ~ # echo test > /tmp/key
ghanima ~ # cryptsetup luksFormat /dev/loop4 /tmp/key
.. regular warning...
Command successful.
ghanima ~ # dd if=/dev/loop4 of=/dev/sdc2
10000+0 records in
10000+0 records out
5120000 bytes (5.1 MB) copied, 2.93057 s, 1.7 MB/s
ghanima ~ # cryptsetup -d /tmp/key luksOpen /dev/sdc2 vgbackup2
key slot 0 unlocked.
Command successful.

Now using /dev/sdc2 directly:

ghanima ~ # cryptsetup luksFormat /dev/sdc2 /tmp/key
.. regular warning...
Command successful.

ghanima ~ # cryptsetup luksOpen -d /tmp/key /dev/sdc2 vgbackup2
Command failed: No key available with this passphrase.

These are the first signs of corruption. I can't open a volume with a
file-based key that I formatted a second ago. To get more reproducible
results, I made my cryptsetup version deterministic (always generating
the same header for luksFormat). That's quite interesting:

ghanima ~ # ~clemens/devel/luks/cryptsetup/src/cryptsetup luksFormat /dev/loop5 /tmp/key
... regular warning..
Command successful.
ghanima ~ # ~clemens/devel/luks/cryptsetup/src/cryptsetup luksFormat /dev/loop4 /tmp/key
... regular warning..
Command successful.
ghanima ~ # cmp /dev/loop4 /dev/loop5
ghanima ~ # 

This proves that my version of cryptsetup indeed uses no random sources.

Now:

ghanima ~ # ~clemens/devel/luks/cryptsetup/src/cryptsetup luksFormat /dev/sdc2 /tmp/key
...
ghanima ~ # cmp /dev/loop4 /dev/sdc2
/dev/loop4 /dev/sdc2 differ: byte 4301864, line 251

A few more tries (always running a luksFormat in between):

ghanima ~ # cmp /dev/loop4 /dev/sdc2
/dev/loop4 /dev/sdc2 differ: byte 40, line 1

ghanima ~ # cmp /dev/loop4 /dev/sdc2
/dev/loop4 /dev/sdc2 differ: byte 552, line 3

ghanima ~ # cmp /dev/loop4 /dev/sdc2
/dev/loop4 /dev/sdc2 differ: byte 15397, line 51

If you put all these numbers through modulo 512, the corruption falls
at or just before the 40th byte of a sector: three of the four offsets
give exactly 40, and 15397 gives 37. This is a very clear sign of
corruption in my opinion. Hopefully this is localized to my particular
kernel branch. It's quite interesting that accessing a virtual block
device such as /dev/loop4 does not cause any problems, but a real
block device such as my USB attached /dev/sdc2 shows corruption.
Page cache corruption?
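
The modulo check is easy to redo in the shell. A minimal sketch,
assuming 512-byte sectors and cmp's 1-based byte numbering;
sector_offsets and SECTOR_SIZE are names made up for this
illustration, not part of any tool:

```shell
#!/bin/sh
# Assumed sector size in bytes.
SECTOR_SIZE=512

# Read byte numbers (first field of each input line, as printed by cmp)
# and print each one followed by its remainder modulo SECTOR_SIZE.
sector_offsets() {
    awk -v size="$SECTOR_SIZE" '{ print $1, $1 % size }'
}

# The four byte numbers reported by cmp above.
printf '%s\n' 4301864 40 552 15397 | sector_offsets
# 4301864 40
# 40 40
# 552 40
# 15397 37
```

cmp stops at the first difference. Since "cmp -l" prints one line per
differing byte with the byte number in the first field, something like
"cmp -l /dev/loop4 /dev/sdc2 | sector_offsets" should list the
in-sector position of every differing byte.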

cryptsetup and LVM's pvcreate access the data in a very similar way, btw:

open("/dev/mapper/vgbackup2",                 O_RDWR|O_EXCL|O_DIRECT|O_NOATIME) = 4
open("/dev/mapper/temporary-cryptsetup-3461", O_RDWR|O_EXCL|O_SYNC|O_DIRECT) = 4
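
Since both tools open the device with O_DIRECT, one way to probe that
path without cryptsetup or LVM would be a plain write and read-back
with GNU dd, whose oflag=direct and iflag=direct also request
O_DIRECT. This is a rough sketch only, not something that was run for
this report; roundtrip_check and IO_FLAG are hypothetical names, and
the function overwrites the start of its target, so it should only be
pointed at a scratch device:

```shell
#!/bin/sh
# dd flag used for both the write and the read-back (GNU dd: "direct" = O_DIRECT).
IO_FLAG=${IO_FLAG:-direct}

# roundtrip_check TARGET COUNT
# Writes COUNT random 512-byte sectors to the start of TARGET, reads them
# back, and compares. Returns 0 if identical, non-zero otherwise.
# WARNING: destroys the first COUNT sectors of TARGET.
roundtrip_check() {
    rc_target=$1
    rc_count=$2
    rc_pattern=$(mktemp) || return 2
    rc_readback=$(mktemp) || return 2

    # Random reference data.
    dd if=/dev/urandom of="$rc_pattern" bs=512 count="$rc_count" 2>/dev/null
    # Write it to the target without truncating it.
    dd if="$rc_pattern" of="$rc_target" bs=512 count="$rc_count" \
        oflag="$IO_FLAG" conv=notrunc 2>/dev/null
    # Read the same range back.
    dd if="$rc_target" of="$rc_readback" bs=512 count="$rc_count" \
        iflag="$IO_FLAG" 2>/dev/null

    # cmp reports the first differing byte, if any.
    cmp "$rc_pattern" "$rc_readback"
    rc_status=$?
    rm -f "$rc_pattern" "$rc_readback"
    return "$rc_status"
}
```

A call such as "roundtrip_check /dev/mapper/vgbackup2 64" would then
return non-zero if the data read back differs from what was written.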

> Are you sure that you see this corruption in latest .22 kernel (2.6.22.11) ?

Will test in the afternoon. Results above are for:
Linux ghanima 2.6.22-gentoo-r6-64 #8 PREEMPT Tue Oct 9 13:50:01 CEST 2007 x86_64 AMD Athlon(tm) 64 Processor 3000+ AuthenticAMD GNU/Linux
that's basically 2.6.22.6.

But for the moment, until I find out what's really the cause, I'd say
hands off 2.6.22.*
--
Fruhwirth Clemens - http://clemens.endorphin.org 


