> On Thu, 18 Jan 2007 21:11:58 +0100 noah <noah123@xxxxxxxxx> wrote:
> Hi!
>
> I'm experiencing data corruption in the following setup:
>
> 1. mdadm --create /dev/md0 -n3 -lraid5 /dev/hda1 /dev/hdc1 /dev/hde1
> 2. cryptsetup -c aes-cbc-essiv:sha256 luksFormat /dev/md0 mykey
> 3. cryptsetup -d mykey luksOpen /dev/md0 cryptvol
> 4. pvcreate /dev/mapper/cryptvol
> 5. vgcreate vg0 /dev/cryptvol
> 6. lvcreate -n root -L10G vg0
> 7. mkreiserfs -q /dev/vg0/root
> 8. mkdir /.newroot; mount /dev/vg0/root /.newroot
> 9. mkdir /.realroot; mount -o bind / /.realroot
> 10. tar cf - -C /.realroot | tar xvpf - -C /.newroot
>
> With Linux 2.6.18 (it's broken, OK, but there's still something wrong
> even in 2.6.19.2, so keep on reading) I started getting warnings from
> ReiserFS indicating severe data corruption. Reiserfsck confirms this.
> It usually happened while extracting the Linux source tree.
>
> So after asking around I found out dm-crypt had a bug[1] fixed in
> early December.
> It got fixed in 2.6.19, and the fix was backported and included in
> 2.6.18.6[2].
>
> Fine, so I upgraded to 2.6.18.6, rebuilt the array from scratch and
> did the whole procedure again.
> No messages from reiserfs in dmesg this time, but reiserfsck still
> revealed severe data corruption.
> I also found that compressed archives and ISO images for which I had
> md5sums were corrupt.
>
> I then upgraded to 2.6.19.2, with the exact same result as with 2.6.18.6.
> I even verified this on a fairly new computer with different hardware
> (Intel CPU and chipset).
>
> I figured it might be some kind of race condition, so on my second try
> on 2.6.19.2, when recreating the array, I let md finish resyncing it
> before copying over the files.
> This time, reiserfsck didn't complain.
>
> Just for fun, I did the whole thing again: rebuilt the array from
> scratch, let md resync the third drive, and then started copying over
> all the files again. Thinking the cause of the problem was heavy disk
> I/O, I tried to stress the other LVM volumes residing on md0 with tar
> during the copy. Everything seemed fine; no problems arose.
>
> I did a few reboots and confirmed that reiserfsck didn't have any
> complaints about any of the filesystems residing on the LVM volumes
> on md0.
>
> I started using the machine as normal, and half a day later I
> unmounted the filesystems and ran reiserfsck just to make sure
> everything was still OK. Unfortunately, it wasn't.
>
>
> The drives in the array are three brand-new IDE drives: two 250 GB
> and one 200 GB.
> According to SMART there are no problems with them. And they worked
> fine in my previous RAID1 setup with dm-crypt and LVM, by the way.
> The computer itself is an Athlon XP with less than 1 GB of RAM on a
> motherboard with an nForce2 chipset, FWIW. No memory errors were
> detected with memtest86+ (I completed the full test).
> I haven't tried another filesystem, as I've got quite a lot of faith
> in reiserfs's stability.
>
> Is anybody else experiencing these problems?
> Unfortunately I'm only able to do limited testing due to busy days,
> but I'd love to help if I can.
>
>
> [1] Here's a thread on the recently fixed data corruption bug in dm-crypt
> http://article.gmane.org/gmane.linux.kernel.device-mapper.dm-crypt/1974
>
> [2] The backport of the dm-crypt fix for 2.6.18.6 is here
> http://uwsg.iu.edu/hypermail/linux/kernel/0612.1/2299.html

There has been a long history of similar problems when RAID and dm-crypt
are used together.
I thought a couple of months ago that we were hot on the trail of a fix,
but I don't think we ever got there. Perhaps Christophe can comment?
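
In the meantime, one data point worth pinning down is whether the
corruption only appears when the encrypted volume is created while the
array is still doing its initial resync; the report above suggests that
letting the resync finish made a difference, at least for a while. A
rough sketch of checking that before running luksFormat, using the
device name from the steps above:

  # Show resync/recovery progress for /dev/md0
  cat /proc/mdstat

  # Poll until the initial resync has finished; recent mdadm versions
  # can do the same with "mdadm --wait /dev/md0"
  while grep -Eq 'resync|recovery' /proc/mdstat; do
      sleep 60
  done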
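
It would also help to take reiserfs out of the equation by checksumming
the copied data directly rather than relying on reiserfsck alone. A
minimal sketch, assuming the mount points from the steps above (the
/tmp/source.md5 path is just an example):

  # Record checksums of every file on the source root, then verify the
  # copy and print only the mismatches
  ( cd /.realroot && find . -type f -print0 | xargs -0 md5sum ) > /tmp/source.md5
  ( cd /.newroot && md5sum -c /tmp/source.md5 ) | grep -v ': OK$'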