Data corruption when using dm-crypt over RAID5

Hello, all.

This is about the well-known "dm-crypt is broken and causes massive data damage" problem. Please read.

I'm trying to chase down data corruption that happens when I use dm-crypt over RAID5. About 3 years ago I built a storage server for myself that uses the same setup (a dm-crypt partition over a software RAID5 array) and I have used it ever since without any problems. Last week I built another server with the same setup; the only differences are that it is x86_64 and the disks are bigger. I have an intermittent repro of this corruption. I spent several days chasing it, but still have not found the cause. Here is my setup:

- 64-bit AMD CPU, 1 GB RAM, 2.6.18.3 kernel

/proc/mdstat

md1 : active raid5 hdc3[2] hdb2[1] hda2[0]
     778485504 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

First, the problem can be reproduced with the following steps:
- create the crypt target
- copy about 4 GB of data (pdflush takes about 40% CPU, and all free memory goes to cache)
- e2fsck -f

Attached is the script I use.

What I have verified so far (in the order I tried):

This is dm-crypt related. Replacing crypt with linear eliminates all corruption, i.e. "0 18874368 linear /dev/md1 1280" works while "0 18874368 crypt aes-cbc-plain $KEY128 0 /dev/md1 1280" yields corruption.
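The numbers in these table lines can be checked against the array geometry. The following is a minimal sketch, assuming the usual device-mapper convention that lengths and offsets are counted in 512-byte sectors; the function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* device-mapper table lengths and offsets are counted in 512-byte sectors */
#define DM_SECTOR_BYTES 512u

/* Size in bytes of a mapping that is `sectors` long. */
uint64_t dm_sectors_to_bytes(uint64_t sectors)
{
    return sectors * DM_SECTOR_BYTES;
}

/* How many whole RAID chunks a sector offset skips. */
uint64_t dm_offset_in_chunks(uint64_t offset_sectors, uint64_t chunk_bytes)
{
    assert(chunk_bytes != 0);
    return dm_sectors_to_bytes(offset_sectors) / chunk_bytes;
}

/* Filesystem blocks per RAID chunk, i.e. the mke2fs stride value. */
uint64_t fs_stride(uint64_t chunk_bytes, uint64_t fs_block_bytes)
{
    assert(fs_block_bytes != 0);
    return chunk_bytes / fs_block_bytes;
}
```

With the values from this post, dm_sectors_to_bytes(18874368) is 9663676416 bytes (9 GiB), dm_offset_in_chunks(1280, 64 * 1024) is 10, and fs_stride(64 * 1024, 4096) is 16, which matches the 64k chunk size reported by /proc/mdstat and the stride=16 passed to mke2fs in the attached script.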

This is not crypto or IV related. Next, I created another crypto algorithm, "aesfake", which does nothing but has all the characteristics of AES (including CPU load). To my surprise, the problem still reproduces. Below is the relevant routine from the crypto module:
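static void aes_fake_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
{
   u8 tmp[AES_BLOCK_SIZE];

   /* do the real AES work for its CPU cost, but discard the result */
   aes_enc_blk(tfm, tmp, src);
   /* pass the data through unchanged */
   memmove(dst,src,AES_BLOCK_SIZE);
}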

This is not an issue of wrong sector remapping or of data being corrupted inside dm-crypt. I added a huge per-cc allocation in dm-crypt (about 100 MB per 9 GB volume), where I was tracking 3 things: the CRC of a sector, the sector number, and the sector number written (i.e. I was overwriting the first 4 bytes of each sector with the sector number, and keeping the original data in memory). The "corrupting" and the "check" for the sector number were done in crypt_convert_scatterlist. In addition, I added code that calculated and stored the sector checksum on writes and verified it on reads. The checksum calculation was in map, on a virgin bio before dm-crypt did anything to it. Verification was right after decryption, on the cloned bios. All of this yielded NOTHING, i.e. everything worked as expected.
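The tracking scheme described above can be sketched in userspace as follows. This is not the actual kernel patch: it is a simplified illustration that folds the checksum and the sector stamp into one write hook and one read hook, and every name in it (sector_track, track_on_write, track_on_read, NUM_SECTORS) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define NUM_SECTORS 64          /* hypothetical tiny volume for illustration */

struct sector_track {
    uint32_t crc;               /* CRC-32 of the original sector contents */
    uint32_t saved_head;        /* original first 4 bytes, replaced by the stamp */
    int written;                /* nonzero once the sector has been written */
};

static struct sector_track track[NUM_SECTORS];

/* Plain bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t crc32_buf(const uint8_t *p, size_t n)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < n; i++) {
        crc ^= p[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Write path: remember the CRC, then stamp the sector number into the data. */
void track_on_write(uint32_t sector, uint8_t *data)
{
    assert(sector < NUM_SECTORS);
    struct sector_track *t = &track[sector];
    t->crc = crc32_buf(data, SECTOR_SIZE);
    memcpy(&t->saved_head, data, sizeof t->saved_head);
    memcpy(data, &sector, sizeof sector);
    t->written = 1;
}

/* Read path: 0 = ok (or never written), -1 = wrong sector, -2 = CRC mismatch. */
int track_on_read(uint32_t sector, uint8_t *data)
{
    assert(sector < NUM_SECTORS);
    struct sector_track *t = &track[sector];
    uint32_t stamp;
    if (!t->written)
        return 0;
    memcpy(&stamp, data, sizeof stamp);
    if (stamp != sector)
        return -1;
    /* restore the original first 4 bytes before checking the CRC */
    memcpy(data, &t->saved_head, sizeof t->saved_head);
    if (crc32_buf(data, SECTOR_SIZE) != t->crc)
        return -2;
    return 0;
}
```

A return of -1 would point at a remapping error (data from a different sector came back), while -2 would point at the contents being altered somewhere between the write hook and the read hook.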

The problem SEEMS to be in the read cycle. Since the corruption reproduces even with a dummy no-op cipher, I started to disable various paths in dm-crypt. I do this by turning dm-crypt into linear for some bios, adding the following code to the top of the map function:
   int bypass = 0;

   // bypass certain requests
   if (bio_data_dir(bio) == WRITE) bypass=1;
//    if (bio_data_dir(bio) == READ) bypass=1;

   if (bypass)
   {
       bio->bi_bdev = cc->dev->bdev;
       bio->bi_sector = cc->start + sector;

       mempool_free(io, cc->io_pool);
       return 1;
   }
It turns out that having the read path active is a requirement for trouble. If I bypass writes the problem still persists, but bypassing reads eliminates the problem completely.

So far my theory is that some reads are just disappearing under heavy load. This is the only explanation I can think of for why all CRC/sector checks pass (they occur in the endio routine) while corruption still occurs.
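One possible way to test this theory, which I have not tried, is to pre-fill each read buffer with a poison pattern before the read is issued and then look for sectors that still hold the pattern afterwards. The following is a userspace sketch of that idea only; POISON_BYTE and the function names are hypothetical, and a sector whose real contents happen to be all poison bytes would be a false positive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_SIZE 512
#define POISON_BYTE 0xA5        /* hypothetical marker value */

/* Fill the buffer with the poison pattern before the read is issued. */
void poison_buffer(uint8_t *buf, size_t nsectors)
{
    assert(buf != NULL);
    memset(buf, POISON_BYTE, nsectors * SECTOR_SIZE);
}

/* Returns 1 if every byte of the sector is still the poison value. */
int sector_still_poisoned(const uint8_t *sector)
{
    for (size_t i = 0; i < SECTOR_SIZE; i++)
        if (sector[i] != POISON_BYTE)
            return 0;
    return 1;
}

/* After the read completes: count sectors the read never filled in. */
size_t count_unfilled_sectors(const uint8_t *buf, size_t nsectors)
{
    size_t unfilled = 0;
    assert(buf != NULL);
    for (size_t s = 0; s < nsectors; s++)
        if (sector_still_poisoned(buf + s * SECTOR_SIZE))
            unfilled++;
    return unfilled;
}
```

A nonzero count after a read that reported success would mean that part of the buffer was never written to, which is what a lost read would look like from the caller's side.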

Does this make any sense? Does anyone have any ideas about what to check next? Is there a kernel branch in which this problem has already been fixed? Any comments are welcome.


Thanks, Andrey.


#!/bin/sh

#umount if mounted
umount /dev/mapper/v0
dmsetup remove v0  2> /dev/null

echo Dropping cache...
sync
echo 3 > /proc/sys/vm/drop_caches
sync

modprobe dm-crypt

# create 9-gb crypt mapping with offset of 10 64k blocks (luks emulation)

# we need a random key for e2fsck errors to pop up and to trigger trk crc failures
KEY128=`dd status=noxfer count=16 if=/dev/random bs=1 | md5sum | cut -d \  -f 1`
echo key=$KEY128

#echo "0 18874368 crypt aes-cbc-plain $KEY128 0 /dev/md1 1280" | dmsetup create v0
echo "0 18874368 crypt aesfake-ecb $KEY128 0 /dev/md1 1280" | dmsetup create v0
#echo "0 18874368 linear /dev/md1 1280" | dmsetup create v0

mke2fs -b 4096 -R stride=16 /dev/mapper/v0 1572864
mount /dev/mapper/v0 /mnt/vault
echo Copying files...
time unrar x -inul /home/lelik/tf1.rar /mnt/vault
umount /dev/mapper/v0

echo Dropping cache...
sync
echo 3 > /proc/sys/vm/drop_caches
sync

e2fsck /dev/mapper/v0           # <- this returns "clean"
e2fsck -f /dev/mapper/v0        # <- this reports corrupted inodes


---------------------------------------------------------------------
dm-crypt mailing list - http://www.saout.de/misc/dm-crypt/