Hello, all.
This is about the well-known "dm-crypt is broken and causes massive data
damage" problem. Please read.
I'm trying to chase down data corruption that happens when I use dm-crypt
over RAID5. About 3 years ago I built a storage server for myself that
uses the same setup (a dm-crypt partition over a software RAID5 array) and
have used it ever since without any problems. Last week I built another
server with the same setup; the only differences are that it is x86_64
and the disks are bigger. I have an intermittent repro of this
corruption. I have spent several days chasing it, but still have not
found the cause. Here is my setup:
- 64-bit AMD CPU, 1 GB RAM, 2.6.18.3 kernel
- /proc/mdstat:
  md1 : active raid5 hdc3[2] hdb2[1] hda2[0]
        778485504 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]
The problem can be reproduced with the following steps:
- create the crypt target
- copy about 4 GB of data (pdflush takes about 40% CPU, and all free
  memory goes to cache)
- e2fsck -f
Attached is the script I use.
What I have verified so far (in the order I tried):
This is dm-crypt related. Replacing crypt with linear eliminates all
corruption, i.e. "0 18874368 linear /dev/md1 1280" works while "0
18874368 crypt aes-cbc-plain $KEY128 0 /dev/md1 1280" yields corruption.
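As a sanity check on the numbers in the two table lines above, here is a
minimal sketch that decodes them with shell arithmetic. It assumes the
usual 512-byte device-mapper sector; the helper names sectors_to_gib and
sectors_to_chunks are made up for this illustration.

```shell
#!/bin/sh
# Sketch: decode the numeric fields of the dm table lines quoted above.
# Assumption: device-mapper tables count in 512-byte sectors.
SECTOR_BYTES=512

# Convert a sector count to whole GiB (hypothetical helper).
sectors_to_gib() {
    echo $(( $1 * $SECTOR_BYTES / 1024 / 1024 / 1024 ))
}

# Convert a sector offset to a count of 64 KiB chunks (hypothetical helper).
sectors_to_chunks() {
    echo $(( $1 * $SECTOR_BYTES / 65536 ))
}

echo "mapping length: $(sectors_to_gib 18874368) GiB"
echo "start offset:   $(sectors_to_chunks 1280) chunks of 64 KiB"
```

So both tables describe the same 9 GiB region of /dev/md1, starting 10
RAID chunks in; the only thing that differs between them is the target
type.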
This is not crypto or IV related. Next, I created another crypto
algorithm, "aesfake", which does nothing but has all the characteristics
of AES (including CPU load). To my surprise, the problem still
reproduces. Below is the relevant routine from the crypto module:
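static void aes_fake_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
{
    u8 tmp[AES_BLOCK_SIZE];

    /* Do the real AES work for the CPU load, but discard the result. */
    aes_enc_blk(tfm, tmp, src);
    /* Pass the input through unchanged. */
    memmove(dst, src, AES_BLOCK_SIZE);
}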
This is not an issue of wrong sector remapping or of data being corrupted
during conversion. I added a huge per-cc allocation in dm-crypt (about
100 MB per 9 GB volume), where I tracked 3 things: the CRC of a sector,
the sector number, and the sector number written (i.e. I overwrote the
first 4 bytes of each sector with the sector number and kept the
original data in memory). The "corrupting" and the "check" for the sector
number were done in crypt_convert_scatterlist. In addition, I added code
that calculated and stored a sector checksum on write and verified it on
read. The checksum calculation was in map, on a virgin bio before
dm-crypt did anything to it. Verification was right after decryption, on
the cloned bios. All of this yielded NOTHING, i.e. everything worked as
expected.
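The record-on-write, verify-on-read idea can also be illustrated outside
the kernel. The following is only a userspace sketch, not the
instrumentation described above: the helper names record_sums and
verify_sums are made up for illustration, it checksums 512-byte blocks
of an ordinary file with cksum, and it assumes a POSIX shell with dd,
cksum and cut available.

```shell
#!/bin/sh
# Sketch: per-sector checksum manifest, recorded once and verified later.
# Assumption: 512-byte sectors; helper names are hypothetical.

# record_sums FILE NSECTORS
# Print one "index crc" line per 512-byte sector of FILE.
record_sums() {
    i=0
    while [ "$i" -lt "$2" ]; do
        crc=$(dd if="$1" bs=512 skip="$i" count=1 2>/dev/null | cksum | cut -d ' ' -f 1)
        echo "$i $crc"
        i=$((i + 1))
    done
}

# verify_sums FILE MANIFEST
# Re-read every sector listed in MANIFEST and compare its CRC.
# Prints each mismatching sector and returns non-zero if any was found.
verify_sums() {
    bad=0
    while read -r idx want; do
        got=$(dd if="$1" bs=512 skip="$idx" count=1 2>/dev/null | cksum | cut -d ' ' -f 1)
        if [ "$got" != "$want" ]; then
            echo "sector $idx: crc mismatch"
            bad=1
        fi
    done < "$2"
    return "$bad"
}
```

A run would record the manifest right after writing, drop caches, and
then call verify_sums. A clean pass only shows that the data read back
at that moment matched, which is the same limitation the in-kernel
checks have.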
The problem SEEMS to be in the read cycle. Since the corruption
reproduces even with a dummy no-op cipher, I started disabling various
paths in dm-crypt. I do this by turning dm-crypt into linear for some
bios, adding the following code at the top of the map function:
    int bypass = 0;

    // bypass certain requests
    if (bio_data_dir(bio) == WRITE)
        bypass = 1;
    // if (bio_data_dir(bio) == READ) bypass = 1;
    if (bypass) {
        bio->bi_bdev = cc->dev->bdev;
        bio->bi_sector = cc->start + sector;
        mempool_free(io, cc->io_pool);
        return 1;
    }
It turns out that the read path has to be active for the trouble to
occur. If I bypass writes, the problem still persists, but bypassing
reads eliminates the problem completely.
So far my theory is that some reads are just disappearing under heavy
load. This is the only explanation I can think of for why all CRC/sector
checks pass (they occur in the endio routine) while corruption still
occurs.
Does this make any sense? Does anyone have ideas about what to check
next? Is there a kernel branch in which this problem has already been
fixed?
Any comments are welcome.
Thanks, Andrey.
#!/bin/sh
#umount if mounted
umount /dev/mapper/v0
dmsetup remove v0 2> /dev/null
echo Dropping cache...
sync
echo 3 > /proc/sys/vm/drop_caches
sync
modprobe dm-crypt
# create 9-gb crypt mapping with offset of 10 64k blocks (luks emulation)
# we need a random key for e2fsck errors to pop up and to trigger trk crc failures
KEY128=`dd status=noxfer count=16 if=/dev/random bs=1 | md5sum | cut -d \ -f 1`
echo key=$KEY128
#echo "0 18874368 crypt aes-cbc-plain $KEY128 0 /dev/md1 1280" | dmsetup create v0
echo "0 18874368 crypt aesfake-ecb $KEY128 0 /dev/md1 1280" | dmsetup create v0
#echo "0 18874368 linear /dev/md1 1280" | dmsetup create v0
mke2fs -b 4096 -R stride=16 /dev/mapper/v0 1572864
mount /dev/mapper/v0 /mnt/vault
echo Copying files...
time unrar x -inul /home/lelik/tf1.rar /mnt/vault
umount /dev/mapper/v0
echo Dropping cache...
sync
echo 3 > /proc/sys/vm/drop_caches
sync
e2fsck /dev/mapper/v0 # <- this returns "clean"
e2fsck -f /dev/mapper/v0 # <- this reports corrupted inodes
---------------------------------------------------------------------
dm-crypt mailing list - http://www.saout.de/misc/dm-crypt/