Hi all,

I'm sorry, I've been a bit busy, which is why you haven't heard anything new from me. It's nice to get all this input, and perhaps it will help me find the problem.

After looking at the code a lot, I suspect that it's not dm-crypt itself (the symptoms just don't make sense given the way it works) but rather some problem elsewhere that is triggered by certain timing conditions. Unfortunately, I have found only one case where I was able to trigger a bug, and that was on a production machine (I don't have three hard disks myself to build a RAID5), which I can't debug, and only with a certain kernel.

Which filesystems have you been seeing the corruption on? I also noticed that some RAID5 bug fixes went into 2.6.17 (though I doubt they are connected to the corruption problems).

For those who are trying to help reproduce the problem: I found out that you usually don't have to wait until the kernel starts spitting out error messages. Just run a small test copy process (or something else; several concurrent writes of some 100 MB seem to trigger the problem, at least on ext3), unmount the filesystem, and run fsck, which will tell you whether there was any corruption.
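For anyone who wants to script the reproduction steps, here is a rough sketch of what I mean, not a tested reproducer. The function names, the file names, the default of four writers, and the reading of "some 100 MB" as 100 MiB per writer are all my assumptions. The check step assumes ext3 and uses e2fsck with -f (force a check) and -n (read-only, answer "no" to everything); the commands are only built, not executed, since they need root.

```python
import hashlib
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

CHUNK = 1024 * 1024  # write in 1 MiB blocks


def write_file(path, size_mb):
    """Write size_mb MiB of random data to path and return its SHA-256 digest."""
    digest = hashlib.sha256()
    with open(path, "wb") as f:
        for _ in range(size_mb):
            block = os.urandom(CHUNK)
            f.write(block)
            digest.update(block)
        f.flush()
        os.fsync(f.fileno())  # push the data down to the block layer
    return digest.hexdigest()


def concurrent_writes(target_dir, writers=4, size_mb=100):
    """Run several writers at once (assumed: 4 writers, 100 MiB each).

    Returns a dict mapping each file path to the digest of what was written,
    so the files can also be re-read and compared later.
    """
    paths = [os.path.join(target_dir, "stress_%d.bin" % i) for i in range(writers)]
    with ThreadPoolExecutor(max_workers=writers) as pool:
        digests = list(pool.map(lambda p: write_file(p, size_mb), paths))
    return dict(zip(paths, digests))


def check_commands(mountpoint, device):
    """Build (but do not run) the unmount and fsck commands for an ext3 volume."""
    return [["umount", mountpoint], ["e2fsck", "-f", "-n", device]]
```

Typical use would be to call concurrent_writes() on a directory inside the mounted test filesystem, then run the two commands from check_commands() as root with your own mount point and device. A non-zero exit status from e2fsck means it found something to complain about.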