Hi,
I'm currently testing dm-cache on a 'debian 3.12-1-amd64 #1 SMP Debian 3.12.9-1' vanilla kernel and have finally reproduced the data-corruption issue on the cached device.
Using a small and simple setup:
root@debian:~# blockdev --getsz /dev/sda5
1951744
root@debian:~# blockdev --getsz /dev/sda6
122880
root@debian:~# blockdev --getsz /dev/sda7
28672
root@debian:~# dmsetup create cached --table '0 1951744 cache /dev/sda7 /dev/sda6 /dev/sda5 512 1 writeback default 0'
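For reference, here is an annotated reading of that table line (the device roles follow the dm-cache target's argument order: metadata dev, cache dev, origin dev, block size, feature args, policy; the derived block count is just the arithmetic):

```shell
# Sizes copied from the blockdev output above.
ORIGIN_SECTORS=1951744   # /dev/sda5 (origin)
CACHE_SECTORS=122880     # /dev/sda6 (cache)
META_SECTORS=28672       # /dev/sda7 (metadata)
BLOCK_SECTORS=512        # cache block size: 512 sectors = 256 KiB
# Number of cache blocks the target manages:
NBLOCKS=$(( CACHE_SECTORS / BLOCK_SECTORS ))
echo "$NBLOCKS"          # 240 cache blocks
```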
and a script that starts many (let's say 100) instances of the badblocks tool, each writing, reading, and comparing the 0x55, 0xaa, 0xff, 0x00 patterns over its own range of blocks. See my bash script (attached).
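Roughly, each instance does something like the following (an assumed reconstruction of the attached script's inner loop, run here against a temporary file instead of the real device; the block range and names are illustrative only):

```shell
# Sketch of one badblocks-style instance: write a pattern over its own
# block range, read it back, and compare. The real test runs 100 such
# instances in parallel against /dev/mapper/cached.
dev=$(mktemp)                      # stand-in for the device under test
dd if=/dev/zero of="$dev" bs=512 count=64 2>/dev/null
first=8; last=15                   # this instance's block range
bad=0
for pat in 55 aa ff 00; do
    for b in $(seq "$first" "$last"); do   # write pass
        printf "\\x$pat%.0s" $(seq 512) |
            dd of="$dev" bs=512 seek="$b" count=1 conv=notrunc 2>/dev/null
    done
    for b in $(seq "$first" "$last"); do   # read-and-compare pass
        got=$(dd if="$dev" bs=512 skip="$b" count=1 2>/dev/null |
              od -An -v -tx1 | tr -d ' \n')
        want=$(printf "$pat%.0s" $(seq 512))
        [ "$got" = "$want" ] || { echo "bad block $b"; bad=1; }
    done
done
rm -f "$dev"
```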
Here is how I run it:
./dm-stress-test.sh -n 100 -r 10000 -d /dev/mapper/cached -t p
/dev/mapper/cached, 512, 1951744
running parallel test
checking blocks 0 to 19516
checking blocks 19517 to 39033
.
.
.
checking blocks 1932183 to 1951699
waiting for bad blocks
file has bad blocks, ./bad_block_43
839295
file has bad blocks, ./bad_block_43
839295
file has bad blocks, ./bad_block_43
839295
^CCTRL-C exiting ...
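For what it's worth, the consistently reported block 839295 maps back to a cache block with simple arithmetic (assuming the script's 512-byte test blocks, per its banner above, and the 512-sector cache block size from the table):

```shell
BAD_BLOCK=839295          # block number reported by the test
TEST_BS=512               # test block size in bytes (from the script banner)
CACHE_BLOCK_SECTORS=512   # cache block size from the dmsetup table
sector=$(( BAD_BLOCK * TEST_BS / 512 ))        # offset in 512-byte sectors
echo $(( sector / CACHE_BLOCK_SECTORS ))       # dm-cache block index: 1639
```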
Here is the dm-cache status at the time of the first corrupted block:
root@debian:~# dmsetup status
cached: 0 1951744 cache 7/3584 852 126848 1040 166813 0 85 85 71 0 2 migration_threshold 2048 4 random_threshold 4 sequential_threshold 512
I've also tried this test on a custom-built kernel, 'Linux george 3.13.0+ #3 SMP Mon Feb 17 10:44:59 EET 2014 x86_64 x86_64 x86_64 GNU/Linux', and hit the same issue.
When using a RAM device for the cache and SATA for the origin, the first corrupted block appears almost immediately.
We have been testing dm-cache on larger setups as well (see https://www.redhat.com/archives/dm-devel/2014-January/msg00135.html), but there we were only able to corrupt the cached device after several days of heavy traffic.
-- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel