I'm working on a back-ported dm-cache for kernel 2.6.32-431.29.2 (the patched CentOS 6 one) and I'm trying to track down a corruption bug apparently introduced during the back-port. I can reproduce it consistently by mounting an ext4 file-system that contains some data and running stat(1) against a specific directory. stat(1) fails with "Input/output error" and dmesg says: "EXT4-fs error (device dm-8): ext4_lookup: deleted inode referenced: 29". The file-system is mounted with options "data=" in order to minimise noise.
Since I'm using the pass-through mode, my theory is that dm-cache:
(1) forwards the bio to the wrong device (see the sketch after this list), and/or
(2) forwards the bio to the wrong location on the device (e.g. the bio length and/or offset are wrong), and/or
(3) copies the wrong piece of data from a forwarded bio back to the original bio, assuming it copies data in the first place (I don't know much about the device mapper at this point).
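For reference, this is roughly how I understand bio forwarding in a bio-based device-mapper target: the map function rewrites bi_bdev and bi_sector and returns DM_MAPIO_REMAPPED, and dm core then resubmits the bio to the underlying device. Below is a minimal dm-linear-style sketch against the 2.6.32 API (the names "passthrough_c" and "passthrough_map" are made up for illustration); theories (1) and (2) amount to getting ->bdev or the sector arithmetic wrong at the two marked points:

#include <linux/device-mapper.h>
#include <linux/bio.h>

/* Hypothetical per-target context; dev is the origin (HDD) device. */
struct passthrough_c {
        struct dm_dev *dev;
        sector_t start;         /* offset into the underlying device */
};

static int passthrough_map(struct dm_target *ti, struct bio *bio,
                           union map_info *map_context)
{
        struct passthrough_c *pc = ti->private;

        /* Theory (1): wrong device would mean the wrong bi_bdev here. */
        bio->bi_bdev = pc->dev->bdev;

        /* Theory (2): wrong location would mean bad arithmetic here. */
        bio->bi_sector = pc->start + (bio->bi_sector - ti->begin);

        /* Hand the remapped bio back to dm core for resubmission. */
        return DM_MAPIO_REMAPPED;
}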
I tried to work out which of the above three could be happening by checking which bio goes where with btrace(8). Specifically, I ran btrace(8) against the cache target, the HDD, and the SSD data and metadata devices (four traces in total). I observed that no bios go to the SSD data or metadata devices, which rules out (1). I also observed that read-ahead requests issued to the cache target don't get forwarded to the HDD. I don't know whether that can be a problem in the first place (can read-ahead bios be ignored?), let alone whether it is the problem here, but I think it's worth making sure it really is harmless.
Below are the traces when mounting the file-system:
cache target trace:
253,8 7 3 27.218701098 28536 Q R 2 + 2 [mount]
253,8 7 4 27.218726465 28536 U N [mount] 0
253,8 7 5 27.222694538 28536 Q R 0 + 8 [mount]
253,8 7 6 27.222707270 28536 U N [mount] 0
253,8 7 7 27.226580397 28536 Q R 8 + 8 [mount]
253,8 7 8 27.226598088 28536 U N [mount] 0
253,8 7 9 27.229666137 28536 Q RA 2832 + 8 [mount]
253,8 7 10 27.229677500 28536 Q RM 2824 + 8 [mount]
253,8 7 11 27.229679348 28536 U N [mount] 0
253,8 1 2 27.222630997 28198 C R 2 + 2 [0]
253,8 1 3 27.226560799 28198 C R 0 + 8 [0]
253,8 1 4 27.229570827 28198 C R 8 + 8 [0]
253,8 1 5 27.232313463 28198 C RM 2824 + 8 [0]
253,8 3 1 27.229683980 28291 C RA 2832 + 8 [0]
253,8 7 12 27.232360573 28536 Q R 4456448 + 8 [mount]
253,8 7 13 27.232402040 28536 U N [mount] 0
253,8 6 1 27.243263044 28204 C R 4456448 + 8 [0]
HDD trace:
253,5 1 3 27.222584291 28198 C R 2 + 2 [0]
253,5 7 2 27.218685545 28536 U N [(null)] 0
253,5 7 3 27.222664774 28536 U N [(null)] 0
253,5 3 1 27.218694575 28291 Q R 2 + 2 [dm-cache]
253,5 3 2 27.222670216 28291 Q R 0 + 8 [dm-cache]
253,5 3 3 27.226566647 28291 Q R 8 + 8 [dm-cache]
253,5 1 4 27.226516192 28198 C R 0 + 8 [0]
253,5 1 5 27.229526352 28198 C R 8 + 8 [0]
253,5 1 6 27.232269516 28198 C RM 2824 + 8 [0]
253,5 7 4 27.226555641 28536 U N [(null)] 0
253,5 7 5 27.229636877 28536 U N [(null)] 0
253,5 7 6 27.232331776 28536 A R 4456448 + 8 <- (253,8) 4456448
253,5 7 7 27.232332898 28536 Q R 4456448 + 8 [(null)]
253,5 7 8 27.232359557 28536 U N [(null)] 0
253,5 3 4 27.229649990 28291 Q RM 2824 + 8 [dm-cache]
253,5 6 1 27.243215063 28204 C R 4456448 + 8 [0]
The "RA 2832 + 8" request (7th line in the 1st trace) issued to the cache target gets completed without ever reaching the HDD. Is this OK? I've started looking at the code but I haven't found yet anything specific to read-ahead bios.
Regarding my 3rd theory (data getting corrupted by dm-cache after being read from the HDD), is there some relatively easy way to confirm it? E.g. could btrace report a checksum of a bio's data when the bio completes?
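As far as I know, btrace only records block-layer events, not payloads, so I suspect it can't do this. One alternative I'm considering is a debugging hack in the target's end_io hook that checksums the read payload as each bio completes, to be compared with a crc32 of the same sectors read directly from the HDD. A rough, untested sketch against the 2.6.32 API ("my_cache_end_io" is a made-up name; note bi_sector here is the remapped sector):

#include <linux/device-mapper.h>
#include <linux/bio.h>
#include <linux/crc32.h>
#include <linux/highmem.h>

static int my_cache_end_io(struct dm_target *ti, struct bio *bio,
                           int error, union map_info *map_context)
{
        struct bio_vec *bvec;
        u32 csum = 0;
        int i;

        if (!error && bio_data_dir(bio) == READ) {
                /* Walk the completed bio's segments and checksum them. */
                bio_for_each_segment(bvec, bio, i) {
                        void *p = kmap_atomic(bvec->bv_page, KM_SOFTIRQ0);

                        csum = crc32_le(csum, p + bvec->bv_offset,
                                        bvec->bv_len);
                        kunmap_atomic(p, KM_SOFTIRQ0);
                }
                printk(KERN_DEBUG "end_io: sector %llu + %u crc32 0x%08x\n",
                       (unsigned long long)bio->bi_sector,
                       bio_sectors(bio), csum);
        }

        return error;   /* pass the completion status through unchanged */
}

If the checksum printed here already differs from a crc32 of the same sectors taken straight from the HDD, the data is wrong before dm-cache copies anything; if it matches, the corruption happens later, on the way back to the original bio.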
Is there something else that could be wrong?