I'm working on a back-ported dm-cache for kernel 2.6.32-431.29.2 (the patched CentOS 6 one) and I'm trying to track down a corruption bug apparently introduced during the back-port. I can reproduce it consistently by mounting an ext4 file-system that contains some data and running stat(1) against a specific directory. stat(1) fails with "Input/output error" and dmesg says: "EXT4-fs error (device dm-8): ext4_lookup: deleted inode referenced: 29". The file-system is mounted with options "data=" in order to minimise noise.
Since I'm using the pass-through mode, my theory is that dm-cache:
(1) forwards the bio to the wrong device (see the sketch after this list), and/or
(2) forwards the bio to the wrong location on the device (e.g. the bio length and/or offset are wrong), and/or
(3) copies the wrong piece of data from a forwarded bio back to the original bio, assuming it copies data in the first place (I don't know much about the device mapper at this point).
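For reference, this is roughly how I understand bio forwarding in a bio-based device-mapper target: the map function rewrites bi_bdev and bi_sector and returns DM_MAPIO_REMAPPED, and dm core then resubmits the bio to the underlying device. Below is a minimal dm-linear-style sketch against the 2.6.32 API (the names "passthrough_c" and "passthrough_map" are made up for illustration); theories (1) and (2) amount to getting ->bdev or the sector arithmetic wrong at the two marked points:

#include <linux/device-mapper.h>
#include <linux/bio.h>

/* Hypothetical per-target context; dev is the origin (HDD) device. */
struct passthrough_c {
        struct dm_dev *dev;
        sector_t start;         /* offset into the underlying device */
};

static int passthrough_map(struct dm_target *ti, struct bio *bio,
                           union map_info *map_context)
{
        struct passthrough_c *pc = ti->private;

        /* Theory (1): wrong device would mean the wrong bi_bdev here. */
        bio->bi_bdev = pc->dev->bdev;

        /* Theory (2): wrong location would mean bad arithmetic here. */
        bio->bi_sector = pc->start + (bio->bi_sector - ti->begin);

        /* Hand the remapped bio back to dm core for resubmission. */
        return DM_MAPIO_REMAPPED;
}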
I tried to work out which of the above three could be happening by checking which bio goes where with btrace(8). Specifically, I ran btrace(8) against the cache target, the HDD, and the SSD data and metadata devices (four traces in total). I observed that no bios go to the SSD data or metadata devices, which rules out (1). I also observed that read-ahead requests issued to the cache target don't get forwarded to the HDD. I don't know whether that can be a problem in the first place (can read-ahead bios be ignored?), let alone whether it is the problem here, but I think it's worth making sure it really is harmless.
Below are the traces when mounting the file-system:
cache target trace:
253,8 7 3 27.218701098 28536 Q R 2 + 2 [mount]
253,8 7 4 27.218726465 28536 U N [mount] 0
253,8 7 5 27.222694538 28536 Q R 0 + 8 [mount]
253,8 7 6 27.222707270 28536 U N [mount] 0
253,8 7 7 27.226580397 28536 Q R 8 + 8 [mount]
253,8 7 8 27.226598088 28536 U N [mount] 0
253,8 7 9 27.229666137 28536 Q RA 2832 + 8 [mount]
253,8 7 10 27.229677500 28536 Q RM 2824 + 8 [mount]
253,8 7 11 27.229679348 28536 U N [mount] 0
253,8 1 2 27.222630997 28198 C R 2 + 2 [0]
253,8 1 3 27.226560799 28198 C R 0 + 8 [0]
253,8 1 4 27.229570827 28198 C R 8 + 8 [0]
253,8 1 5 27.232313463 28198 C RM 2824 + 8 [0]
253,8 3 1 27.229683980 28291 C RA 2832 + 8 [0]
253,8 7 12 27.232360573 28536 Q R 4456448 + 8 [mount]
253,8 7 13 27.232402040 28536 U N [mount] 0
253,8 6 1 27.243263044 28204 C R 4456448 + 8 [0]
HDD trace:
253,5 1 3 27.222584291 28198 C R 2 + 2 [0]
253,5 7 2 27.218685545 28536 U N [(null)] 0
253,5 7 3 27.222664774 28536 U N [(null)] 0
253,5 3 1 27.218694575 28291 Q R 2 + 2 [dm-cache]
253,5 3 2 27.222670216 28291 Q R 0 + 8 [dm-cache]
253,5 3 3 27.226566647 28291 Q R 8 + 8 [dm-cache]
253,5 1 4 27.226516192 28198 C R 0 + 8 [0]
253,5 1 5 27.229526352 28198 C R 8 + 8 [0]
253,5 1 6 27.232269516 28198 C RM 2824 + 8 [0]
253,5 7 4 27.226555641 28536 U N [(null)] 0
253,5 7 5 27.229636877 28536 U N [(null)] 0
253,5 7 6 27.232331776 28536 A R 4456448 + 8 <- (253,8) 4456448
253,5 7 7 27.232332898 28536 Q R 4456448 + 8 [(null)]
253,5 7 8 27.232359557 28536 U N [(null)] 0
253,5 3 4 27.229649990 28291 Q RM 2824 + 8 [dm-cache]
253,5 6 1 27.243215063 28204 C R 4456448 + 8 [0]
The "RA 2832 + 8" request (7th line in the 1st trace) issued to the cache target gets completed without ever reaching the HDD. Is this OK? I've started looking at the code but I haven't found yet anything specific to read-ahead bios.
Regarding my 3rd theory (data getting corrupted by dm-cache after being read from the HDD), is there some relatively easy way to confirm it? E.g. could btrace report a checksum of a bio's data when the bio completes?
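As far as I know, btrace only records block-layer events, not payloads, so I suspect it can't do this. One alternative I'm considering is a debugging hack in the target's end_io hook that checksums the read payload as each bio completes, to be compared with a crc32 of the same sectors read directly from the HDD. A rough, untested sketch against the 2.6.32 API ("my_cache_end_io" is a made-up name; note bi_sector here is the remapped sector):

#include <linux/device-mapper.h>
#include <linux/bio.h>
#include <linux/crc32.h>
#include <linux/highmem.h>

static int my_cache_end_io(struct dm_target *ti, struct bio *bio,
                           int error, union map_info *map_context)
{
        struct bio_vec *bvec;
        u32 csum = 0;
        int i;

        if (!error && bio_data_dir(bio) == READ) {
                /* Walk the completed bio's segments and checksum them. */
                bio_for_each_segment(bvec, bio, i) {
                        void *p = kmap_atomic(bvec->bv_page, KM_SOFTIRQ0);

                        csum = crc32_le(csum, p + bvec->bv_offset,
                                        bvec->bv_len);
                        kunmap_atomic(p, KM_SOFTIRQ0);
                }
                printk(KERN_DEBUG "end_io: sector %llu + %u crc32 0x%08x\n",
                       (unsigned long long)bio->bi_sector,
                       bio_sectors(bio), csum);
        }

        return error;   /* pass the completion status through unchanged */
}

If the checksum printed here already differs from a crc32 of the same sectors taken straight from the HDD, the data is wrong before dm-cache copies anything; if it matches, the corruption happens later, on the way back to the original bio.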
Is there something else that could be wrong?