Having just upgraded from a 4.11 kernel to a 4.13 one, I see a significantly higher scrub time for a ZFS on Linux (ZoL) pool that lives on a dm-cache device consisting of an 800 GB partition on a spinning 1 TB disk and a partition on an SSD (something between 100 and 200 GB). ZFS scrubbing consists of reading everything stored in the pool from start to finish, roughly in the order it was written. The data on the pool is for the most part laid out more or less linearly, and the scrub used to achieve read rates from the spinning disk in excess of 100 MB/s. With the old kernel, that is.

These are the scrub times for both kernels:

4.11.5-300.fc26: 1h56m
4.13.9-200.fc26: 4h32m

Nothing changed between those two runs except the booted kernel. ZoL is version 0.7.3 in both cases. Originally I suspected ZoL 0.7.x to be the culprit, since I had upgraded it from 0.6.5.11 at the same time as the kernel. However, I built and installed it for both kernel versions from the exact same sources, and on my home system, which runs ZoL on four spinning disks without an interposed dm-cache, scrub times are comparable to what they were before.

Typical output of iostat -dmx 3 with kernel 4.13 while the scrub is running; apart from the scrub there is no I/O activity on the system:

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda      300.67    0.00  462.67    0.00   68.16    0.00   301.69     2.63    5.61    5.61    0.00   2.16  99.90
sdb        0.00  194.67    6.00   83.33    0.38   14.01   329.82     0.20    2.22    0.50    2.34   1.58  14.13
dm-0       0.00    0.00    6.00  221.33    0.38   13.83   128.01     0.54    2.38    0.50    2.43   0.29   6.63
dm-1       0.00    0.00    0.00   53.67    0.00    0.17     6.31     0.12    2.28    0.00    2.28   2.06  11.07
dm-2       0.00    0.00  763.33    0.00   68.16    0.00   182.86     8.05   10.49   10.49    0.00   1.31  99.93
dm-3       0.00    0.00  440.00    0.00   54.70    0.00   254.60     1.98    4.41    4.41    0.00   2.27 100.03

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda      468.00    1.00  519.67   20.00   82.39    0.24   313.60     2.93    5.38    5.49    2.50   1.83  98.63
sdb        0.00  356.00   18.67  109.33    1.00   25.80   428.73     0.15    1.20    1.20    1.20   1.04  13.33
dm-0       0.00    0.00   18.67  426.00    1.00   25.75   123.20     0.52    1.16    1.20    1.16   0.19   8.33
dm-1       0.00    0.00    0.00   39.67    0.00    0.13     6.66     0.06    1.52    0.00    1.52   1.43   5.67
dm-2       0.00    0.00  988.00   21.00   82.68    0.24   168.31     9.63    8.97    9.11    2.38   0.98  98.60
dm-3       0.00    0.00  485.00   19.33   57.84    0.24   235.88     2.14    4.29    4.41    1.41   1.98  99.87

dm-3 is the cached device which ZoL reads from, sda/dm-2 is the spinning disk, and sdb/dm-0 is the cache SSD. It strikes me as odd that the amount read from the spinning disk is actually more than what comes out of the combined device in the end (e.g. 68.16 MB/s from sda but only 54.70 MB/s from dm-3 in the first sample). With the older kernel it is exactly the other way around, which makes much more sense to me.
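As an aside, a quick way to watch just that relationship while a scrub is running is to pull the rMB/s column for the raw disk and the cached LV out of iostat. A rough sketch, using the device names from my setup (and keeping in mind that iostat's first report is an average since boot):

$ iostat -dmx 3 | awk '
      $1 == "sda"  { disk = $6 }     # rMB/s read from the raw spinning disk
      $1 == "dm-3" { printf "disk %.1f MB/s -> cached LV %.1f MB/s (ratio %.2f)\n",
                            disk, $6, (disk ? $6 / disk : 0) }'

Going by the samples shown here, that ratio is consistently above 1 with 4.11 and below 1 with 4.13.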
With kernel 4.11 it looks like this: the amount of data coming out of the cached device is the sum of the reads from both underlying devices (e.g. 62.53 MB/s from sda plus 86.96 MB/s from sdb, roughly matching the 149.52 MB/s from dm-3 in the first sample). Typical samples with kernel 4.11:

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda       87.67    0.00  618.33    0.00   62.53    0.00   207.12     1.58    2.56    2.56    0.00   1.36  84.37
sdb        0.67    0.00 1057.00    0.00   86.96    0.00   168.49     0.44    0.41    0.41    0.00   0.23  24.37
dm-0       0.00    0.00 1057.67    0.00   86.96    0.00   168.38     0.44    0.42    0.42    0.00   0.23  24.40
dm-1       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2       0.00    0.00  706.00    0.00   62.56    0.00   181.48     1.74    2.46    2.46    0.00   1.19  84.33
dm-3       0.00    0.00 1488.33    0.00  149.52    0.00   205.74     1.97    1.32    1.32    0.00   0.67 100.00

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda      165.33    0.00  747.33    0.00   91.42    0.00   250.52     1.70    2.27    2.27    0.00   1.14  85.37
sdb        0.00    0.00  746.33    0.00   64.54    0.00   177.09     0.36    0.49    0.49    0.00   0.23  17.00
dm-0       0.00    0.00  746.33    0.00   64.54    0.00   177.09     0.37    0.49    0.49    0.00   0.23  17.07
dm-1       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2       0.00    0.00  912.67    0.00   91.39    0.00   205.07     2.02    2.21    2.21    0.00   0.94  85.37
dm-3       0.00    0.00 1363.00    0.00  155.92    0.00   234.28     2.02    1.48    1.48    0.00   0.73 100.00

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda      161.00    0.00  684.67    0.00   84.63    0.00   253.14     1.93    2.83    2.83    0.00   1.45  99.27
sdb        0.00    0.00   62.67    0.00    6.05    0.00   197.57     0.03    0.48    0.48    0.00   0.32   2.03
dm-0       0.00    0.00   62.67    0.00    6.05    0.00   197.57     0.03    0.48    0.48    0.00   0.32   2.03
dm-1       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2       0.00    0.00  845.67    0.00   84.63    0.00   204.94     2.26    2.68    2.68    0.00   1.17  99.30
dm-3       0.00    0.00  727.67    0.00   90.67    0.00   255.19     1.97    2.70    2.70    0.00   1.37 100.00

Device:  rrqm/s  wrqm/s     r/s     w/s   rMB/s   wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
sda      160.67    0.00  738.67    0.00   89.74    0.00   248.81     1.83    2.48    2.48    0.00   1.29  95.23
sdb        0.33    0.00  303.33    0.00   28.02    0.00   189.17     0.14    0.47    0.47    0.00   0.25   7.73
dm-0       0.00    0.00  303.67    0.00   28.02    0.00   188.96     0.14    0.47    0.47    0.00   0.26   7.87
dm-1       0.00    0.00    0.00    0.00    0.00    0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-2       0.00    0.00  899.33    0.00   89.74    0.00   204.36     2.17    2.41    2.41    0.00   1.06  95.27
dm-3       0.00    0.00  978.67    0.00  117.76    0.00   246.42     1.96    2.00    2.00    0.00   1.02 100.00

$ ls -l /dev/mapper/
total 0
crw------- 1 root root 10, 236 Nov  3 09:08 control
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_cachedata_cdata -> ../dm-0
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_cachedata_cmeta -> ../dm-1
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_zfsdisk -> ../dm-3
lrwxrwxrwx 1 root root       7 Nov  3 09:08 vg_zfs-lv_zfsdisk_corig -> ../dm-2

$ sudo dmsetup ls --tree
vg_zfs-lv_zfsdisk (253:3)
 ├─vg_zfs-lv_zfsdisk_corig (253:2)
 │  └─ (8:6)
 ├─vg_zfs-lv_cachedata_cdata (253:0)
 │  └─ (8:21)
 └─vg_zfs-lv_cachedata_cmeta (253:1)
    └─ (8:21)

$ sudo dmsetup table vg_zfs-lv_zfsdisk
0 1876041728 cache 253:1 253:0 253:2 1024 1 writethrough smq 0

$ sudo dmsetup status /dev/mapper/vg_zfs-lv_zfsdisk
0 1876041728 cache 8 1296/54272 1024 430706/430720 91621106 163624489 32345201 16417931 307686 307668 0 1 writethrough 2 migration_threshold 2048 smq 0 rw -

Any ideas?
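In case it saves anyone a trip to the docs, here is a rough field-by-field labeling of that status line, going by the order described in Documentation/device-mapper/cache.txt (my interpretation; block sizes should be in 512-byte sectors):

$ sudo dmsetup status /dev/mapper/vg_zfs-lv_zfsdisk | awk '{
      # Fields 1-3 are start, length and target type; the rest are
      # the cache-specific status values in documentation order.
      print "metadata block size (sectors) :", $4
      print "metadata blocks used/total    :", $5
      print "cache block size (sectors)    :", $6
      print "cache blocks used/total       :", $7
      print "read hits                     :", $8
      print "read misses                   :", $9
      print "write hits                    :", $10
      print "write misses                  :", $11
      print "demotions                     :", $12
      print "promotions                    :", $13
      print "dirty blocks                  :", $14
  }'

In other words, the cache is essentially full (430706 of 430720 blocks in use) and running in writethrough mode with the smq policy.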