Good day! Please CC me on any responses as I'm not subscribed to this list.

I think I'm seeing an interesting bottleneck in the dm-cache system and I'm hoping you can shed some light on it for me. I've got a dm-cache device set up with 3PAR storage as the origin device and a 1.2TB FusionIO card as the cache device, running on CentOS 6.6 with kernel 2.6.32-504.el6.x86_64. The FusionIO card is chopped up a bit with LVM and is not 100% dedicated to the cache. Here's the `lvs` and `dmsetup status` output showing my setup:

**
# lvs vgFIO
  LV           VG    Attr       LSize   Pool     Origin               Data%  Meta%  Move Log Cpy%Sync Convert
  lv_cache     vgFIO Cwi---C--- 864.24g
  mysql_cached vgFIO Cwi-aoC---   6.00t lv_cache [mysql_cached_corig]
  tmp          vgFIO -wi-ao---- 256.00g

# dmsetup status
vgFIO-mysql_cached: 0 12884803584 cache 8 28061/262144 128 14159744/14159744 19966771 45785332 1302688457 229213040 33859953 48019697 9150 1 writeback 2 migration_threshold 2048 mq 10 random_threshold 4 sequential_threshold 512 discard_promote_adjustment 1 read_promote_adjustment 4 write_promote_adjustment 8
vgFIO-mysql_cached_corig: 0 12884803584 linear
vg3PAR-sasdata2: 0 21474803712 linear
vgFIO-tmp: 0 536870912 linear
360002ac000000000000000040000c004: 0 12884901888 multipath 2 0 1 0 1 1 A 0 4 0 8:16 A 0 8:80 A 0 8:48 A 0 8:112 A 0
vgSlash-slash: 0 581238784 linear
vgSlash-swap: 0 4194304 linear
360002ac000000000000000050000c004: 0 21474836480 multipath 2 0 1 0 1 1 A 0 4 0 8:32 A 0 8:96 A 0 8:64 A 0 8:128 A 0
vgFIO-lv_cache_cdata: 0 1812447232 linear
vgFIO-lv_cache_cmeta: 0 2097152 linear
**

The commands I used to create this are:

**
lvcreate -L 1G -n lv_cache_meta vgFIO /dev/fioa
lvcreate -l 221246 -n lv_cache vgFIO /dev/fioa
lvcreate -l 1572852 -n mysql_cached vgFIO /dev/mapper/360002ac000000000000000040000c004
lvconvert --type cache-pool --poolmetadata vgFIO/lv_cache_meta --cachemode writeback vgFIO/lv_cache
lvconvert --type cache --cachepool vgFIO/lv_cache vgFIO/mysql_cached
**

The cached device is the mount that mysql runs on. Today mysql got very busy and I saw odd throughput, with a potential bottleneck on the cache pool's cdata device. For reference, here are the device mappings:

**
# ls -l /dev/mapper/
total 0
lrwxrwxrwx 1 root root      7 Jan 27 02:10 360002ac0000000000000000e0000bc99 -> ../dm-4
lrwxrwxrwx 1 root root      7 Jan 27 02:10 360002ac0000000000000000f0000bc99 -> ../dm-6
lrwxrwxrwx 1 root root      7 Jan 27 02:10 360002ac000000000000000100000bc99 -> ../dm-5
lrwxrwxrwx 1 root root      7 Jan 27 02:10 360002ac000000000000000110000bc99 -> ../dm-7
crw-rw---- 1 root root 10,  58 Dec 30 00:17 control
lrwxrwxrwx 1 root root      7 Jan  2 22:03 vg3PAR-sasdata2 -> ../dm-8
lrwxrwxrwx 1 root root      8 Apr 22 18:07 vgFIO-lv_cache_cdata -> ../dm-10
lrwxrwxrwx 1 root root      8 Apr 22 18:07 vgFIO-lv_cache_cmeta -> ../dm-11
lrwxrwxrwx 1 root root      7 Apr 22 18:07 vgFIO-mysql_cached -> ../dm-9
lrwxrwxrwx 1 root root      8 Apr 22 18:07 vgFIO-mysql_cached_corig -> ../dm-12
lrwxrwxrwx 1 root root      7 Jan  2 22:03 vgFIO-tmp -> ../dm-3
lrwxrwxrwx 1 root root      7 Apr 22 15:21 vgSlash-slash2 -> ../dm-1
lrwxrwxrwx 1 root root      7 Jan  2 22:03 vgSlash-swap -> ../dm-0
**
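For what it's worth, here's how I've been decoding the cache status line above. I'm going off the field order in Documentation/device-mapper/cache.txt for this kernel, so please correct me if I've miscounted:

**
# My attempt at pulling hit ratios out of the status line; the awk
# field positions below assume the cache.txt layout:
#   $5  = used/total metadata blocks   $7  = used/total cache blocks
#   $8  = read hits    $9  = read misses
#   $10 = write hits   $11 = write misses
#   $12 = demotions    $13 = promotions   $14 = dirty
dmsetup status vgFIO-mysql_cached | awk '{
    printf "read hit%%:  %.1f\n", 100 * $8  / ($8  + $9)
    printf "write hit%%: %.1f\n", 100 * $10 / ($10 + $11)
    printf "demotions=%s promotions=%s dirty=%s\n", $12, $13, $14
}'
**

If I've counted the fields right, the cache is fully allocated (14159744/14159744 blocks), the read hit rate is only ~30% against an ~85% write hit rate, and the demotion/promotion counters suggest a fair amount of churn.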
Given those mappings, the iostat output below (a representative second of output from `iostat -mx 1`) shows the top-level mounted device (dm-9) with very high utilization, and the cache cdata device (dm-10) also with high utilization, while all of the other devices remain rather idle.

**
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.12    0.00    1.79    3.99    0.00   91.10

Device:  rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
sda       31.00     0.00   10.00    0.00     0.23     0.00    47.20     0.05    4.80   3.10   3.10
sdb        0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-0       0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-1       0.00     0.00   42.00    0.00     0.23     0.00    11.24     0.08    1.90   0.74   3.10
fioa       0.00     0.00  415.00  516.00    25.89     6.20    70.58     0.00    0.73   0.00   0.00
dm-3       0.00     0.00    0.00   16.00     0.00     0.06     8.00     0.03    2.00   0.12   0.20
sdc        0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdd        0.00     0.00    0.00    3.00     0.00     0.19   128.00     0.00    1.00   1.00   0.30
sde        0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdf        0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdg        0.00     0.00    0.00    3.00     0.00     0.19   128.00     0.00    0.67   0.67   0.20
sdh        0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdi        0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdj        0.00     0.00    0.00    4.00     0.00     0.25   128.00     0.00    1.00   1.00   0.40
sdk        0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-4       0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdl        0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-5       0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdm        0.00     0.00    0.00    3.00     0.00     0.19   128.00     0.00    1.00   1.00   0.30
sdn        0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-6       0.00     0.00    0.00   13.00     0.00     0.81   128.00     0.01    0.92   0.77   1.00
dm-8       0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
sdo        0.00     0.00   16.00  100.00     1.03     6.20   127.66     0.08    0.69   0.69   8.00
sdp        0.00     0.00   17.00   99.00     1.12     6.14   128.28     0.08    0.66   0.66   7.60
sdq        0.00     0.00   14.00  103.00     1.26     6.44   134.77     0.09    0.74   0.73   8.50
sdr        0.00     0.00   16.00  101.00     1.30     6.31   133.33     0.09    0.74   0.74   8.70
dm-7      61.00     0.00   63.00  403.00     4.72    25.09   131.02     0.34    0.74   0.56  25.90
dm-9       0.00     0.00  125.00  887.00     4.73     6.16    22.04     2.82    2.79   0.96  97.50
dm-10      0.00     0.00  416.00  861.00    25.95     6.13    51.46     2.05    1.61   0.63  80.30
dm-11      0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
dm-12      0.00     0.00  124.00  416.00     4.72    25.91   116.15     0.40    0.75   0.49  26.30
**

Does the cache cdata device look like a bottleneck to you? Removing the cache with `lvremove vgFIO/lv_cache` resulted in a massive increase in throughput, even before the cache finished flushing. Does anyone have any tuning/debugging/troubleshooting steps they can suggest?

Thanks,
Greg
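P.S. From my reading of Documentation/device-mapper/cache-policies.txt, the migration_threshold and mq tunables shown in the status line can be changed at runtime with `dmsetup message`. Would bumping these be a sane first step? Something like the following (an untested sketch on my part; the values are guesses, not recommendations):

**
# Allow more migration bandwidth between cache and origin:
dmsetup message vgFIO-mysql_cached 0 migration_threshold 20480
# Promote writes to the cache sooner:
dmsetup message vgFIO-mysql_cached 0 write_promote_adjustment 4
**

Or am I barking up the wrong tree entirely?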
--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel