Dear devs,

I'm posting on ceph-devel because I didn't get any feedback on ceph-users. This is an act of desperation…

TL;DR: The cluster runs fine with kernel 4.13 but produces slow requests with kernel 4.15. How can I debug this?

I'm running a combined Ceph / KVM cluster consisting of 6 hosts of 2 different kinds (details at the end). The main differences between those hosts are the CPU generation (Westmere / Sandy Bridge) and the number of OSD disks. The cluster runs Proxmox 5.2, which is essentially Debian 9 but with Ubuntu kernels and the Proxmox virtualization framework. The Proxmox WebUI also integrates some Ceph management.

On the Ceph side, the cluster has 3 nodes that run MGR, MON and OSDs, while the other 3 only run OSDs. OSD tree and CRUSH map are at the end. Ceph version is 12.2.7. All OSDs are BlueStore.

Now here's the thing: Some weeks ago Proxmox upgraded from kernel 4.13 to 4.15. Since then I've been getting slow requests that cause blocked IO inside the VMs running on the cluster (but not necessarily on the host with the OSD causing the slow request). If I boot back into 4.13, Ceph runs smoothly again.

I'm looking for help debugging this issue, as I'm running out of ideas about what else I could do. So far I have been using "ceph daemon osd.X dump_blocked_ops" to diagnose it. The output always indicates that the primary OSD scheduled copies on two secondaries (e.g. OSD 15: "event": "waiting for subops from 9,23") but only one of those succeeds ("event": "sub_op_commit_rec from 23"). The other one blocks (there is no commit message from OSD 9). On OSD 9 there is no blocked operation ("num_blocked_ops": 0), which confuses me a lot. If this OSD does not commit, there should be an operation that does not succeed, shouldn't there?

Restarting the (primary) OSD with the blocked operation clears the error; restarting the secondary OSD that does not commit has no effect on the issue.

Any ideas on how to debug this further?
What should I do to identify this as a Ceph issue and not a networking or kernel issue?

I can provide more specific info if needed.

Thanks,
Uwe

#### Hardware details ####
Host type 1:
  CPU: 2x Intel Xeon E5-2670
  RAM: 64GiB
  Storage: 1x SSD for OS, 3x HDD for Ceph (232GiB, some replaced by 931GiB)
  connected NIC: 1x 1GbE Intel (management access, MTU 1500), 1x 10GbE Myricom (Ceph & KVM, MTU 9000)

Host type 2:
  CPU: 2x Intel Xeon E5606
  RAM: 96GiB
  Storage: 1x HDD for OS, 5x HDD for Ceph (465GiB, some replaced by 931GiB)
  connected NIC: 1x 1GbE Intel (management access, MTU 1500), 1x 10GbE Myricom (Ceph & KVM, MTU 9000)
#### End Hardware ####

#### Ceph OSD Tree ####
ID  CLASS WEIGHT   TYPE NAME                    STATUS REWEIGHT PRI-AFF
 -1       12.72653 root default
 -2        1.36418     host px-alpha-cluster
  0   hdd  0.22729         osd.0                    up  1.00000 1.00000
  1   hdd  0.22729         osd.1                    up  1.00000 1.00000
  2   hdd  0.90959         osd.2                    up  1.00000 1.00000
 -3        1.36418     host px-bravo-cluster
  3   hdd  0.22729         osd.3                    up  1.00000 1.00000
  4   hdd  0.22729         osd.4                    up  1.00000 1.00000
  5   hdd  0.90959         osd.5                    up  1.00000 1.00000
 -4        2.04648     host px-charlie-cluster
  6   hdd  0.90959         osd.6                    up  1.00000 1.00000
  7   hdd  0.22729         osd.7                    up  1.00000 1.00000
  8   hdd  0.90959         osd.8                    up  1.00000 1.00000
 -5        2.04648     host px-delta-cluster
  9   hdd  0.22729         osd.9                    up  1.00000 1.00000
 10   hdd  0.90959         osd.10                   up  1.00000 1.00000
 11   hdd  0.90959         osd.11                   up  1.00000 1.00000
-11        2.72516     host px-echo-cluster
 12   hdd  0.45419         osd.12                   up  1.00000 1.00000
 13   hdd  0.45419         osd.13                   up  1.00000 1.00000
 14   hdd  0.45419         osd.14                   up  1.00000 1.00000
 15   hdd  0.45419         osd.15                   up  1.00000 1.00000
 16   hdd  0.45419         osd.16                   up  1.00000 1.00000
 17   hdd  0.45419         osd.17                   up  1.00000 1.00000
-13        3.18005     host px-foxtrott-cluster
 18   hdd  0.45419         osd.18                   up  1.00000 1.00000
 19   hdd  0.45419         osd.19                   up  1.00000 1.00000
 20   hdd  0.45419         osd.20                   up  1.00000 1.00000
 21   hdd  0.90909         osd.21                   up  1.00000 1.00000
 22   hdd  0.45419         osd.22                   up  1.00000 1.00000
 23   hdd  0.45419         osd.23                   up  1.00000 1.00000
#### End OSD Tree ####

#### CRUSH map ####
# begin crush map
tunable choose_local_tries 0
tunable choose_local_fallback_tries 0
tunable choose_total_tries 50
tunable chooseleaf_descend_once 1
tunable chooseleaf_vary_r 1
tunable chooseleaf_stable 1
tunable straw_calc_version 1
tunable allowed_bucket_algs 54

# devices
device 0 osd.0 class hdd
device 1 osd.1 class hdd
device 2 osd.2 class hdd
device 3 osd.3 class hdd
device 4 osd.4 class hdd
device 5 osd.5 class hdd
device 6 osd.6 class hdd
device 7 osd.7 class hdd
device 8 osd.8 class hdd
device 9 osd.9 class hdd
device 10 osd.10 class hdd
device 11 osd.11 class hdd
device 12 osd.12 class hdd
device 13 osd.13 class hdd
device 14 osd.14 class hdd
device 15 osd.15 class hdd
device 16 osd.16 class hdd
device 17 osd.17 class hdd
device 18 osd.18 class hdd
device 19 osd.19 class hdd
device 20 osd.20 class hdd
device 21 osd.21 class hdd
device 22 osd.22 class hdd
device 23 osd.23 class hdd

# types
type 0 osd
type 1 host
type 2 chassis
type 3 rack
type 4 row
type 5 pdu
type 6 pod
type 7 room
type 8 datacenter
type 9 region
type 10 root

# buckets
host px-alpha-cluster {
    id -2               # do not change unnecessarily
    id -6 class hdd     # do not change unnecessarily
    # weight 1.364
    alg straw
    hash 0              # rjenkins1
    item osd.0 weight 0.227
    item osd.1 weight 0.227
    item osd.2 weight 0.910
}
host px-bravo-cluster {
    id -3               # do not change unnecessarily
    id -7 class hdd     # do not change unnecessarily
    # weight 1.364
    alg straw
    hash 0              # rjenkins1
    item osd.3 weight 0.227
    item osd.4 weight 0.227
    item osd.5 weight 0.910
}
host px-charlie-cluster {
    id -4               # do not change unnecessarily
    id -8 class hdd     # do not change unnecessarily
    # weight 2.046
    alg straw
    hash 0              # rjenkins1
    item osd.7 weight 0.227
    item osd.8 weight 0.910
    item osd.6 weight 0.910
}
host px-delta-cluster {
    id -5               # do not change unnecessarily
    id -9 class hdd     # do not change unnecessarily
    # weight 2.046
    alg straw
    hash 0              # rjenkins1
    item osd.9 weight 0.227
    item osd.10 weight 0.910
    item osd.11 weight 0.910
}
host px-echo-cluster {
    id -11              # do not change unnecessarily
    id -12 class hdd    # do not change unnecessarily
    # weight 2.725
    alg straw2
    hash 0              # rjenkins1
    item osd.12 weight 0.454
    item osd.13 weight 0.454
    item osd.14 weight 0.454
    item osd.16 weight 0.454
    item osd.17 weight 0.454
    item osd.15 weight 0.454
}
host px-foxtrott-cluster {
    id -13              # do not change unnecessarily
    id -14 class hdd    # do not change unnecessarily
    # weight 3.180
    alg straw2
    hash 0              # rjenkins1
    item osd.18 weight 0.454
    item osd.19 weight 0.454
    item osd.20 weight 0.454
    item osd.22 weight 0.454
    item osd.23 weight 0.454
    item osd.21 weight 0.909
}
root default {
    id -1               # do not change unnecessarily
    id -10 class hdd    # do not change unnecessarily
    # weight 12.727
    alg straw
    hash 0              # rjenkins1
    item px-alpha-cluster weight 1.364
    item px-bravo-cluster weight 1.364
    item px-charlie-cluster weight 2.046
    item px-delta-cluster weight 2.046
    item px-echo-cluster weight 2.725
    item px-foxtrott-cluster weight 3.180
}

# rules
rule replicated_ruleset {
    id 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map
#### End CRUSH ####
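
#### Sketch: finding the secondary that never commits ####
The manual check described above (compare "waiting for subops from 9,23" against the "sub_op_commit_rec from N" events) could be scripted roughly as follows. This is only a minimal sketch: the two event strings are the ones quoted in this mail, but the helper names are made up for illustration, and the top-level "ops" key mentioned in the comments is an assumption about the dump_blocked_ops JSON rather than something verified here. To avoid relying on the exact nesting, the code simply collects every string stored under an "event" key.

```python
import re


def collect_events(node):
    """Recursively collect every string stored under an "event" key.

    Walks arbitrary nested dicts/lists so that the exact nesting of the
    dump_blocked_ops JSON does not have to be known in advance.
    """
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "event" and isinstance(value, str):
                found.append(value)
            else:
                found.extend(collect_events(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_events(item))
    return found


def pending_subops(events):
    """Return the set of OSD ids the primary is still waiting on.

    `events` is the list of event strings of ONE blocked op.
    """
    waiting, committed = set(), set()
    for event in events:
        m = re.match(r"waiting for subops from ([\d,]+)", event)
        if m:
            waiting.update(int(osd) for osd in m.group(1).split(","))
            continue
        m = re.match(r"sub_op_commit_rec from (\d+)", event)
        if m:
            committed.add(int(m.group(1)))
    return waiting - committed


# Example using the two events quoted in this mail (primary: OSD 15).
events = [
    "waiting for subops from 9,23",
    "sub_op_commit_rec from 23",
]
print(sorted(pending_subops(events)))
```

With real data, one would load the JSON printed by "ceph daemon osd.X dump_blocked_ops", iterate over its blocked ops (assumed here to be listed under an "ops" key), and call pending_subops(collect_events(op)) for each op to see which secondary keeps showing up.
#### End Sketch ####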