Hello, this is an issue we have been suffering from and researching, along with a good number of other Ceph users, as evidenced by recent posts. In our specific case it manifests in an RBD -> iSCSI LIO -> ESXi configuration, but the problem is more general. When there is an issue on the OSD nodes (examples: network hangs/blips, disk HBAs failing, driver issues, page cache/XFS issues), some OSDs respond slowly or with significant delays. ceph osd perf does not show this, and neither do ceph osd tree, ceph -s, or ceph -w. Instead, RBD IO hangs to the point where the client times out, crashes, or displays other unsavory behavior; operationally, this takes down production processes.

Today in our lab we had a disk controller issue, which brought an OSD node down. Upon restart, the OSDs started up and rejoined the cluster. However, all IOs immediately started hanging for long periods, and aborts from ESXi -> LIO were not succeeding in canceling them. The only warning I could see was:

root@lab2-mon1:/var/log/ceph# ceph health detail
HEALTH_WARN 30 requests are blocked > 32 sec; 1 osds have slow requests
30 ops are blocked > 2097.15 sec
30 ops are blocked > 2097.15 sec on osd.4
1 osds have slow requests

However, ceph osd perf is not showing high latency on osd.4:

root@lab2-mon1:/var/log/ceph# ceph osd perf
osd fs_commit_latency(ms) fs_apply_latency(ms)
  0                     0                   13
  1                     0                    0
  2                     0                    0
  3                   172                  208
  4                     0                    0
  5                     0                    0
  6                     0                    1
  7                     0                    0
  8                   174                  819
  9                     6                   10
 10                     0                    1
 11                     0                    1
 12                     3                    5
 13                     0                    1
 14                     7                   23
 15                     0                    1
 16                     0                    0
 17                     5                    9
 18                     0                    1
 19                    10                   18
 20                     0                    0
 21                     0                    0
 22                     0                    1
 23                     5                   10

The SMART state of the osd.4 disk is OK, and the OSD is up and in:

root@lab2-mon1:/var/log/ceph# ceph osd tree
ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
-8        0 root ssd
-7 14.71997 root platter
-3  7.12000     host croc3
22  0.89000         osd.22       up  1.00000          1.00000
15  0.89000         osd.15       up  1.00000          1.00000
16  0.89000         osd.16       up  1.00000          1.00000
13  0.89000         osd.13       up  1.00000          1.00000
18  0.89000         osd.18       up  1.00000          1.00000
 8  0.89000         osd.8        up  1.00000          1.00000
11  0.89000         osd.11       up  1.00000          1.00000
20  0.89000         osd.20       up  1.00000          1.00000
-4  0.47998     host croc2
10  0.06000         osd.10       up  1.00000          1.00000
12  0.06000         osd.12       up  1.00000          1.00000
14  0.06000         osd.14       up  1.00000          1.00000
17  0.06000         osd.17       up  1.00000          1.00000
19  0.06000         osd.19       up  1.00000          1.00000
21  0.06000         osd.21       up  1.00000          1.00000
 9  0.06000         osd.9        up  1.00000          1.00000
23  0.06000         osd.23       up  1.00000          1.00000
-2  7.12000     host croc1
 7  0.89000         osd.7        up  1.00000          1.00000
 2  0.89000         osd.2        up  1.00000          1.00000
 6  0.89000         osd.6        up  1.00000          1.00000
 1  0.89000         osd.1        up  1.00000          1.00000
 5  0.89000         osd.5        up  1.00000          1.00000
 0  0.89000         osd.0        up  1.00000          1.00000
 4  0.89000         osd.4        up  1.00000          1.00000
 3  0.89000         osd.3        up  1.00000          1.00000

How can we proactively detect this condition? Is there anything I can run that will output all slow OSDs? (A rough sketch of the kind of check I have in mind is in the P.S. below.)

Regards,
Alex
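
P.S. For reference, this is roughly the kind of check I have in mind. It is only a minimal sketch; the 30-second interval is an arbitrary placeholder, and the grep pattern is just taken from the wording of the health warning shown above:

# Poll cluster health and log any "ops are blocked ... on osd.N" lines.
while sleep 30; do
    date
    ceph health detail 2>/dev/null | grep -E 'ops are blocked .* on osd\.[0-9]+' \
        || echo "no blocked ops reported"
done

On the OSD node itself, the admin socket can at least show which ops are stuck on a given daemon once the warning fires (these have to be run on the node hosting osd.4):

# In-flight ops, with per-op event timestamps.
ceph daemon osd.4 dump_ops_in_flight

# Recently completed slow ops, if the OSD recorded any.
ceph daemon osd.4 dump_historic_ops

But ideally we would like something that enumerates all slow OSDs from one place, before the clients start timing out.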