Hi Alex,

Currently RBD+LIO+ESX is broken. The problem is caused by the RBD device not
handling device aborts properly, which causes LIO and ESXi to enter a death
spiral together. If something in the Ceph cluster causes an IO to take longer
than 10 seconds (I think!) ESXi submits an iSCSI abort message. Once this
happens, as you have seen, it never recovers. Mike Christie from Red Hat is
doing a lot of work on this at the moment, so hopefully in the future there
will be a direct RBD interface into LIO and it will all work much better. In
the meantime, either tgt or SCST seem to be pretty stable in testing. On your
question about spotting slow OSDs proactively, see the note below your quoted
mail.

Nick

> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Alex Gorbachev
> Sent: 23 August 2015 02:17
> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
> Subject: Slow responding OSDs are not OUTed and cause RBD client IO hangs
>
> Hello, this is an issue we have been suffering from and researching, along
> with a good number of other Ceph users, as evidenced by the recent posts.
> In our specific case it manifests itself in an RBD -> iSCSI LIO -> ESXi
> configuration, but the problem is more general.
>
> When there is an issue on OSD nodes (examples: network hangs/blips, disk
> HBAs failing, driver issues, page cache/XFS issues), some OSDs respond
> slowly or with significant delays. ceph osd perf does not show this, and
> neither does ceph osd tree or ceph -s / ceph -w. Instead, the RBD IO hangs
> to the point where the client times out, crashes or displays other unsavory
> behavior - operationally this crashes production processes.
>
> Today in our lab we had a disk controller issue, which brought an OSD node
> down. Upon restart, the OSDs started up and rejoined the cluster. However,
> immediately all IOs started hanging for a long time, and aborts from
> ESXi -> LIO were not succeeding in canceling these IOs. The only warning I
> could see was:
>
> root@lab2-mon1:/var/log/ceph# ceph health detail
> HEALTH_WARN 30 requests are blocked > 32 sec; 1 osds have slow requests
> 30 ops are blocked > 2097.15 sec
> 30 ops are blocked > 2097.15 sec on osd.4
> 1 osds have slow requests
>
> However, ceph osd perf is not showing high latency on osd.4:
>
> root@lab2-mon1:/var/log/ceph# ceph osd perf
> osd fs_commit_latency(ms) fs_apply_latency(ms)
>   0                     0                   13
>   1                     0                    0
>   2                     0                    0
>   3                   172                  208
>   4                     0                    0
>   5                     0                    0
>   6                     0                    1
>   7                     0                    0
>   8                   174                  819
>   9                     6                   10
>  10                     0                    1
>  11                     0                    1
>  12                     3                    5
>  13                     0                    1
>  14                     7                   23
>  15                     0                    1
>  16                     0                    0
>  17                     5                    9
>  18                     0                    1
>  19                    10                   18
>  20                     0                    0
>  21                     0                    0
>  22                     0                    1
>  23                     5                   10
>
> SMART state for the osd.4 disk is OK.
> The OSD is up and in:
>
> root@lab2-mon1:/var/log/ceph# ceph osd tree
> ID WEIGHT   TYPE NAME      UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -8        0 root ssd
> -7 14.71997 root platter
> -3  7.12000     host croc3
> 22  0.89000         osd.22      up  1.00000          1.00000
> 15  0.89000         osd.15      up  1.00000          1.00000
> 16  0.89000         osd.16      up  1.00000          1.00000
> 13  0.89000         osd.13      up  1.00000          1.00000
> 18  0.89000         osd.18      up  1.00000          1.00000
>  8  0.89000         osd.8       up  1.00000          1.00000
> 11  0.89000         osd.11      up  1.00000          1.00000
> 20  0.89000         osd.20      up  1.00000          1.00000
> -4  0.47998     host croc2
> 10  0.06000         osd.10      up  1.00000          1.00000
> 12  0.06000         osd.12      up  1.00000          1.00000
> 14  0.06000         osd.14      up  1.00000          1.00000
> 17  0.06000         osd.17      up  1.00000          1.00000
> 19  0.06000         osd.19      up  1.00000          1.00000
> 21  0.06000         osd.21      up  1.00000          1.00000
>  9  0.06000         osd.9       up  1.00000          1.00000
> 23  0.06000         osd.23      up  1.00000          1.00000
> -2  7.12000     host croc1
>  7  0.89000         osd.7       up  1.00000          1.00000
>  2  0.89000         osd.2       up  1.00000          1.00000
>  6  0.89000         osd.6       up  1.00000          1.00000
>  1  0.89000         osd.1       up  1.00000          1.00000
>  5  0.89000         osd.5       up  1.00000          1.00000
>  0  0.89000         osd.0       up  1.00000          1.00000
>  4  0.89000         osd.4       up  1.00000          1.00000
>  3  0.89000         osd.3       up  1.00000          1.00000
>
> How can we proactively detect this condition? Is there anything I can run
> that will output all slow OSDs?
>
> Regards,
> Alex
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
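
Regarding the question at the end of your mail: I don't know of a single
built-in command that lists every slow OSD directly, but two things are worth
trying. ceph health detail names the OSDs with blocked requests (as in your
output above), and each OSD's admin socket can be asked for the ops it
currently has in flight. A rough sketch, assuming the default
/var/run/ceph/ceph-osd.*.asok socket paths (adjust for your hosts):

# list the OSDs currently named in the blocked/slow request warnings
ceph health detail | grep 'on osd\.'

# on each OSD host, dump the ops every local OSD has in flight,
# including how long each op has been waiting and where it is stuck
for sock in /var/run/ceph/ceph-osd.*.asok; do
    echo "== $sock =="
    ceph --admin-daemon "$sock" dump_ops_in_flight
done

dump_historic_ops on the same socket shows recently completed slow ops, which
helps after the event. Lowering osd op complaint time in ceph.conf (the
default is 30 seconds, if I remember correctly) should also make the slow
request warnings show up earlier, though it won't make ceph osd perf any more
useful for catching this.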