This can be tuned in the iSCSI initiator on VMware - look in the advanced
settings on your ESX hosts (at least if you use the software initiator).

Jan

> On 23 Aug 2015, at 21:28, Nick Fisk <nick@xxxxxxxxxx> wrote:
>
> Hi Alex,
>
> Currently RBD+LIO+ESX is broken.
>
> The problem is caused by the RBD device not handling device aborts
> properly, causing LIO and ESXi to enter a death spiral together.
>
> If something in the Ceph cluster causes an IO to take longer than 10
> seconds (I think!!!), ESXi submits an iSCSI abort message. Once this
> happens, as you have seen, it never recovers.
>
> Mike Christie from Red Hat is doing a lot of work on this currently, so
> hopefully in the future there will be a direct RBD interface into LIO and
> it will all work much better.
>
> Either tgt or SCST seems to be pretty stable in testing.
>
> Nick
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Alex Gorbachev
>> Sent: 23 August 2015 02:17
>> To: ceph-users <ceph-users@xxxxxxxxxxxxxx>
>> Subject: Slow responding OSDs are not OUTed and cause RBD client IO hangs
>>
>> Hello, this is an issue we have been suffering from and researching,
>> along with a good number of other Ceph users, as evidenced by the recent
>> posts. In our specific case, these issues manifest themselves in an
>> RBD -> iSCSI LIO -> ESXi configuration, but the problem is more general.
>>
>> When there is an issue on OSD nodes (examples: network hangs/blips, disk
>> HBAs failing, driver issues, page cache/XFS issues), some OSDs respond
>> slowly or with significant delays. ceph osd perf does not show this;
>> neither does ceph osd tree or ceph -s / ceph -w. Instead, the RBD IO
>> hangs to the point where the client times out, crashes or displays other
>> unsavory behavior - operationally this crashes production processes.
>>
>> Today in our lab we had a disk controller issue, which brought an OSD
>> node down. Upon restart, the OSDs started up and rejoined the cluster.
>> However, immediately all IOs started hanging for a long time and aborts
>> from ESXi -> LIO were not succeeding in canceling these IOs. The only
>> warning I could see was:
>>
>> root@lab2-mon1:/var/log/ceph# ceph health detail
>> HEALTH_WARN 30 requests are blocked > 32 sec; 1 osds have slow requests
>> 30 ops are blocked > 2097.15 sec
>> 30 ops are blocked > 2097.15 sec on osd.4
>> 1 osds have slow requests
>>
>> However, ceph osd perf is not showing high latency on osd 4:
>>
>> root@lab2-mon1:/var/log/ceph# ceph osd perf
>> osd fs_commit_latency(ms) fs_apply_latency(ms)
>>   0                     0                    13
>>   1                     0                     0
>>   2                     0                     0
>>   3                   172                   208
>>   4                     0                     0
>>   5                     0                     0
>>   6                     0                     1
>>   7                     0                     0
>>   8                   174                   819
>>   9                     6                    10
>>  10                     0                     1
>>  11                     0                     1
>>  12                     3                     5
>>  13                     0                     1
>>  14                     7                    23
>>  15                     0                     1
>>  16                     0                     0
>>  17                     5                     9
>>  18                     0                     1
>>  19                    10                    18
>>  20                     0                     0
>>  21                     0                     0
>>  22                     0                     1
>>  23                     5                    10
>>
>> SMART state for osd 4 disk is OK.
>> The OSD is up and in:
>>
>> root@lab2-mon1:/var/log/ceph# ceph osd tree
>> ID WEIGHT   TYPE NAME       UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -8        0 root ssd
>> -7 14.71997 root platter
>> -3  7.12000     host croc3
>> 22  0.89000         osd.22       up  1.00000          1.00000
>> 15  0.89000         osd.15       up  1.00000          1.00000
>> 16  0.89000         osd.16       up  1.00000          1.00000
>> 13  0.89000         osd.13       up  1.00000          1.00000
>> 18  0.89000         osd.18       up  1.00000          1.00000
>>  8  0.89000         osd.8        up  1.00000          1.00000
>> 11  0.89000         osd.11       up  1.00000          1.00000
>> 20  0.89000         osd.20       up  1.00000          1.00000
>> -4  0.47998     host croc2
>> 10  0.06000         osd.10       up  1.00000          1.00000
>> 12  0.06000         osd.12       up  1.00000          1.00000
>> 14  0.06000         osd.14       up  1.00000          1.00000
>> 17  0.06000         osd.17       up  1.00000          1.00000
>> 19  0.06000         osd.19       up  1.00000          1.00000
>> 21  0.06000         osd.21       up  1.00000          1.00000
>>  9  0.06000         osd.9        up  1.00000          1.00000
>> 23  0.06000         osd.23       up  1.00000          1.00000
>> -2  7.12000     host croc1
>>  7  0.89000         osd.7        up  1.00000          1.00000
>>  2  0.89000         osd.2        up  1.00000          1.00000
>>  6  0.89000         osd.6        up  1.00000          1.00000
>>  1  0.89000         osd.1        up  1.00000          1.00000
>>  5  0.89000         osd.5        up  1.00000          1.00000
>>  0  0.89000         osd.0        up  1.00000          1.00000
>>  4  0.89000         osd.4        up  1.00000          1.00000
>>  3  0.89000         osd.3        up  1.00000          1.00000
>>
>> How can we proactively detect this condition? Is there anything I can
>> run that will output all slow OSDs?
>>
>> Regards,
>> Alex
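
To the closing question above, a minimal sketch of one way to list all slow
OSDs from the command line. It assumes only the stock ceph CLI and the
`ceph health detail` wording quoted in this thread ("N ops are blocked > X
sec on osd.M"); that wording changes between Ceph releases, so the pattern
may need adjusting:

#!/bin/bash
# List OSDs that currently have blocked/slow requests by scraping the
# "... ops are blocked > X sec on osd.N" lines from `ceph health detail`.
ceph health detail 2>/dev/null \
  | awk '/ops are blocked/ && /on osd\./ {
           id = $NF                # last field looks like "osd.4"
           sub(/^osd\./, "", id)   # keep only the numeric ID
           print id
         }' \
  | sort -nu \
  | while read -r id; do
      echo "osd.${id} has blocked requests"
    done

Once a suspect OSD shows up, `ceph daemon osd.<id> dump_ops_in_flight` and
`ceph daemon osd.<id> dump_historic_ops`, run on the node hosting that OSD,
show the individual slow operations and roughly where they are stuck.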
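
On Jan's point about the ESX advanced settings: the abort behaviour Nick
describes matches the software iSCSI initiator's RecoveryTimeout, which I
believe defaults to 10 seconds. A hedged sketch of inspecting and raising it
with esxcli follows; vmhba33 is a placeholder adapter name and 25 seconds is
only an example value, so verify the parameter and key names on your ESXi
build before relying on this:

# Find the software iSCSI adapter name (usually vmhbaNN).
esxcli iscsi adapter list

# Show the adapter-level parameters, RecoveryTimeout among them.
esxcli iscsi adapter param get --adapter=vmhba33

# Raise RecoveryTimeout so a transient Ceph slowdown (peering, backfill,
# a flapping OSD) does not immediately trigger iSCSI aborts from ESXi.
esxcli iscsi adapter param set --adapter=vmhba33 --key=RecoveryTimeout --value=25

This only buys headroom against the symptom; the abort handling itself still
needs the LIO/RBD work Nick mentions, and changes typically only apply to
newly established sessions.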