> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Maged Mokhtar
> Sent: 06 April 2017 12:21
> To: Brady Deetz <bdeetz@xxxxxxxxx>; ceph-users <ceph-users@xxxxxxxx>
> Subject: Re: rbd iscsi gateway question
>
> The IO hang (it is actually a pause, not a hang) is triggered by Ceph only
> in the case of a simultaneous failure of 2 hosts or 2 OSDs on separate
> hosts; a single host/OSD being out will not cause it. In the PetaSAN
> project (www.petasan.org) we use LIO/krbd. We have done a lot of tests on
> VMware: in case of an IO failure, the IO will block for approximately 30s
> on the VMware ESX host (the default timeout, which can be configured) and
> then resume on the other MPIO path.
>
> We are using a custom LIO kernel upstreamed from SLE 12, used in their
> enterprise storage offering; it supports a direct rbd backstore. I believe
> there was a request to include it in the mainline kernel, but that did not
> happen, probably because of waiting for the TCMU solution, which will be a
> better/cleaner design.

Yes, I should have mentioned this: if you are using the SUSE kernel, they
have a fix for this spiral-of-death problem. Any other distribution or
vanilla kernel will hang if a Ceph IO takes longer than about 5-10s. The
path-failure handling is the problem: LIO tries to abort the IO, but RBD
does not support aborts yet.

> Cheers /maged
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
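The ~30s pause described above is governed by the ESXi software iSCSI initiator's timeout parameters, which can be inspected and tuned per adapter with esxcli. A rough sketch of how that might look is below; the adapter name (vmhba64) is an assumption for this example, and the exact parameter keys should be confirmed against your ESXi version's documentation before changing anything.

```shell
# Sketch only: vmhba64 is a placeholder adapter name -- discover the real
# one on your host first, and verify the parameter keys for your ESXi build.

# List iSCSI adapters to find the software initiator's vmhba name
esxcli iscsi adapter list

# Show the adapter's current parameters, including the recovery timeout
# that controls how long IO blocks before MPIO fails over to another path
esxcli iscsi adapter param get -A vmhba64

# Lower the recovery timeout so failover to the other gateway happens sooner
esxcli iscsi adapter param set -A vmhba64 -k RecoveryTimeout -v 15
```

Shortening this timeout trades slower-but-tolerant behaviour for faster failover; values that are too low can cause spurious path flapping during normal Ceph recovery pauses.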