The I/O hang (really a pause, not a hang) happens in Ceph only when 2 hosts,
or 2 OSDs on separate hosts, fail at the same time. A single host/OSD being
out will not cause it. In the PetaSAN project (www.petasan.org) we use
LIO/krbd. We have done a lot of testing on VMware: in case of an I/O
failure, I/O blocks for approx. 30s on the VMware ESX side (the default
timeout, which can be configured) and then resumes on the other MPIO path.
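
To make the failure condition above concrete, here is a rough sketch of the
rule. It assumes a replicated pool with size=3 and min_size=2 and replicas
placed on separate hosts; those settings are my assumption and are not
stated in this thread. The function name is made up for illustration, and
this is a simplification: a placement group is only affected when the
failed OSDs actually hold its replicas.

```python
def pg_accepts_io(size, min_size, failed_replicas):
    """Return True if a placement group would still serve I/O.

    Simplified model: I/O continues while the number of surviving
    replicas is at least min_size. The name and model are illustrative,
    not a Ceph API.
    """
    surviving = size - failed_replicas
    return surviving >= min_size


# Assumed pool settings (not from the thread): size=3, min_size=2
for failed in range(3):
    state = "continues" if pg_accepts_io(3, 2, failed) else "pauses"
    print(f"{failed} replica(s) down: I/O {state}")
```

With these assumed settings, one lost replica leaves I/O running, while two
lost replicas pause it until recovery brings the count back to min_size.
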
We are using a custom LIO/kernel taken from SLE 12, the one used in SUSE's
enterprise storage offering; it supports a direct rbd backstore. I believe
there was a request to include it in the mainline kernel but that did not
happen, probably because people are waiting for the TCMU solution, which
will be a better/cleaner design.
Cheers /maged
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com