> -----Original Message-----
> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
> Maged Mokhtar
> Sent: 06 April 2017 12:21
> To: Brady Deetz <bdeetz@xxxxxxxxx>; ceph-users <ceph-users@xxxxxxxx>
> Subject: Re: rbd iscsi gateway question
>
> The IO hang (it is actually a pause, not a hang) is triggered by Ceph only
> in the case of a simultaneous failure of 2 hosts or 2 OSDs on separate
> hosts; a single host/OSD being out will not cause it. In the PetaSAN
> project (www.petasan.org) we use LIO/krbd. We have done a lot of tests on
> VMware: in case of an IO failure, the IO will block for approximately 30s
> on the VMware ESX host (the default timeout, which can be configured) and
> then resume on the other MPIO path.
>
> We are using a custom LIO kernel upstreamed from SLE 12, used in their
> enterprise storage offering; it supports a direct rbd backstore. I believe
> there was a request to include it in the mainline kernel, but that did not
> happen, probably because of waiting for the TCMU solution, which will be a
> better/cleaner design.

Yes, I should have mentioned this: if you are using the SUSE kernel, they
have a fix for this spiral-of-death problem. Any other distribution or
vanilla kernel will hang if a Ceph IO takes longer than about 5-10s. The
path-failure handling is the problem: LIO tries to abort the IO, but RBD
does not support aborts yet.

> Cheers /maged
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
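The ~30s pause described above is governed by the ESXi software iSCSI initiator's timeout parameters, which can be inspected and tuned per adapter with esxcli. A rough sketch of how that might look is below; the adapter name (vmhba64) is an assumption for this example, and the exact parameter keys should be confirmed against your ESXi version's documentation before changing anything.

```shell
# Sketch only: vmhba64 is a placeholder adapter name -- discover the real
# one on your host first, and verify the parameter keys for your ESXi build.

# List iSCSI adapters to find the software initiator's vmhba name
esxcli iscsi adapter list

# Show the adapter's current parameters, including the recovery timeout
# that controls how long IO blocks before MPIO fails over to another path
esxcli iscsi adapter param get -A vmhba64

# Lower the recovery timeout so failover to the other gateway happens sooner
esxcli iscsi adapter param set -A vmhba64 -k RecoveryTimeout -v 15
```

Shortening this timeout trades slower-but-tolerant behaviour for faster failover; values that are too low can cause spurious path flapping during normal Ceph recovery pauses.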