Hi Roman,
We recently discussed your tests, and a simple idea came to my mind:
can you repeat your tests targeting latency instead of maximum
throughput? I mean just use iodepth=1. What latency do you get, and on
what hardware?
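Something like the sketch below is what I have in mind. The pool,
image and client names are only placeholders, it assumes fio is built
with the rbd engine, and the clat_ns field assumes a recent fio with
JSON output:

#!/usr/bin/env python3
# Sketch: run a queue-depth-1 fio job against an RBD image and print
# the average write completion latency. Pool/image/client names are
# placeholders; adjust to your cluster. Note: this writes to the image.
import json
import subprocess

FIO_CMD = [
    "fio",
    "--name=lat-test",
    "--ioengine=rbd", "--clientname=admin",
    "--pool=rbd", "--rbdname=test",      # placeholders
    "--rw=randwrite", "--bs=4k",
    "--iodepth=1", "--numjobs=1",        # measure latency, not throughput
    "--time_based", "--runtime=60",
    "--output-format=json",
]

out = json.loads(subprocess.run(FIO_CMD, check=True,
                                capture_output=True).stdout)
clat = out["jobs"][0]["write"]["clat_ns"]  # completion latency, in ns
print("avg write latency: %.1f us" % (clat["mean"] / 1000.0))

With iodepth=1 the reported completion latency is effectively the
per-request round trip, which is what I am interested in here.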
Well, I have been playing with the ceph RDMA implementation for quite a
while, and it still has unsolved problems, so I would say the status is
not "completely broken", but rather "you can run it at your own risk
and smile":
1. On disconnect of a previously active (high write load) connection
there is a race that can lead to an OSD (or any receiver) crash:
https://github.com/ceph/ceph/pull/25447
2. Recent QLogic hardware (qedr driver) does not support
IBV_EVENT_QP_LAST_WQE_REACHED, which the ceph RDMA implementation
relies on; the pull request from item 1 also addresses this
incompatibility.
3. Under high write load with many connections there is a chance that
an OSD runs out of receive WRs, so the RDMA connection (QP) on the
sender side gets IBV_WC_RETRY_EXC_ERR and is disconnected. This is a
fundamental design problem which has to be fixed at the protocol level,
e.g. by propagating backpressure to the senders (see the sketch below).
4. Unfortunately, neither RDMA nor any other zero-latency network can
bring significant value, because the bottleneck is not the network.
Please see this thread for further reading on transport performance in
ceph:
https://www.spinics.net/lists/ceph-devel/msg43555.html
The problems described above have quite a big impact on overall
transport performance.
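To illustrate what I mean by propagating backpressure in item 3, here
is a toy credit-based flow control sketch in Python: the receiver
grants one credit per posted receive buffer, and the sender may only
transmit while it holds credits, so it can never overrun the receive
queue. This is only an illustration of the idea, not of the actual ceph
messenger or wire protocol, and all names are made up:

# Toy credit-based flow control. The receiver grants one credit per
# posted receive buffer; the sender transmits only while it holds
# credits, so the receive queue (the RDMA receive WRs) is never overrun.
from collections import deque

RQ_DEPTH = 4  # receive buffers ("WRs") the receiver keeps posted

class Receiver:
    def __init__(self):
        self.posted = RQ_DEPTH      # currently posted receive buffers
        self.inbox = deque()

    def receive(self, msg):
        # Without flow control this is where IBV_WC_RETRY_EXC_ERR would hit.
        assert self.posted > 0, "receive queue overrun"
        self.posted -= 1
        self.inbox.append(msg)

    def process_one(self):
        self.inbox.popleft()
        self.posted += 1            # repost the buffer ...
        return 1                    # ... and return one credit to the sender

class Sender:
    def __init__(self, receiver):
        self.receiver = receiver
        self.credits = RQ_DEPTH     # initial credits = receiver's RQ depth

    def send(self, msg):
        if self.credits == 0:
            return False            # backpressure: caller has to wait
        self.credits -= 1
        self.receiver.receive(msg)
        return True

rx = Receiver()
tx = Sender(rx)
for i in range(100):
    while not tx.send("msg-%d" % i):
        tx.credits += rx.process_one()  # receiver drains, credit comes back
print("sent 100 messages without overrunning the receive queue")

In real code the credits would of course travel over the wire
(piggybacked on replies or sent as explicit grants), but the invariant
is the same: the sender never has more messages in flight than the
receiver has receive WRs posted for them.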
--
Roman
--
With best regards,
Vitaliy Filippov