Hi Roman,
We recently discussed your tests, and a simple idea came to my mind:
can you repeat your tests targeting latency instead of maximum
throughput? I mean just use iodepth=1. What latency do you get, and on
what hardware?
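Something like the sketch below is what I have in mind. The pool,
image and client names are only placeholders, it assumes fio is built
with the rbd engine, and the clat_ns field assumes a recent fio with
JSON output:

#!/usr/bin/env python3
# Sketch: run a queue-depth-1 fio job against an RBD image and print
# the average write completion latency. Pool/image/client names are
# placeholders; adjust to your cluster. Note: this writes to the image.
import json
import subprocess

FIO_CMD = [
    "fio",
    "--name=lat-test",
    "--ioengine=rbd", "--clientname=admin",
    "--pool=rbd", "--rbdname=test",      # placeholders
    "--rw=randwrite", "--bs=4k",
    "--iodepth=1", "--numjobs=1",        # measure latency, not throughput
    "--time_based", "--runtime=60",
    "--output-format=json",
]

out = json.loads(subprocess.run(FIO_CMD, check=True,
                                capture_output=True).stdout)
clat = out["jobs"][0]["write"]["clat_ns"]  # completion latency, in ns
print("avg write latency: %.1f us" % (clat["mean"] / 1000.0))

With iodepth=1 the reported completion latency is effectively the
per-request round trip, which is what I am interested in here.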
Well, I have been playing with the ceph RDMA implementation for quite a
while, and it still has unsolved problems, so I would say the status is
not "completely broken", but rather "you can run it at your own risk
and smile":
1. On disconnect of a previously active (high write load) connection
there is a race that can lead to an OSD (or any receiver) crash:
https://github.com/ceph/ceph/pull/25447
2. Recent QLogic hardware (qedr driver) does not support
IBV_EVENT_QP_LAST_WQE_REACHED, which the ceph RDMA implementation
relies on; the pull request from item 1 also addresses this
incompatibility.
3. Under high write load with many connections there is a chance that
an OSD runs out of receive WRs, so the RDMA connection (QP) on the
sender side gets IBV_WC_RETRY_EXC_ERR and is disconnected. This is a
fundamental design problem which has to be fixed at the protocol level,
e.g. by propagating backpressure to the senders (see the sketch below).
4. Unfortunately, neither RDMA nor any other zero-latency network can
bring significant value, because the bottleneck is not the network.
Please see this thread for further reading on transport performance in
ceph:
https://www.spinics.net/lists/ceph-devel/msg43555.html
The problems described above have quite a big impact on overall
transport performance.
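To illustrate what I mean by propagating backpressure in item 3, here
is a toy credit-based flow control sketch in Python: the receiver
grants one credit per posted receive buffer, and the sender may only
transmit while it holds credits, so it can never overrun the receive
queue. This is only an illustration of the idea, not of the actual ceph
messenger or wire protocol, and all names are made up:

# Toy credit-based flow control. The receiver grants one credit per
# posted receive buffer; the sender transmits only while it holds
# credits, so the receive queue (the RDMA receive WRs) is never overrun.
from collections import deque

RQ_DEPTH = 4  # receive buffers ("WRs") the receiver keeps posted

class Receiver:
    def __init__(self):
        self.posted = RQ_DEPTH      # currently posted receive buffers
        self.inbox = deque()

    def receive(self, msg):
        # Without flow control this is where IBV_WC_RETRY_EXC_ERR would hit.
        assert self.posted > 0, "receive queue overrun"
        self.posted -= 1
        self.inbox.append(msg)

    def process_one(self):
        self.inbox.popleft()
        self.posted += 1            # repost the buffer ...
        return 1                    # ... and return one credit to the sender

class Sender:
    def __init__(self, receiver):
        self.receiver = receiver
        self.credits = RQ_DEPTH     # initial credits = receiver's RQ depth

    def send(self, msg):
        if self.credits == 0:
            return False            # backpressure: caller has to wait
        self.credits -= 1
        self.receiver.receive(msg)
        return True

rx = Receiver()
tx = Sender(rx)
for i in range(100):
    while not tx.send("msg-%d" % i):
        tx.credits += rx.process_one()  # receiver drains, credit comes back
print("sent 100 messages without overrunning the receive queue")

In real code the credits would of course travel over the wire
(piggybacked on replies or sent as explicit grants), but the invariant
is the same: the sender never has more messages in flight than the
receiver has receive WRs posted for them.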
--
Roman
--
With best regards,
Vitaliy Filippov