Wow, does it really work?
And why is it not supported by RBD?
Could you show us the latency graphs from before and after the change, and
tell us which I/O pattern the latency numbers apply to? The common wisdom
so far has been that RDMA barely affects latency with Ceph, because most of
the latency is in Ceph itself.
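For example, single-threaded small writes are where a change in network
latency should be most visible; something along the lines of the following
(pool name and duration are just placeholders) would show the average and
max per-op write latency:

  rados bench -p testpool 60 write -b 4096 -t 1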
Hello!
Mon, Oct 14, 2019 at 07:28:07AM -0000, gabryel.mason-williams wrote:
Hello,
I was wondering what user experience has been with using Ceph over RDMA?
- How did you set it up?
We used RoCE LAG with Mellanox ConnectX-4 Lx NICs.
- Documentation used to set it up?
Generally, Mellanox community docs and Ceph docs:
https://community.mellanox.com/s/article/bring-up-ceph-rdma---developer-s-guide
- Known issues when using it?
Ceph's packages do not ship systemd units with the LimitMEMLOCK=infinity
setting, so we had to add it ourselves. We also had to start Ceph as root
to work around some resource limits.
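For reference, a minimal systemd drop-in along these lines raises the
memlock limit for the OSDs (the file path and unit name are just an
example; the other daemon types need the same treatment):

  # /etc/systemd/system/ceph-osd@.service.d/limits.conf
  [Service]
  LimitMEMLOCK=infinity

  # then reload and restart:
  # systemctl daemon-reload && systemctl restart ceph-osd.target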
Ceph rbd clients, as well as the mgr daemons, do not support RDMA, so we
had to set:
ms_cluster_type = async+rdma
ms_type = async+rdma
ms_public_type = async+posix
[mgr]
ms_type = async+posix
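In ceph.conf that ends up looking roughly like this (a sketch; placing the
cluster-wide messenger options under [global] is an assumption):

  [global]
  ms_cluster_type = async+rdma
  ms_type = async+rdma
  ms_public_type = async+posix

  [mgr]
  ms_type = async+posix

You can check what a running daemon actually picked up with something like
'ceph daemon osd.0 config get ms_type'.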
We also had to disable jumbo frames entirely to get RDMA working.
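Concretely, that just means putting the interface MTU back to the default,
something like the following (interface name is only an example, and the
switch ports have to match):

  ip link set dev enp59s0f0 mtu 1500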
- If you still use it?
As far as I can see on my graphs, there is a latency drop with
Nautilus+RDMA. For now, the cluster has been up and running for two weeks
without any issues under our production load (rbd, radosgw, cephfs).
--
With best regards,
Vitaliy Filippov
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx