We've used RDMA via RoCEv2 on 100GbE. It ran in production that way for at least six months before I had to turn it off while doing some migrations on hardware that didn't support it. We noticed no performance change in our environment, so once we were done I just never turned it back on. I'm not even sure we could right now, given how our network topology / bond interfaces are set up.

The biggest annoyance was making sure the device name and GID were correct. This was before the centralized ceph config store existed, so it may be easier to roll out now.

Example config section for one of my nodes (in the global part, under public+cluster network):

ms_cluster_type = async+rdma
ms_async_rdma_device_name = mlx5_1
ms_async_rdma_polling_us = 0
ms_async_rdma_local_gid = 0000:0000:0000:0000:0000:ffff:c1b8:4fa0
ms_async_rdma_roce_ver = 1

We pulled the GID in Ansible with:

- name: "Insert RDMA GID into ceph.conf"
  shell: sed -i s/GIDGOESHERE/$(cat /sys/class/infiniband/mlx5_1/ports/1/gids/5)/g /etc/ceph/ceph.conf
  args:
    warn: no

The stub config file we pushed had "GIDGOESHERE" in it.

I hope that helps someone out there. Not all of the settings were obvious and it took some trial and error. Now that we have a pure NVMe tier I'll probably try turning it back on to see if we notice any changes.

Netdata also proved to be a valuable tool for making sure we had traffic on both TCP and RDMA: https://www.netdata.cloud/

--
Paul Mezzanini
Sr Systems Administrator / Engineer, Research Computing
Information & Technology Services
Finance & Administration
Rochester Institute of Technology
o:(585) 475-3245 | pfmeec@xxxxxxx

________________________________________
From: Andrei Mikhailovsky <andrei@xxxxxxxxxx>
Sent: Wednesday, August 26, 2020 5:55 PM
To: Rafael Quaglio
Cc: ceph-users
Subject: Re: Infiniband support

Rafael,

We've been using Ceph with IPoIB for over 7 years and it's been supported. However, I am not too sure about the native RDMA support. There have been discussions on and off for a while now, but I've not seen much. Perhaps others know.

Cheers

> From: "Rafael Quaglio" <quaglio@xxxxxxxxxx>
> To: "ceph-users" <ceph-users@xxxxxxx>
> Sent: Wednesday, 26 August, 2020 14:08:57
> Subject: Infiniband support

> Hi,

> I could not see in the docs whether Ceph has InfiniBand support. Is there someone using it?

> Also, is there any RDMA support working natively?

> Can anyone point me to more information about it?

> Thanks,
> Rafael.

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
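
For anyone chasing the same GID headache: walking sysfs shows every populated GID index with its RoCE type, which makes picking the right one less of a guessing game. This is a rough sketch rather than what we ran in production, and it assumes the same mlx5_1 device and port 1 as the config above:

    # List populated GID indexes and their types for one HCA port.
    # dev/port are assumptions -- substitute your own device and port.
    dev=mlx5_1
    port=1
    for f in /sys/class/infiniband/"$dev"/ports/"$port"/gids/*; do
        idx=$(basename "$f")
        gid=$(cat "$f")
        # Skip empty GID slots
        [ "$gid" = "0000:0000:0000:0000:0000:0000:0000:0000" ] && continue
        # gid_attrs/types reports "IB/RoCE v1" or "RoCE v2" where the kernel exposes it
        type=$(cat /sys/class/infiniband/"$dev"/ports/"$port"/gid_attrs/types/"$idx" 2>/dev/null)
        echo "index=$idx gid=$gid type=$type"
    done

Pick the index whose address and type line up with the ms_async_rdma_roce_ver you intend to set.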
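
With the centralized config store that exists now, the same options could presumably be set with "ceph config set" instead of templating ceph.conf. Untested sketch: "node01" is a placeholder for one of your hosts, and I haven't verified that the RDMA messenger picks these values up from the monitors early enough at startup, so the local ceph.conf may still be needed:

    # Cluster-wide messenger settings
    ceph config set global ms_cluster_type async+rdma
    ceph config set global ms_async_rdma_polling_us 0
    ceph config set global ms_async_rdma_roce_ver 1
    # Device name and GID differ per host, so scope them with a host mask
    ceph config set osd/host:node01 ms_async_rdma_device_name mlx5_1
    ceph config set osd/host:node01 ms_async_rdma_local_gid 0000:0000:0000:0000:0000:ffff:c1b8:4fa0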