On Mon, Aug 28, 2017 at 4:21 PM, Haomai Wang <haomai@xxxxxxxx> wrote:
> On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas <florian@xxxxxxxxxxx> wrote:
>> Hello everyone,
>>
>> I'm trying to get a handle on the current state of the async messenger's
>> RDMA transport in Luminous, and I've noticed that the information
>> available is a little bit sparse (I've found
>> https://community.mellanox.com/docs/DOC-2693 and
>> https://community.mellanox.com/docs/DOC-2721, which are a great start
>> but don't look very complete). So I'm kicking off this thread that might
>> hopefully bring interested parties and developers together.
>>
>> Could someone in the know please confirm that the following assumptions
>> of mine are accurate:
>>
>> - RDMA support for the async messenger is available in Luminous.
>
> To be precise, RDMA in Luminous is available but lacks memory control
> when under pressure. It would be OK to run for test purposes.

OK, thanks! Assuming async+rdma will become fully supported some time
in the next release or two, are there plans to backport async+rdma
related features to Luminous? Or will users likely need to wait for the
next release to get a production-grade Ceph/RDMA stack?

>> - You enable it globally by setting ms_type to "async+rdma", and by
>> setting appropriate values for the various ms_async_rdma* options (most
>> importantly, ms_async_rdma_device_name).
>>
>> - You can also set RDMA messaging just for the public or cluster
>> network, via ms_public_type and ms_cluster_type.
>>
>> - Users have to make a global async+rdma vs. async+posix decision on
>> either network. For example, if either ms_type or ms_public_type is
>> configured to async+rdma on cluster nodes, then a client configured with
>> ms_type = async+posix can't communicate.
>>
>> Based on those assumptions, I have the following questions:
>>
>> - What is the current state of RDMA support in kernel libceph? In other
>> words, is there currently a way to map RBDs, or mount CephFS, if a Ceph
>> cluster uses RDMA messaging?
>
> No planning on the kernel side so far. rbd-nbd and ceph-fuse should be
> supported now.

Understood. Are there plans to support async+rdma in the kernel at all,
or is there something in the kernel that precludes this?

>> - In case there is no such support in the kernel yet: What's the current
>> status of RDMA support (and testing) with regard to
>> * libcephfs?
>
> libcephfs should be OK, but the MDS has some potential problems; they
> haven't been verified recently, because it uses some different and
> tricky messenger methods. I'm not sure whether they still exist.
>
>> * the Samba Ceph VFS?
>
> No testing.
>
>> * nfs-ganesha?
>
> No testing.
>
>> * tcmu-runner?
>
> I have received a report from another user that tcmu-runner has
> conflicts with the ibverbs dependencies (netlink library version).
>
>>
>> - In summary, if a user wants to access their Ceph cluster via a POSIX
>> filesystem or via iSCSI, is enabling the RDMA-enabled async messenger in
>> the public network an option? Or would they have to continue running on
>> TCP/IP (possibly on IPoIB if they already have InfiniBand hardware)
>> until the client libraries catch up?
>
> Any attempt is welcome.

OK. But for now, would you agree that *production* systems with IB HCAs
should use IPoIB and async+posix?

>> - And more broadly, if a user wants to use the performance benefits of
>> RDMA, but not all of their potential Ceph clients have InfiniBand HCAs,
>> what are their options? RoCE?
>
> RoCE v2 is supported.

Thanks!
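
For reference, and in case it helps anyone else following this thread,
below is the kind of minimal ceph.conf fragment I have been assuming for
the "RDMA on the cluster network only" case. The option names are the
ones discussed above (ms_public_type, ms_cluster_type,
ms_async_rdma_device_name); the device name is just a placeholder for
whatever ibv_devices reports on your nodes, and I haven't verified this
snippet against a live cluster:

    [global]
    # Keep client-facing traffic on TCP/IP so existing async+posix
    # clients can still reach the cluster...
    ms_public_type = async+posix
    # ...and run only the cluster-internal traffic (replication,
    # recovery, heartbeats) over RDMA.
    ms_cluster_type = async+rdma
    # RDMA device to bind to -- placeholder value; use the device name
    # that ibv_devices shows on your OSD/MON hosts.
    ms_async_rdma_device_name = mlx5_0

If you instead want RDMA on both networks, my understanding is that you
would drop the two *_type lines and simply set ms_type = async+rdma
globally, as described in the Mellanox docs linked above.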
Cheers,
Florian