On Mon, Aug 28, 2017 at 7:54 AM, Florian Haas <florian@xxxxxxxxxxx> wrote:
> On Mon, Aug 28, 2017 at 4:21 PM, Haomai Wang <haomai@xxxxxxxx> wrote:
>> On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas <florian@xxxxxxxxxxx> wrote:
>>> Hello everyone,
>>>
>>> I'm trying to get a handle on the current state of the async messenger's
>>> RDMA transport in Luminous, and I've noticed that the information
>>> available is a little bit sparse (I've found
>>> https://community.mellanox.com/docs/DOC-2693 and
>>> https://community.mellanox.com/docs/DOC-2721, which are a great start
>>> but don't look very complete). So I'm kicking off this thread in the hope
>>> of bringing interested parties and developers together.
>>>
>>> Could someone in the know please confirm that the following assumptions
>>> of mine are accurate:
>>>
>>> - RDMA support for the async messenger is available in Luminous.
>>
>> To be precise, RDMA in Luminous is available, but it lacks memory
>> control when under pressure. It should be OK to run for test purposes.
>
> OK, thanks! Assuming async+rdma will become fully supported some time
> in the next release or two, are there plans to backport async+rdma
> related features to Luminous? Or will users likely need to wait for
> the next release to get a production-grade Ceph/RDMA stack?

I think so.

>>> - You enable it globally by setting ms_type to "async+rdma", and by
>>> setting appropriate values for the various ms_async_rdma* options (most
>>> importantly, ms_async_rdma_device_name).
>>>
>>> - You can also set RDMA messaging just for the public or cluster
>>> network, via ms_public_type and ms_cluster_type.
>>>
>>> - Users have to make a global async+rdma vs. async+posix decision on
>>> either network. For example, if either ms_type or ms_public_type is
>>> configured to async+rdma on cluster nodes, then a client configured with
>>> ms_type = async+posix can't communicate.
>>>
>>> Based on those assumptions, I have the following questions:
>>>
>>> - What is the current state of RDMA support in kernel libceph? In other
>>> words, is there currently a way to map RBDs, or mount CephFS, if a Ceph
>>> cluster uses RDMA messaging?
>>
>> No plans on the kernel side so far. rbd-nbd and cephfs-fuse should be
>> supported now.
>
> Understood — are there plans to support async+rdma in the kernel at
> all, or is there something in the kernel that precludes this?

No.

>>> - In case there is no such support in the kernel yet: What's the current
>>> status of RDMA support (and testing) with regard to
>>> * libcephfs?
>>
>> libcephfs should be OK, but the MDS has some potential problems because
>> it uses some different and tricky messenger methods. That hasn't been
>> verified recently, so I'm not sure whether the issue still exists.
>>
>>> * the Samba Ceph VFS?
>>
>> No testing.
>>
>>> * nfs-ganesha?
>>
>> No testing.
>>
>>> * tcmu-runner?
>>
>> I have received a user report that tcmu-runner has a conflict with the
>> ibverbs dependencies (netlink library version).
>>
>>>
>>> - In summary, if a user wants to access their Ceph cluster via a POSIX
>>> filesystem or via iSCSI, is enabling the RDMA-enabled async messenger in
>>> the public network an option? Or would they have to continue running on
>>> TCP/IP (possibly on IPoIB if they already have InfiniBand hardware)
>>> until the client libraries catch up?
>>
>> Any attempt is welcome.
>
> OK. But for now, would you agree that *production* systems with IB
> HCAs should use IPoIB, and async+posix?

At this time, IPoIB is preferred for production use.
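
For anyone following along, here is a minimal ceph.conf sketch of the two
setups discussed above. The option names are the ones mentioned earlier in
the thread; the device name, the subnets, and the choice of which network
gets RDMA are placeholders for illustration only, not a recommendation.

    [global]
    # RDMA on the cluster network only, TCP on the public network so
    # that plain async+posix clients can still connect:
    ms_cluster_type = async+rdma
    ms_public_type = async+posix
    # RDMA device to bind to (placeholder name, see `ibv_devices`):
    ms_async_rdma_device_name = mlx5_0

    # Alternative, matching the production advice above: async+posix
    # everywhere, with both networks simply running over IPoIB subnets.
    #ms_type = async+posix
    #public network = 192.168.10.0/24    # example IPoIB subnet
    #cluster network = 192.168.20.0/24   # example IPoIB subnet

Keep Florian's third assumption above in mind: clients have to speak
whatever transport the public network uses, so async+rdma and async+posix
endpoints on the same network won't interoperate.
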
>>> - And more broadly, if a user wants to use the performance benefits of
>>> RDMA, but not all of their potential Ceph clients have InfiniBand HCAs,
>>> what are their options? RoCE?
>>
>> RoCE v2 is supported.
>
> Thanks!
>
> Cheers,
> Florian
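
One more practical note, tying the rbd-nbd / cephfs-fuse point above to
practice: a rough sketch of how a client can map an RBD image or mount
CephFS entirely in userspace, without the kernel clients (the pool, image,
and mountpoint names here are made up for this example).

    # Map an RBD image via the userspace NBD bridge instead of krbd;
    # librbd talks through librados and the configured async messenger.
    rbd-nbd map rbd/testimage      # prints the mapped /dev/nbdX device

    # Mount CephFS with the FUSE client instead of the kernel client.
    ceph-fuse /mnt/cephfs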