Re: State of play for RDMA on Luminous

On Wed, Aug 23, 2017 at 1:26 AM, Florian Haas <florian@xxxxxxxxxxx> wrote:
> Hello everyone,
>
> I'm trying to get a handle on the current state of the async messenger's
> RDMA transport in Luminous, and I've noticed that the information
> available is a little bit sparse (I've found
> https://community.mellanox.com/docs/DOC-2693 and
> https://community.mellanox.com/docs/DOC-2721, which are a great start
> but don't look very complete). So I'm kicking off this thread that might
> hopefully bring interested parties and developers together.
>
> Could someone in the know please confirm that the following assumptions
> of mine are accurate:
>
> - RDMA support for the async messenger is available in Luminous.

To be precise, RDMA in Luminous is available, but it lacks memory
control when under pressure. It should be fine to run for test purposes.
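For a quick test, a minimal ceph.conf fragment along these lines should
do (the device name below is only an example and has to match what
ibv_devices reports on your nodes):

  [global]
  # use the RDMA-backed async messenger cluster-wide
  ms_type = async+rdma
  # RDMA device to bind to, as listed by ibv_devices (example value)
  ms_async_rdma_device_name = mlx5_0
  # or, to restrict RDMA to one network only:
  # ms_cluster_type = async+rdma
  # ms_public_type = async+posix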

>
> - You enable it globally by setting ms_type to "async+rdma", and by
> setting appropriate values for the various ms_async_rdma* options (most
> importantly, ms_async_rdma_device_name).
>
> - You can also set RDMA messaging just for the public or cluster
> network, via ms_public_type and ms_cluster_type.
>
> - Users have to make a global async+rdma vs. async+posix decision on
> either network. For example, if either ms_type or ms_public_type is
> configured to async+rdma on cluster nodes, then a client configured with
> ms_type = async+posix can't communicate.
>
> Based on those assumptions, I have the following questions:
>
> - What is the current state of RDMA support in kernel libceph? In other
> words, is there currently a way to map RBDs, or mount CephFS, if a Ceph
> cluster uses RDMA messaging?

There are no plans on the kernel side so far. rbd-nbd and ceph-fuse should be supported now.
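Both of those go through the userspace async messenger, so something
like the following should work (the image name and mount point are
placeholders):

  # map an RBD image through the userspace NBD bridge instead of krbd
  rbd-nbd map rbd/myimage
  # mount CephFS with the FUSE client instead of the kernel client
  ceph-fuse /mnt/cephfs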

>
> - In case there is no such support in the kernel yet: What's the current
> status of RDMA support (and testing) with regard to
>   * libcephfs?

libcephfs should be OK. The MDS has some potential problems, because it
uses some different and tricky messenger methods, but this hasn't been
verified recently, so I'm not sure whether the issue still exists.

>   * the Samba Ceph VFS?

No testing so far.

>   * nfs-ganesha?

No testing so far.

>   * tcmu-runner?

I have received a report from another user that tcmu-runner has a
dependency conflict with ibverbs (netlink library version).

>
> - In summary, if a user wants to access their Ceph cluster via a POSIX
> filesystem or via iSCSI, is enabling the RDMA-enabled async messenger in
> the public network an option? Or would they have to continue running on
> TCP/IP (possibly on IPoIB if they already have InfiniBand hardware)
> until the client libraries catch up?

Any attempt is welcome.

>
> - And more broadly, if a user wants to use the performance benefits of
> RDMA, but not all of their potential Ceph clients have InfiniBand HCAs,
> what are their options? RoCE?

RoCE v2 is supported.
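To figure out which device to point ms_async_rdma_device_name at, the
standard libibverbs tools are enough; with Mellanox OFED the show_gids
helper (if installed) additionally shows which GID index maps to RoCE
v1 vs. v2:

  # list available RDMA devices and inspect their ports
  ibv_devices
  ibv_devinfo
  # Mellanox OFED only: print the GID table with RoCE versions
  show_gids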

>
> Thanks very much in advance for everyone's insight!
>
> Cheers,
> Florian
>
>
> --
> Please feel free to verify my identity:
> https://keybase.io/fghaas
>
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


