@Haomai,
Does HAVE_IBV_EXP still work with any RNIC in the current Ceph repository?
@Nasution:
I have never used the options below yet:
ms_async_rdma_roce_ver = 0 # RoCEv1; all nodes are on the same network. Should I use RoCEv2?
ms_async_rdma_local_gid = fe80:0000:0000:0000:****:****:****:**** # should I use the 0000:0000:0000:0000:0000:****:****:**** one?
To use RDMA, you may need:
1) Configure “ulimit -l” to be unlimited (a sketch of one way to do this follows the reference link below).
2) For an RNIC with SRQ support:
a. The configuration below should be OK:
ms_async_rdma_device_name = mlx5_bond_0
ms_cluster_type = async+rdma
ms_public_type = async+posix
b. If you need to distinguish RoCEv1 from RoCEv2, you need to configure “ms_async_rdma_gid_idx” (see the GID-table sketch after the reference link).
Reference: https://github.com/ceph/ceph/pull/31517/commits/b971cff51a9179c02f85a27cc191731a18e39876
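As a rough sketch of 1) and b. above (my own illustration, not something I have verified on your setup; the device name mlx5_bond_0 and port 1 are assumptions carried over from your configuration), the memlock limit can be raised in /etc/security/limits.conf, and the GID index for ms_async_rdma_gid_idx can be chosen by reading the kernel's sysfs GID table:

# /etc/security/limits.conf -- raise the locked-memory limit
# ("*" applies to all users; scope it to the ceph user if you prefer)
*    soft    memlock    unlimited
*    hard    memlock    unlimited

# List every populated GID and its RoCE type; pick the index whose type reads
# "RoCE v2" (or "IB/RoCE v1") and point ms_async_rdma_gid_idx at it.
for g in /sys/class/infiniband/mlx5_bond_0/ports/1/gids/*; do
    idx=$(basename "$g")
    type=$(cat /sys/class/infiniband/mlx5_bond_0/ports/1/gid_attrs/types/"$idx" 2>/dev/null)
    echo "$idx  $(cat "$g")  $type"
done

If your Mellanox OFED installation ships the show_gids helper, it prints a similar table.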
From: Lazuardi Nasution <mrxlazuardin@xxxxxxxxx>
Sent: Thursday, September 10, 2020 12:23 AM
To: Liu, Changcheng <changcheng.liu@xxxxxxxxx>
Subject: Ceph with RDMA
Hi,
I'm reading your post regarding Ceph with RDMA. Have you solved your problem? I'm trying the same approach, but currently some OSDs go down automatically not long after they come up because of missing heartbeat replies, even on a newly installed cluster. I'm using the following RDMA-related configuration.
[global]
.......
ms_async_rdma_device_name = mlx5_bond_0
ms_cluster_type = async+rdma
ms_public_type = async+posix
# rbd does not support rdma
ms_async_rdma_polling_us = 0
ms_async_rdma_roce_ver = 0 # RoCEv1; all nodes are on the same network. Should I use RoCEv2?
ms_async_rdma_local_gid = fe80:0000:0000:0000:****:****:****:**** # should I use the 0000:0000:0000:0000:0000:****:****:**** one?
[mgr]
ms_type = async+posix
I have put "LimitMEMLOCK on OSD (because it is the only one that failed to start without it) systemd unit file. "Would you mind sharing your configuration of working Ceph with RDMA? Do I miss something?
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx