Hi Williams,

Besides using the same port for the RDMA messenger (both the public and cluster
networks over RDMA), I also tried a public-network TCP messenger combined with a
cluster-network RDMA messenger. No serious problems occurred. Ceph was built from
source, based on master commit 8cb1f6bd (Wed Nov 6 18:43:41 2019 -0500). I don't
have enough nodes to reproduce your problem.

BTW, "ceph-users Digest, Vol 82, Issue 27" contains the item below:
   2. Re: mgr daemons becoming unresponsive (Gregory Farnum)
However, I haven't hit any mgr problem on my side.

B.R.
Changcheng

On 08:53 Mon 11 Nov, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> @Changcheng
>
> Sorry for the late reply as well.
>
> I followed your setup and I have an issue where the MGR cannot connect
> to the cluster and RDMA does not work; I believe the MGR is not
> supported on RDMA.
>
> Thank you for your time, but I believe we may be hitting a dead end with
> this approach as we seem to get different results.
>
> Kind regards
>
> Gabryel Mason-Williams
> __________________________________________________________________
>
> From: Liu, Changcheng <changcheng.liu@xxxxxxxxx>
> Sent: 01 November 2019 06:24
> To: Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) <gabryel.mason-williams@xxxxxxxxxxxxx>
> Cc: dev@xxxxxxx <dev@xxxxxxx>
> Subject: Re: RMDA Bug?
>
> @Williams,
> Sorry for the late reply. I've been busy getting Ceph/RDMA performance
> data these days.
> I'm using an Intel RDMA NIC with a small cluster, and no serious issues
> have occurred.
> For a Mellanox NIC, there's no problem with your ceph.conf from my
> perspective.
> Below are the two nodes I used to deploy the cluster:
> 1. server0: 172.16.1.4, /dev/nvme0n1, /dev/nvme1n1
> 2. server1: 172.16.1.2, /dev/nvme0n1, /dev/nvme1n1
>
> Below are my deploy steps:
> [admin@server0 deploy]$ ceph-deploy new server0 --fsid 24280750-d4f7-4d4f-89e4-f95b8fab87ff
> [admin@server0 deploy]$ # change ceph.conf as below:
> [admin@server0 deploy]$ cat ceph.conf
> [global]
> cluster = ceph
> fsid = 24280750-d4f7-4d4f-89e4-f95b8fab87ff
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
>
> osd pool default size = 2
> osd pool default min size = 2
> osd pool default pg num = 64
> osd pool default pgp num = 128
>
> osd pool default crush rule = 0
> osd crush chooseleaf type = 1
>
> mon_allow_pool_delete=true
> osd_pool_default_pg_autoscale_mode=on
>
> ms_type = async+rdma
> ;----changcheng: change device to your dev name----------
> ms_async_rdma_device_name = irdma1
> ;----changcheng: ignore below parameter with a Mellanox NIC--------
> ;ms_async_rdma_support_srq = false
>
> mon_initial_members = server0
> mon_host = 172.16.1.4
>
> [mon.rdmarhel0]
> host = server0
> mon addr = 172.16.1.4
> [admin@server0 deploy]$ ceph-deploy mon create-initial
> [admin@server0 deploy]$ ceph-deploy admin server0 server1
> [admin@server0 deploy]$ ceph-deploy mgr create server0
> [admin@server0 deploy]$ ceph-deploy osd create --data /dev/nvme0n1 server0
> [admin@server0 deploy]$ ceph-deploy osd create --data /dev/nvme1n1 server0
> [admin@server0 deploy]$ ceph-deploy osd create --data /dev/nvme0n1 server1
> [admin@server0 deploy]$ ceph-deploy osd create --data /dev/nvme1n1 server1
>
> B.R.
> Changcheng
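For reference, here is a minimal ceph.conf sketch of the combination mentioned at
the top of this message (public network over TCP, cluster network over RDMA). It is
only a sketch: the ms_public_type/ms_cluster_type options are taken from Gabryel's
configuration later in this thread, while the subnets and the device name are
placeholders rather than values from either cluster.

    [global]
    ; placeholder front-side (client) subnet, carried over plain TCP
    public_network = 192.168.0.0/24
    ; placeholder back-side (replication) subnet, carried over RDMA
    cluster_network = 172.16.1.0/24
    ms_public_type = async+posix
    ms_cluster_type = async+rdma
    ; replace with the local RDMA device name (e.g. mlx5_0 or irdma1)
    ms_async_rdma_device_name = mlx5_0

    [mgr]
    ; keep the mgr on TCP, as in Gabryel's config, given the mgr/RDMA issue above
    ms_type = async+posix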
> On 08:27 Thu 31 Oct, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> > 1. When not defining a public and cluster network, the OSD and MGR
> >    nodes do not get recognised:
> >
> >    sudo ceph -s
> >
> >      cluster:
> >        id:     820f1573-bc4a-4ee0-b702-80ba5ac13c25
> >        health: HEALTH_WARN
> >                3 osds down
> >                3 hosts (3 osds) down
> >                1 root (3 osds) down
> >                no active mgr
> >                too few PGs per OSD (21 < min 30)
> >
> >      services:
> >        mon: 3 daemons, quorum cs04r-sc-com99-05,cs04r-sc-com99-07,cs04r-sc-com99-08 (age 5m)
> >        mgr: no daemons active (since 4m)
> >        osd: 3 osds: 0 up (since 9m), 3 in (since 9m)
> >
> >      data:
> >        pools:   1 pools, 64 pgs
> >        objects: 0 objects, 0 B
> >        usage:   3.0 GiB used, 114 GiB / 117 GiB avail
> >        pgs:     44 stale+active+clean
> >                 20 active+clean
> >
> >    This is an issue with ms_type being async+rdma, as the daemons are running:
> >
> >    sudo systemctl status ceph-osd.target
> >    ● ceph-osd.target - ceph target allowing to start/stop all ceph-osd@.service instances at once
> >      Loaded: loaded (/usr/lib/systemd/system/ceph-osd.target; enabled; vendor preset: enabled)
> >      Active: active since Thu 2019-10-31 08:13:42 GMT; 8min ago
> >
> >    sudo systemctl status ceph-mgr.target
> >    ● ceph-mgr.target - ceph target allowing to start/stop all ceph-mgr@.service instances at once
> >      Loaded: loaded (/usr/lib/systemd/system/ceph-mgr.target; enabled; vendor preset: enabled)
> >      Active: active since Thu 2019-10-31 08:13:33 GMT; 11min ago
> >
> >    With the config being:
> >
> >    [global]
> >    fsid = 820f1573-bc4a-4ee0-b702-80ba5ac13c25
> >    mon_initial_members = node1, node2, node3
> >    mon_host = xxx.xx.xxx.aa, xxx.xx.xxx.ac, xxx.xx.xxx.ad
> >    auth_cluster_required = cephx
> >    auth_service_required = cephx
> >    auth_client_required = cephx
> >    ms_type = async+rdma
> >    ms_async_rdma_device_name = mlx4_0
> > __________________________________________________________________
> >
> > From: Liu, Changcheng <changcheng.liu@xxxxxxxxx>
> > Sent: 31 October 2019 01:09
> > To: Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) <gabryel.mason-williams@xxxxxxxxxxxxx>
> > Cc: dev@xxxxxxx <dev@xxxxxxx>
> > Subject: Re: RMDA Bug?
> >
> > > 2) I'll confirm with my colleague whether the cluster network is really
> > > used in 14.2.4. We also hit a similar problem these days, even using
> > > the TCP async messenger.
> > [Changcheng]:
> > 1) The problem should already be solved in 14.2.4. We hit the problem in 14.2.1.
> > 2) I'll try to verify your problem when I have time (I'm working on other
> > affairs). There should be no problem when unifying both the public and
> > cluster network on the RDMA device.
> > On 23:22 Wed 30 Oct, Liu, Changcheng wrote:
> > > I'm working on the master branch and deployed a two-node cluster. Data
> > > is transferring over RDMA.
> > > [admin@server0 ~]$ sudo ceph daemon osd.0 perf dump AsyncMessenger::RDMAWorker-1
> > > {
> > >     "AsyncMessenger::RDMAWorker-1": {
> > >         "tx_no_mem": 0,
> > >         "tx_parital_mem": 0,
> > >         "tx_failed_post": 0,
> > >         "tx_chunks": 26966,
> > >         "tx_bytes": 52789637,
> > >         "rx_chunks": 26916,
> > >         "rx_bytes": 52812278,
> > >         "pending_sent_conns": 0
> > >     }
> > > }
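For reference, a small shell sketch for checking the same counters on every OSD of a
host, to confirm that traffic really flows through the RDMA workers. The perf-dump
command and the counter names are the ones shown in the dump above; the OSD ids and
the use of jq to trim the output are assumptions about the local setup:

    # run on each OSD host; adjust the OSD ids to the ones hosted there
    for id in 0 1; do
        sudo ceph daemon osd.$id perf dump AsyncMessenger::RDMAWorker-1 |
            jq '."AsyncMessenger::RDMAWorker-1" | {tx_bytes, rx_bytes, tx_failed_post}'
    done
    # tx_bytes/rx_bytes should keep growing under client I/O, and tx_failed_post
    # should stay at 0 (no failed RDMA work-request posts).

If the byte counters stay at zero while the cluster is busy, that daemon's traffic is
not going through the RDMA messenger at all.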
> > > The only difference is that I don't differentiate the public/cluster
> > > network in my cluster.
> > > You can try to make both the public and cluster network use RDMA.
> > > Note:
> > > 1) If both public and cluster use RDMA, we can't put them on different
> > >    subnetworks. This is a feature limitation; I'm planning to solve it
> > >    in the future.
> > > 2) I'll confirm with my colleague whether the cluster network is really
> > >    used in 14.2.4. We also hit a similar problem these days, even using
> > >    the TCP async messenger.
> > >
> > > Below is my cluster's ceph configuration.
> > > I also attach the systemd patch used on my side.
> > > [admin@server0 ~]$ cat /etc/ceph/ceph.conf
> > > [global]
> > > cluster = ceph
> > > fsid = 24280750-d4f7-4d4f-89e4-f95b8fab87ff
> > > auth_cluster_required = cephx
> > > auth_service_required = cephx
> > > auth_client_required = cephx
> > >
> > > osd pool default size = 2
> > > osd pool default min size = 2
> > > osd pool default pg num = 64
> > > osd pool default pgp num = 128
> > >
> > > osd pool default crush rule = 0
> > > osd crush chooseleaf type = 1
> > >
> > > mon_allow_pool_delete=true
> > > osd_pool_default_pg_autoscale_mode=off
> > >
> > > ms_type = async+rdma
> > > ms_async_rdma_device_name = mlx5_0
> > >
> > > mon_initial_members = server0
> > > mon_host = 172.16.1.4
> > >
> > > [mon.rdmarhel0]
> > > host = server0
> > > mon addr = 172.16.1.4
> > > [admin@server0 ~]$
> > >
> > > B.R.
> > > Changcheng
> > >
> > > On 13:07 Wed 30 Oct, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> > > > 1. The current problem is that it is still sending data over the
> > > >    ethernet instead of over IB.
> > > > 2. [global]
> > > >    fsid=xxxx
> > > >    mon_initial_members = node1, node2, node3
> > > >    mon_host = xxx.xx.xxx.ab, xxx.xx.xxx.ac, xxx.xx.xxx.ad
> > > >    auth_cluster_required = cephx
> > > >    auth_service_required = cephx
> > > >    auth_client_required = cephx
> > > >    public_network = xxx.xx.xxx.0/24
> > > >    cluster_network = xx.xxx.0.0/16
> > > >    ms_cluster_type = async+rdma
> > > >    ms_type = async+rdma
> > > >    ms_public_type = async+posix
> > > >    [mgr]
> > > >    ms_type = async+posix
> > > > 3. The ceph cluster is deployed using ceph-deploy. Once it is up, all
> > > >    of the daemons are turned off, the RDMA cluster config is sent
> > > >    around, and once that is complete the daemons are turned back on.
> > > >    The ulimit is set to unlimited; LimitMEMLOCK=infinity is set on
> > > >    ceph-disk@.service, ceph-mds@.service, ceph-mon@.service,
> > > >    ceph-osd@.service and ceph-radosgw@.service, as well as
> > > >    PrivateDevices=no on ceph-mds@.service, ceph-mon@.service and
> > > >    ceph-radosgw@.service. The ethernet MTU is set to 1000.
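As a side note on the limits described in point 3 above: a quick way to check that
the locked-memory limit really reaches the running daemons, not just the login shell.
This is only a sketch; it assumes a ceph-osd process is running locally:

    ulimit -l                        # shell limit; "unlimited" is expected here
    pid=$(pgrep -o ceph-osd)         # any locally running OSD will do
    grep 'Max locked memory' /proc/$pid/limits
    # the daemon itself should also report "unlimited"; if it does not, the
    # LimitMEMLOCK/ulimit change is not actually being applied to the service.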
> > > > __________________________________________________________________
> > > >
> > > > From: Liu, Changcheng <changcheng.liu@xxxxxxxxx>
> > > > Sent: 30 October 2019 12:24
> > > > To: Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) <gabryel.mason-williams@xxxxxxxxxxxxx>
> > > > Cc: dev@xxxxxxx <dev@xxxxxxx>
> > > > Subject: Re: RMDA Bug?
> > > >
> > > > 1. What problem do you hit when using RDMA in 14.2.4? Does any log
> > > >    show the error?
> > > > 2. What's your ceph.conf?
> > > > 3. How do you deploy the ceph cluster? RDMA needs to lock some memory,
> > > >    so some system configuration changes are needed to meet that
> > > >    requirement.
> > > > On 11:21 Wed 30 Oct, Gabryel Mason-Williams wrote:
> > > > > Liu, Changcheng wrote:
> > > > > > On 07:31 Mon 28 Oct, Mason-Williams, Gabryel (DLSLtd,RAL,LSCI) wrote:
> > > > > > > I am using ceph version 12.2.8
> > > > > > > (ae699615bac534ea496ee965ac6192cb7e0e07c0) luminous (stable).
> > > > > > >
> > > > > > > I have not checked the master branch. Do you think this is an
> > > > > > > issue in luminous that has been removed in later versions?
> > > > > > I haven't hit the problem on the master branch. Ceph/RDMA changed
> > > > > > a lot from luminous to the master branch.
> > > > > >
> > > > > > Is the below configuration really needed in luminous/ceph.conf?
> > > > > > > ms_async_rdma_local_gid = xxxx
> > > > > > On the master branch, this parameter is not needed at all.
> > > > > > B.R.
> > > > > > Changcheng
> > > > > > >
> > > > > > > __________________________________________________________________
> > > > >
> > > > > Thanks, the issue of the OSDs falling over seems to have gone away
> > > > > after updating to Nautilus 14.2.4. However, I am still unable to get
> > > > > it to communicate properly over RDMA, even after removing
> > > > > ms_async_rdma_local_gid.
> > > > > _______________________________________________
> > > > > Dev mailing list -- dev@xxxxxxx
> > > > > To unsubscribe send an email to dev-leave@xxxxxxx
> > > >
> > > > --
> > > >
> > > > This e-mail and any attachments may contain confidential, copyright
> > > > and or privileged material, and are for the use of the intended
> > > > addressee only. If you are not the intended addressee or an authorised
> > > > recipient of the addressee please notify us of receipt by returning
> > > > the e-mail and do not use, copy, retain, distribute or disclose the
> > > > information in or attached to the e-mail.
> > > > Any opinions expressed within this e-mail are those of the individual
> > > > and not necessarily of Diamond Light Source Ltd.
> > > > Diamond Light Source Ltd. cannot guarantee that this e-mail or any
> > > > attachments are free from viruses and we cannot accept liability for
> > > > any damage which you may sustain as a result of software viruses which
> > > > may be transmitted in or with the message.
> > > > Diamond Light Source Limited (company no. 4375679). Registered in
> > > > England and Wales with its registered office at Diamond House, Harwell
> > > > Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United
> > > > Kingdom
> > > >
> > > > _______________________________________________
> > > > Dev mailing list -- dev@xxxxxxx
> > > > To unsubscribe send an email to dev-leave@xxxxxxx
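On the ms_async_rdma_local_gid parameter discussed above (needed per node on
Luminous, not on master): a hedged sketch of how its value could be looked up on
each host. The device name and the placement in ceph.conf are only examples; the
GID itself has to come from the local GID table of the RDMA device:

    # list the GID table of the local RDMA device (ibv_devinfo is part of rdma-core)
    ibv_devinfo -v -d mlx4_0 | grep 'GID\['
    # then, on that host only, something like:
    # [osd]
    # ms_async_rdma_local_gid = <one GID from the table above>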
> > > From 40fa0d7096364b410e8242c46967029fb949876a Mon Sep 17 00:00:00 2001
> > > From: Changcheng Liu <changcheng.liu@xxxxxxxxxx>
> > > Date: Tue, 23 Jul 2019 18:50:57 +0800
> > > Subject: [PATCH] rdma systemd: grant access to /dev and unlimit mem
> > >
> > > Signed-off-by: Changcheng Liu <changcheng.liu@xxxxxxxxxx>
> > >
> > > diff --git a/systemd/ceph-fuse@xxxxxxxxxxx b/systemd/ceph-fuse@xxxxxxxxxxx
> > > index d603042b12..ff2e9072f6 100644
> > > --- a/systemd/ceph-fuse@xxxxxxxxxxx
> > > +++ b/systemd/ceph-fuse@xxxxxxxxxxx
> > > @@ -12,6 +12,7 @@ ExecStart=/usr/bin/ceph-fuse -f --cluster ${CLUSTER} %I
> > >  LockPersonality=true
> > >  MemoryDenyWriteExecute=true
> > >  NoNewPrivileges=true
> > > +LimitMEMLOCK=infinity
> > >  # ceph-fuse requires access to /dev fuse device
> > >  PrivateDevices=no
> > >  ProtectControlGroups=true
> > > diff --git a/systemd/ceph-mds@xxxxxxxxxxx b/systemd/ceph-mds@xxxxxxxxxxx
> > > index 39a2e63105..0e58dfeeea 100644
> > > --- a/systemd/ceph-mds@xxxxxxxxxxx
> > > +++ b/systemd/ceph-mds@xxxxxxxxxxx
> > > @@ -14,7 +14,8 @@ ExecReload=/bin/kill -HUP $MAINPID
> > >  LockPersonality=true
> > >  MemoryDenyWriteExecute=true
> > >  NoNewPrivileges=true
> > > -PrivateDevices=yes
> > > +LimitMEMLOCK=infinity
> > > +PrivateDevices=no
> > >  ProtectControlGroups=true
> > >  ProtectHome=true
> > >  ProtectKernelModules=true
> > > diff --git a/systemd/ceph-mgr@xxxxxxxxxxx b/systemd/ceph-mgr@xxxxxxxxxxx
> > > index c98f6378b9..682c7ecef3 100644
> > > --- a/systemd/ceph-mgr@xxxxxxxxxxx
> > > +++ b/systemd/ceph-mgr@xxxxxxxxxxx
> > > @@ -18,7 +18,8 @@ LockPersonality=true
> > >  MemoryDenyWriteExecute=false
> > >
> > >  NoNewPrivileges=true
> > > -PrivateDevices=yes
> > > +LimitMEMLOCK=infinity
> > > +PrivateDevices=no
> > >  ProtectControlGroups=true
> > >  ProtectHome=true
> > >  ProtectKernelModules=true
> > > diff --git a/systemd/ceph-mon@xxxxxxxxxxx b/systemd/ceph-mon@xxxxxxxxxxx
> > > index c95fcabb26..51854fad96 100644
> > > --- a/systemd/ceph-mon@xxxxxxxxxxx
> > > +++ b/systemd/ceph-mon@xxxxxxxxxxx
> > > @@ -21,7 +21,8 @@ LockPersonality=true
> > >  MemoryDenyWriteExecute=true
> > >  # Need NewPrivileges via `sudo smartctl`
> > >  NoNewPrivileges=false
> > > -PrivateDevices=yes
> > > +LimitMEMLOCK=infinity
> > > +PrivateDevices=no
> > >  ProtectControlGroups=true
> > >  ProtectHome=true
> > >  ProtectKernelModules=true
> > > diff --git a/systemd/ceph-osd@xxxxxxxxxxx b/systemd/ceph-osd@xxxxxxxxxxx
> > > index 1b5c9c82b8..06c20d7c83 100644
> > > --- a/systemd/ceph-osd@xxxxxxxxxxx
> > > +++ b/systemd/ceph-osd@xxxxxxxxxxx
> > > @@ -16,6 +16,8 @@ LockPersonality=true
> > >  MemoryDenyWriteExecute=true
> > >  # Need NewPrivileges via `sudo smartctl`
> > >  NoNewPrivileges=false
> > > +LimitMEMLOCK=infinity
> > > +PrivateDevices=no
> > >  ProtectControlGroups=true
> > >  ProtectHome=true
> > >  ProtectKernelModules=true
> > > diff --git a/systemd/ceph-radosgw@xxxxxxxxxxx b/systemd/ceph-radosgw@xxxxxxxxxxx
> > > index 7e3ddf6c04..fe1a6b9159 100644
> > > --- a/systemd/ceph-radosgw@xxxxxxxxxxx
> > > +++ b/systemd/ceph-radosgw@xxxxxxxxxxx
> > > @@ -13,7 +13,8 @@ ExecStart=/usr/bin/radosgw -f --cluster ${CLUSTER} --name client.%i --setuser ce
> > >  LockPersonality=true
> > >  MemoryDenyWriteExecute=true
> > >  NoNewPrivileges=true
> > > -PrivateDevices=yes
> > > +LimitMEMLOCK=infinity
> > > +PrivateDevices=no
> > >  ProtectControlGroups=true
> > >  ProtectHome=true
> > >  ProtectKernelModules=true
> > > diff --git a/systemd/ceph-volume@.service b/systemd/ceph-volume@.service
> > > index c21002cecb..e2d1f67b85 100644
> > > --- a/systemd/ceph-volume@.service
> > > +++ b/systemd/ceph-volume@.service
> > > @@ -9,6 +9,7 @@ KillMode=none
> > >  Environment=CEPH_VOLUME_TIMEOUT=10000
> > >  ExecStart=/bin/sh -c 'timeout $CEPH_VOLUME_TIMEOUT /usr/sbin/ceph-volume-systemd %i'
> > >  TimeoutSec=0
> > > +LimitMEMLOCK=infinity
> > >
> > >  [Install]
> > >  WantedBy=multi-user.target
> > > --
> > > 2.17.1
> > >
> > > _______________________________________________
> > > Dev mailing list -- dev@xxxxxxx
> > > To unsubscribe send an email to dev-leave@xxxxxxx
> >
> > --
> >
> > This e-mail and any attachments may contain confidential, copyright and
> > or privileged material, and are for the use of the intended addressee
> > only. If you are not the intended addressee or an authorised recipient
> > of the addressee please notify us of receipt by returning the e-mail
> > and do not use, copy, retain, distribute or disclose the information in
> > or attached to the e-mail.
> > Any opinions expressed within this e-mail are those of the individual
> > and not necessarily of Diamond Light Source Ltd.
> > Diamond Light Source Ltd. cannot guarantee that this e-mail or any
> > attachments are free from viruses and we cannot accept liability for
> > any damage which you may sustain as a result of software viruses which
> > may be transmitted in or with the message.
> > Diamond Light Source Limited (company no. 4375679). Registered in
> > England and Wales with its registered office at Diamond House, Harwell
> > Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United
> > Kingdom

_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx
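As a closing note on the attached systemd patch: the same two directives it adds
(LimitMEMLOCK=infinity and PrivateDevices=no) can also be applied as systemd drop-in
overrides, without rebuilding the packaged unit files. This is only a sketch of that
alternative, shown for the OSD unit:

    # /etc/systemd/system/ceph-osd@.service.d/rdma.conf  (drop-in for all OSD instances)
    [Service]
    LimitMEMLOCK=infinity
    PrivateDevices=no

    # then reload systemd and restart the daemons:
    sudo systemctl daemon-reload
    sudo systemctl restart ceph-osd.target
    # repeat for ceph-mon@, ceph-mgr@, ceph-mds@ and ceph-radosgw@ as needed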