How did you set up your network between mgr, mon, mds, and osd?

------------------ Original ------------------
From: "xl_3992@xxxxxx" <xl_3992@xxxxxx>
Date: Tue, Nov 30, 2021 10:07 AM
To: "GHui" <ugiwgh@xxxxxx>
Cc: "ceph-users" <ceph-users@xxxxxxx>
Subject: Re: Re: ceph rdma network connect refused

I have not tried a higher version; I only tested with 14.2.22.

[store@xxxxxxxxxxxxxxxxxxxx ~]$ show_gids
DEV          PORT  INDEX  GID                                      IPv4          VER  DEV
---          ----  -----  ---                                      ------------  ---  ---
mlx5_0       1     0      fe80:0000:0000:0000:0e42:a1ff:fead:58b2                v1   enp94s0f0
mlx5_0       1     1      fe80:0000:0000:0000:0e42:a1ff:fead:58b2                v2   enp94s0f0
mlx5_1       1     0      fe80:0000:0000:0000:0e42:a1ff:fead:58b3                v1   enp94s0f1
mlx5_1       1     1      fe80:0000:0000:0000:0e42:a1ff:fead:58b3                v2   enp94s0f1
mlx5_bond_0  1     0      fe80:0000:0000:0000:0e42:a1ff:fead:4be6                v1   bond0
mlx5_bond_0  1     1      fe80:0000:0000:0000:0e42:a1ff:fead:4be6                v2   bond0
mlx5_bond_0  1     2      0000:0000:0000:0000:0000:ffff:0a5e:303c  192.168.10    v1   bond0
mlx5_bond_0  1     3      0000:0000:0000:0000:0000:ffff:0a5e:303c  192.168.10    v2   bond0

xl_3992@xxxxxx

From: GHui
Date: 2021-11-30 09:50
To: xl_3992@xxxxxx
CC: ceph-users
Subject: Re: ceph rdma network connect refused

Which Ceph version do you use? Or where did you download the container images from?

------------------ Original ------------------
From: "xl_3992@xxxxxx" <xl_3992@xxxxxx>
Date: Mon, Nov 29, 2021 11:27 AM
To: "ceph-users" <ceph-users@xxxxxxx>
Subject: ceph rdma network connect refused

I am testing an RDMA network with Ceph. When the cluster exceeds 16 nodes, most of the OSDs go down; with fewer than 16 nodes, cluster health is OK. Can anyone help?

Error log output:

2021-11-29 10:53:06.884 7f0839fec700 -1 --2- 10.94.48.70:0/559149 >> [v2:10.94.48.66:7045/3543288,v1:10.94.48.66:7047/3543288] conn(0x5585a4b3ec00 0x5585bd816700 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=1 rev1=0 rx=0 tx=0)._handle_peer_banner peer [v2:10.94.48.66:7045/3543288,v1:10.94.48.66:7047/3543288] is using msgr V1 protocol
2021-11-29 10:53:07.264 7f083a7ed700 -1 Infiniband send_msg send returned error 111: (111) Connection refused
2021-11-29 10:53:07.264 7f083a7ed700 -1 Infiniband send_msg send returned error 111: (111) Connection refused
2021-11-29 10:53:07.264 7f083a7ed700 -1 Infiniband send_msg send returned error 111: (111) Connection refused
2021-11-29 10:53:07.264 7f083a7ed700 -1 Infiniband send_msg send returned error 111: (111) Connection refused
2021-11-29 10:53:07.264 7f083a7ed700 -1 Infiniband send_msg send returned error 111: (111) Connection refused

I followed the “Bring Up Ceph RDMA - Developer's Guide”. My cluster conf:

#----------------------- RDMA ---------------------
ms_type = async+rdma
ms_cluster_type = async+rdma
ms_public_type = async+rdma
ms_async_rdma_device_name = mlx5_bond_0
ms_async_rdma_polling_us = 0
ms_async_rdma_local_gid = 0000:0000:0000:0000:0000:ffff:0a5e:3046

[osd]
osd_memory_target = 4294967296

Node environment:

[store@xxxxxxxxxxxxxxxxxxxx ~]$ ulimit
unlimited

[store@xxxxxxxxxxxxxxxxxxxx ~]$ ibdev2netdev
mlx5_0 port 1 ==> enp94s0f0 (Down)
mlx5_1 port 1 ==> enp94s0f1 (Down)
mlx5_bond_0 port 1 ==> bond0 (Up)

[store@xxxxxxxxxxxxxxxxxxxx ~]$ sudo cat /usr/lib/systemd/system/ceph-osd@.service
[Unit]
Description=Ceph object storage daemon osd.%i
PartOf=ceph-osd.target
After=network-online.target local-fs.target time-sync.target
Before=remote-fs-pre.target ceph-osd.target
Wants=network-online.target local-fs.target time-sync.target remote-fs-pre.target ceph-osd.target

[Service]
LimitNOFILE=1048576
LimitNPROC=1048576
LimitMEMLOCK=infinity
EnvironmentFile=-/etc/sysconfig/ceph
Environment=CLUSTER=ceph
ExecStart=/usr/bin/ceph-osd -f --cluster ${CLUSTER} --id %i --setuser ceph --setgroup ceph
ExecStartPre=/usr/lib/ceph/ceph-osd-prestart.sh --cluster ${CLUSTER} --id %i
ExecReload=/bin/kill -HUP $MAINPID
LockPersonality=true
MemoryDenyWriteExecute=true

[Install]
WantedBy=ceph-osd.target

[store@xxxxxxxxxxxxxxxxxxxx ~]$ cat /etc/security/limits.conf
root soft nofile 10000000
root hard nofile 10000000

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
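
One thing that may be worth checking against the output quoted in this thread: the GIDs listed at index 2 and 3 of mlx5_bond_0, and the value given for ms_async_rdma_local_gid, are IPv4-mapped, so they can be decoded back to an IPv4 address and compared with the node's own IP. The following is a minimal Python sketch of that decoding. The helper name gid_to_ipv4 is made up for illustration and is not part of Ceph or any Mellanox tool; it relies only on the standard ipaddress module.

```python
import ipaddress
from typing import Optional


def gid_to_ipv4(gid: str) -> Optional[str]:
    """Return the IPv4 address embedded in an IPv4-mapped GID.

    Returns None when the GID is not IPv4-mapped (for example a
    link-local fe80:... GID).
    """
    mapped = ipaddress.IPv6Address(gid).ipv4_mapped
    return str(mapped) if mapped is not None else None


if __name__ == "__main__":
    # GID strings copied verbatim from the messages in this thread.
    samples = {
        "ms_async_rdma_local_gid": "0000:0000:0000:0000:0000:ffff:0a5e:3046",
        "show_gids, index 2/3": "0000:0000:0000:0000:0000:ffff:0a5e:303c",
        "show_gids, index 0/1": "fe80:0000:0000:0000:0e42:a1ff:fead:4be6",
    }
    for label, gid in samples.items():
        print(f"{label}: {gid} -> {gid_to_ipv4(gid)}")
```

For the values quoted above, the conf GID decodes to 10.94.48.70 (the source address in the error log), while the show_gids GID decodes to 10.94.48.60, so the two were presumably captured on different nodes. A GID of this form is derived from the interface's IP address, so it would be expected to differ on every node; whether a mismatch here is related to the refused connections is not established in this thread.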