Hi Roman,

This problem doesn't happen on the master branch when using a Mellanox
MCX414A-BCAT ConnectX-4 NIC with msg/async/rdma (ms_type=async+rdma).
It is hit on an Intel X722 NIC (msg/async/rdma/iwarp).

By the time the problem happens, ibv_query_devices has already been
executed successfully several times. The problem only happens when
trying to run the ceph-osd daemon.

On my side, I used "strace -f" to trace the command below, which
triggers the segmentation fault:

    strace -f ${PATH_CEPH_BUILD_DIR}/bin/ceph-osd -i 2 -c ${PATH_CEPH_BUILD_DIR}/ceph.conf

The trace doesn't show setuid being called. Could you tell me how you
use ftrace to track the userspace call stack and find that setuid is
called before the RDMA devices are opened?

I also bisected the patches, since Ceph has historically supported
X722/iWARP, and found that the PR below affects the result:

    mon: centralized config
    https://github.com/ceph/ceph/pull/20172

B.R.
Changcheng

On 11:46 Tue 16 Apr, Roman Penyaev wrote:
> On 2019-04-16 10:58, Liu, Changcheng wrote:
> > Hi Roman,
> >   After only setting ms_cluster_type to "async+rdma", the ceph cluster
> > could be set up with the command below:
> >   OSD=3 MON=1 MDS=0 RGW=0 MGR=0 ../src/vstart.sh -n -d -X --msgr1
> >
> > Have you also hit the problem when setting ms_public_type to
> > "async+rdma"?
>
> The problem happens exactly because you set ms_public_type to rdma
> (the RDMA connection to the monitor is established *before* the
> setuid() call). ms_type combines public and cluster, so by setting
> ms_type to rdma you set both the cluster and the public network to
> rdma. Setting only the cluster network to rdma avoids the problem you
> described, because socket descriptors do not have the restriction on
> changing credentials that the uverbs device has, so the connection to
> the monitor succeeds.
>
> So in current master you are not able to use rdma for the public
> network. I can confirm that this is broken.
>
> But since all connections should be closed in get_monmap_and_config(),
> it is not clear to me why the uverbs device is still open and reused
> afterwards. That should be debugged.
>
> --
> Roman
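
For reference, the workaround discussed above (RDMA on the cluster network only, so that the monitor connection is made over ordinary sockets) might look roughly like this in ceph.conf. This is only a sketch: the option names ms_cluster_type and ms_public_type come from the thread, but placing them under [global] and spelling out async+posix for the public network are my assumptions, not something confirmed in this discussion.

```ini
[global]
# Assumption: leave ms_type unset, since it applies to both the public
# and the cluster network at once.

# Cluster (OSD-to-OSD) traffic over RDMA, as in the working setup above.
ms_cluster_type = async+rdma

# Public network (including the connection to the monitor) over plain
# sockets. async+posix is assumed here as the non-RDMA messenger type.
ms_public_type = async+posix
```

With a setup along these lines, the connection to the monitor made before setuid() would not involve the uverbs device at all.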