On 2019-04-16 12:41, Liu, Changcheng wrote:
Hi Roman,

This problem doesn't happen on the master branch when using a Mellanox MCX414A-BCAT ConnectX-4 NIC with msg/async/rdma (ms_type=async+rdma). It is hit on an Intel X722 NIC (msg/async/rdma/iwarp).

By the time the problem happens, ibv_query_devices has already executed successfully several times. The problem only happens when trying to run the ceph-osd daemon.

On my side, I used "strace -f" to trace the command below, which triggers the segmentation fault:

  strace -f ${PATH_CEPH_BUILD_DIR}/bin/ceph-osd -i 2 -c ${PATH_CEPH_BUILD_DIR}/ceph.conf

The trace doesn't show setuid being called. Could you tell me how you used ftrace to track the userspace call stack and find that setuid is called before the RDMA devices are opened?

That's correct, I used ftrace the same way. I also used gdb to get the exact backtraces; try that as well. In my case I debugged ceph-mgr or ceph-mds; ceph-osd runs successfully even when ms_type=async+rdma is set.

--
Roman