Re: Async Messenger RDMA IB ib_uverbs_write return EACCES


 



Hi Roman,
    This problem doesn't happen on the master branch when using a Mellanox
    MCX414A-BCAT ConnectX-4 NIC with msg/async/rdma (ms_type=async+rdma).
    It is hit on an Intel X722 NIC (msg/async/rdma/iwarp).

    By the time the problem happens, ibv_query_devices has already been
    executed successfully several times. The problem only happens when
    trying to run the ceph-osd daemon.

    On my side, I used "strace -f" to trace the command below, which
    triggers the segmentation fault:
        strace -f ${PATH_CEPH_BUILD_DIR}/bin/ceph-osd -i 2 -c ${PATH_CEPH_BUILD_DIR}/ceph.conf
    The trace doesn't show that setuid is called.
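
    A narrower trace may make the ordering easier to check. The following is
    only a sketch: check_order is a hypothetical helper (not part of ceph or
    strace), the log path is an assumption, and it assumes the device is
    opened through openat on /dev/infiniband/uverbs* and that credentials
    are changed through one of the set*uid/set*gid syscalls.

```shell
#!/bin/sh
# Hypothetical helper: given an strace log, report whether a uverbs device
# was opened before the first credential-changing syscall.
check_order() {
    awk '
        # First open of a uverbs character device.
        /openat\(.*\/dev\/infiniband\/uverbs/ && !open_line { open_line = NR }
        # First credential-changing syscall (assumed set of names).
        /(setuid|setgid|setresuid|setresgid|setreuid|setregid)\(/ && !cred_line { cred_line = NR }
        END {
            if (!cred_line)                 print "no-credential-change"
            else if (!open_line)            print "no-uverbs-open"
            else if (open_line < cred_line) print "uverbs-opened-before-credential-change"
            else                            print "uverbs-opened-after-credential-change"
        }
    ' "$1"
}

# Possible usage (log path is an assumption):
#   strace -f -o /tmp/ceph-osd.strace \
#       -e trace=openat,write,setuid,setgid,setresuid,setresgid,setreuid,setregid \
#       ${PATH_CEPH_BUILD_DIR}/bin/ceph-osd -i 2 -c ${PATH_CEPH_BUILD_DIR}/ceph.conf
#   check_order /tmp/ceph-osd.strace
```

    If the helper prints "no-credential-change", the credentials may be
    changed by some path this filter does not cover, so the result is a
    hint rather than proof.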

    Could you tell me how you used ftrace to track the userspace call
    stack and find that setuid is called before opening rdma devices?

    I also tried bisecting the patches, since ceph has historically
    supported X722/iWARP, and found that the PR below affects the result:
        #mon: centralized config
        https://github.com/ceph/ceph/pull/20172

B.R.
Changcheng

On 11:46 Tue 16 Apr, Roman Penyaev wrote:
> On 2019-04-16 10:58, Liu, Changcheng wrote:
> > Hi Roman,
> >    After setting only ms_cluster_type to "async+rdma", the ceph cluster
> >    could be set up with the command below:
> >       OSD=3 MON=1 MDS=0 RGW=0 MGR=0 ../src/vstart.sh -n -d -X --msgr1
> > 
> >    Have you also hit the problem when setting ms_public_type to
> > "async+rdma"?
> 
> The problem happens exactly because you set ms_public_type to rdma
> (the RDMA connection to the monitor is established *before* the setuid()
> call). ms_type combines public and cluster, so by setting ms_type to rdma
> you set both cluster and public to rdma.  Setting only the cluster network
> to rdma avoids the problem you described, because socket descriptors
> do not have the restriction on changing credentials that the uverbs
> device has, so the connection to the monitor succeeds.
> 
> So in current master you are not able to use rdma for the public network.
> I can confirm that this is broken.
> 
> But since all connections should be closed in get_monmap_and_config(),
> it is not clear to me why the uverbs device is still open and reused
> afterwards.  That should be debugged.
> 
> --
> Roman
> 


