Ceph OSDs fail to start with RDMA

Hi all,

 

I am trying to bring up a Ceph cluster whose private network runs over RoCEv2. Each storage node has two dual-port 25Gb Mellanox ConnectX-4 NICs, with each NIC's ports bonded (2x25Gb, mode 4). I have set the memory limits to unlimited, can rping every node, and have set ms_async_rdma_device_name to the ibdev (mlx5_bond_1). Everything goes smoothly until I start bringing up OSDs. Nothing appears in stderr, but the OSD log shows the following error:

 

RDMAConnectedSocketImpl activate failed to transition to RTR state: (19) No such device

/build/ceph-12.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: In function 'void RDMAConnectedSocketImpl::handle_connection()' thread 7f908633c700 time 2018-01-26 10:47:51.607573

/build/ceph-12.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: 221: FAILED assert(!r)

 

ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)

1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x102) [0x564a2ccf7892]

2: (RDMAConnectedSocketImpl::handle_connection()+0xb4a) [0x564a2d007fba]

3: (EventCenter::process_events(int, std::chrono::duration<unsigned long, std::ratio<1l, 1000000000l> >*)+0xa08) [0x564a2cd9a418]

4: (()+0xb4f3a8) [0x564a2cd9e3a8]

5: (()+0xb8c80) [0x7f9088c04c80]

6: (()+0x76ba) [0x7f90892f36ba]

7: (clone()+0x6d) [0x7f908836a41d]

NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
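
Since the private network depends on the ibdev exposing a RoCE v2 GID, one sanity check that may be worth running on each node is to list which GID table entries the device reports as RoCE v2. The following is only a minimal sketch, assuming the usual Linux sysfs layout under /sys/class/infiniband (ports/<port>/gids and ports/<port>/gid_attrs/types). The helper name list_rocev2_gids and its sysfs_root parameter are made up for illustration and are not part of Ceph, and the port number 1 is a guess:

```python
from pathlib import Path

# An all-zero GID marks an unpopulated table entry.
ZERO_GID = "0000:0000:0000:0000:0000:0000:0000:0000"


def list_rocev2_gids(device, port=1, sysfs_root="/sys/class/infiniband"):
    """Return (index, gid) pairs that sysfs reports as RoCE v2.

    Hypothetical helper; `device` is the ibdev name, e.g. "mlx5_bond_1".
    """
    port_dir = Path(sysfs_root) / device / "ports" / str(port)
    types_dir = port_dir / "gid_attrs" / "types"
    if not types_dir.is_dir():
        raise FileNotFoundError(f"no GID attributes under {port_dir}")

    found = []
    for type_file in sorted(types_dir.iterdir(), key=lambda p: int(p.name)):
        try:
            gid_type = type_file.read_text().strip()
            gid = (port_dir / "gids" / type_file.name).read_text().strip()
        except OSError:
            # Reading an unused table entry can fail; skip it.
            continue
        if gid_type == "RoCE v2" and gid != ZERO_GID:
            found.append((int(type_file.name), gid))
    return found
```

Calling list_rocev2_gids("mlx5_bond_1") on a storage node would return the populated RoCE v2 entries for port 1. An empty list, or a FileNotFoundError, would suggest looking at the device name or port before looking at Ceph itself.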

 

Anyone see this before or have any suggestions?

 

Thanks,

Orlando

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
