Hi cephers,

I am testing the RDMA module of Ceph. The test environment is as follows:

- Ceph version: 12.1.0
- 6 hosts, each with 12 OSDs

I injected an error into the cluster by hand:
1. Kill all OSD daemons on one host.
2. Restart the OSD daemons killed just now.

The problem is that the OSDs on the other hosts cannot get heartbeat replies from each other and are wrongly marked down by the monitor.

By analysing the logs, I found that the OSDs on the other hosts sent heartbeats to their peers, but the heartbeats could not be sent because there were not enough buffers:

RDMAConnectedSocketImpl operator() no enough buffers in worker 0x7fd839c18d00

The memory buffers in RDMADispatcher are released by the RDMADispatcher::polling() function. But after I killed all OSD daemons on one host and restarted them, the rate of buffer release slowed down, and eventually the number of inflight chunks reached 1023 (the maximum is 1024):

2017-08-15 20:15:42.383778 7fd82641b700 30 RDMAStack post_tx_buffer release 1 chunks, inflight 1023
2017-08-15 20:15:42.384151 7fd82641b700 30 RDMAStack post_tx_buffer release 1 chunks, inflight 1023
2017-08-15 20:15:42.538885 7fd82641b700 30 RDMAStack post_tx_buffer release 1 chunks, inflight 1023

I think the root cause is related to how the memory buffers are released when the error is injected. Do you have any ideas about this?

Looking forward to your response, and thanks in advance.
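For anyone unfamiliar with the mechanism, the exhaustion path can be sketched as a bounded tx-chunk pool: senders take a chunk per send, and the polling thread gives chunks back when completions arrive. This is a minimal illustrative sketch with hypothetical names, not the actual RDMADispatcher/RDMAStack code:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bounded pool of tx chunks. In the real stack the cap would
// correspond to the 1024-chunk limit seen in the logs.
class TxChunkPool {
  const size_t max_chunks_;  // hard cap on chunks posted at once
  size_t inflight_ = 0;      // chunks posted but not yet completed
public:
  explicit TxChunkPool(size_t max_chunks) : max_chunks_(max_chunks) {}

  // A sender tries to take a chunk for a heartbeat/message send.
  // Fails when the pool is exhausted ("no enough buffers").
  bool try_post_tx() {
    if (inflight_ >= max_chunks_) return false;
    ++inflight_;
    return true;
  }

  // The polling thread releases n chunks when send completions arrive.
  void release(size_t n) {
    inflight_ = (n > inflight_) ? 0 : inflight_ - n;
  }

  size_t inflight() const { return inflight_; }
};
```

The point of the sketch: if completions stop arriving (or arrive slowly, e.g. because peer QPs on the killed host went away), release() is called too rarely, inflight climbs to the cap, and from then on every try_post_tx() fails, so even small heartbeat sends are starved.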