Ceph RDMA Memory Leakage

Hi, cephers

    We are testing the RDMA messenger (ms) type of Ceph.

    The OSDs and MONs keep being marked down by their peers because
the memory buffer pool runs out of buffers, so they cannot reply to the
heartbeat ping messages from their peers.
    The log keeps showing "no enough buffer in worker", even though
the whole cluster is idle with no external I/O.

    Our Ceph configuration for RDMA is as follows:
        ms_async_rdma_roce_ver = 1
        ms_async_rdma_sl = 5
        ms_async_rdma_dscp = 136
        ms_async_rdma_send_buffers = 1024
        ms_async_rdma_receive_buffers = 1024

   Even if we increase ms_async_rdma_send_buffers to 32,768, the
'no enough buffer in worker' message still appears.

   After a deeper analysis, we think the cause is the following: when an
RDMAConnectedSocketImpl instance is destructed, its queue pair is only
added to the dead_queue_pair vector, and the entries of dead_queue_pair
are deleted later in the polling thread.
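
Roughly, we believe the teardown path looks like the sketch below (a
simplified illustration with made-up helper names, not the exact Ceph
code):

    #include <infiniband/verbs.h>
    #include <vector>

    // Illustrative stand-in for the dispatcher's list of dead queue pairs,
    // which is owned and drained by the polling thread.
    static std::vector<ibv_qp *> dead_queue_pairs;

    // Called when a connection (RDMAConnectedSocketImpl) goes away: the
    // queue pair is only queued for later destruction, not drained first.
    void defer_qp_destruction(ibv_qp *qp) {
      dead_queue_pairs.push_back(qp);
    }

    // Called from the polling thread: any work requests still posted on
    // these queue pairs will never generate completions after this point.
    void reap_dead_queue_pairs() {
      for (ibv_qp *qp : dead_queue_pairs)
        ibv_destroy_qp(qp);
      dead_queue_pairs.clear();
    }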

From the rdmamojo documentation:
When a QP is destroyed any outstanding Work Requests, in either the
Send or Receive Queues, won't be processed anymore by the RDMA device
and Work Completions won't be generated for them. It is up to the user
to clean all of the associated resources of those Work Requests (i.e.
memory buffers)

The problem here is that when a queue pair is deleted while it still has
outstanding work requests, the memory buffers occupied by those work
requests are never returned to the memory buffer pool, because no work
completions will be generated for them. That is where the memory leak
happens.
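
To make the reasoning concrete, here is a minimal sketch of the normal
reclaim path using the verbs API (return_to_pool is a hypothetical
stand-in for Ceph's memory buffer pool, and we assume wr_id carries the
buffer address): buffers only come back when a work completion is
polled, so work requests that never complete leak their buffers.

    #include <infiniband/verbs.h>

    // Poll the completion queue and hand completed buffers back to the
    // pool. Even flush errors (IBV_WC_WR_FLUSH_ERR) produce a completion,
    // so those buffers can still be recycled.
    void reclaim_completed_buffers(ibv_cq *cq,
                                   void (*return_to_pool)(void *buf)) {
      ibv_wc wc[32];
      int n;
      while ((n = ibv_poll_cq(cq, 32, wc)) > 0) {
        for (int i = 0; i < n; i++)
          return_to_pool(reinterpret_cast<void *>(wc[i].wr_id));
      }
      // Work requests whose queue pair was destroyed before completing
      // never show up here, so their buffers are lost to the pool.
    }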

A more elegant way would be to move the queue pair into the error state
before destroying it, wait for the affiliated asynchronous event
IBV_EVENT_QP_LAST_WQE_REACHED, and only then destroy the queue pair.
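
A rough sketch of what we have in mind, assuming the queue pair is
associated with an SRQ (as far as we know, IBV_EVENT_QP_LAST_WQE_REACHED
is only generated in that case) and that other async events are handled
elsewhere:

    #include <infiniband/verbs.h>

    // Move the QP to the error state, wait for the affiliated
    // IBV_EVENT_QP_LAST_WQE_REACHED event, then destroy the QP.
    int drain_and_destroy_qp(ibv_context *ctx, ibv_qp *qp) {
      ibv_qp_attr attr = {};
      attr.qp_state = IBV_QPS_ERR;
      int rc = ibv_modify_qp(qp, &attr, IBV_QP_STATE);
      if (rc)
        return rc;

      // Wait for the last-WQE-reached event on this QP. A real
      // implementation would dispatch other async events instead of
      // simply acknowledging them here.
      for (;;) {
        ibv_async_event ev;
        if (ibv_get_async_event(ctx, &ev))
          return -1;
        bool done = (ev.event_type == IBV_EVENT_QP_LAST_WQE_REACHED &&
                     ev.element.qp == qp);
        ibv_ack_async_event(&ev);
        if (done)
          break;
      }

      // At this point the remaining flush-error completions can be polled
      // from the CQ so their buffers go back to the pool, and the queue
      // pair can finally be destroyed.
      return ibv_destroy_qp(qp);
    }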

Do you have any suggestions or ideas? Thanks in advance.