Ceph on RDMA

Hi All!

I'm trying to run Ceph over RDMA, using a batch of Mellanox MT25408 InfiniBand cards (4x DDR, 20 Gbit/s).

RDMA is running, rping works between all hosts, and I've configured 10.0.0.x addressing on the ib0 interfaces.

But when enabling RDMA in ceph.conf:

  ms_type = async+rdma
  ms_async_rdma_device_name = mlx4_0

OSD and MON on all hosts barf:
    -5> 2017-08-28 12:03:37.623110 7f2326c9de00  1  Processor -- start
    -4> 2017-08-28 12:03:37.624232 7f23205fe700  1 Infiniband binding_port found active port 1
    -3> 2017-08-28 12:03:37.624250 7f23205fe700  1 Infiniband init assigning: 1024 receive buffers
    -2> 2017-08-28 12:03:37.624260 7f23205fe700  1 Infiniband init assigning: 1024 send buffers
    -1> 2017-08-28 12:03:37.624262 7f23205fe700  1 Infiniband init device allow 4194303 completion entries
     0> 2017-08-28 12:03:37.628379 7f23205fe700 -1 /build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: In function 'int Infiniband::MemoryManager::Cluster::fill(uint32_t)' thread 7f23205fe700 time 2017-08-28 12:03:37.624433
/build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: 599: FAILED assert(m)


As suggested in the thread http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-June/018943.html
I also tried lower values for the receive and send buffers:
    -4> 2017-08-28 12:36:46.997001 7fa28b270700  1 Infiniband binding_port found active port 1
    -3> 2017-08-28 12:36:46.997026 7fa28b270700  1 Infiniband init assigning: 256 receive buffers
    -2> 2017-08-28 12:36:46.997029 7fa28b270700  1 Infiniband init assigning: 256 send buffers
    -1> 2017-08-28 12:36:46.997030 7fa28b270700  1 Infiniband init device allow 4194303 completion entries
     0> 2017-08-28 12:36:47.001835 7fa28b270700 -1 /build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: In function 'int Infiniband::MemoryManager::Cluster::fill(uint32_t)' thread 7fa28b270700 time 2017-08-28 12:36:46.997231
/build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: 599: FAILED assert(m)
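
For a sense of scale, here is a rough sketch of how much memory the buffer counts in the two runs above (1024, then 256, for both receive and send) would add up to. The per-buffer size is an assumption: 128 KiB is used purely for illustration and is not taken from the logs, and the helper name registered_bytes is made up for this example. Substitute the buffer size your build actually uses.

```python
# Rough estimate of the memory covered by the RDMA receive and send buffers.
# ASSUMPTION: 128 KiB per buffer. This value does not come from the logs
# above; replace it with the buffer size configured on your cluster.
ASSUMED_BUFFER_SIZE = 128 * 1024  # bytes, hypothetical


def registered_bytes(receive_buffers: int, send_buffers: int,
                     buffer_size: int = ASSUMED_BUFFER_SIZE) -> int:
    """Total bytes for the receive and send buffer pools combined."""
    return (receive_buffers + send_buffers) * buffer_size


if __name__ == "__main__":
    # The two buffer counts that appear in the log excerpts.
    for count in (1024, 256):
        total = registered_bytes(count, count)
        print(f"{count} + {count} buffers: {total // (1024 * 1024)} MiB")
```

Under that assumed buffer size the totals come out at 256 MiB and 64 MiB. That both runs fail at the same assert suggests, though does not prove, that the buffer count alone is not the deciding factor.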




$ ceph -v
ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)

In the OSD logs I also see some of these:
  -236> 2017-08-28 12:17:34.507315 7f7f4815ce00 -1 RDMAStack RDMAStack!!! WARNING !!! For RDMA to work properly user memlock (ulimit -l) must be big enough to allow large amount of registered memory. We recommend setting this parameter to infinity

But the memlock limit has already been set to unlimited:

$ ulimit -l
unlimited
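
One thing that may be worth ruling out: ulimit -l reports the limit of the interactive shell, and a daemon started by an init system can run with a different one. On Linux the limits of a running process are listed in /proc/<pid>/limits. The following is a minimal sketch for reading the "Max locked memory" row from that file. The function names and the SAMPLE text are hypothetical and only illustrate the file layout; they are not output from the hosts described here.

```python
from typing import Optional, Tuple

# Hypothetical sample in the layout of /proc/<pid>/limits (not real output
# from the cluster discussed in this thread).
SAMPLE = (
    "Limit                     Soft Limit           Hard Limit           Units\n"
    "Max locked memory         65536                65536                bytes\n"
    "Max open files            1024                 4096                 files\n"
)

ROW_NAME = "Max locked memory"


def parse_memlock(limits_text: str) -> Optional[Tuple[str, str]]:
    """Return (soft, hard) from the 'Max locked memory' row, or None."""
    for line in limits_text.splitlines():
        if line.startswith(ROW_NAME):
            fields = line[len(ROW_NAME):].split()
            if len(fields) >= 2:
                # Values are byte counts or the literal word "unlimited".
                return fields[0], fields[1]
    return None


def memlock_of(pid: int) -> Optional[Tuple[str, str]]:
    """Read the memlock limits of a running process (Linux only)."""
    with open(f"/proc/{pid}/limits") as f:
        return parse_memlock(f.read())


if __name__ == "__main__":
    print(parse_memlock(SAMPLE))
```

Calling memlock_of() with the PID of a running ceph-osd or ceph-mon process would show whether the daemon itself sees "unlimited" or a smaller value than the shell does.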

Any suggestions?

Best regards,
Jeroen Oldenhof
The Netherlands




