Re: Ceph on RDMA

On Wed, Aug 30, 2017 at 7:53 AM, Jeroen Oldenhof <jeroen@xxxxxx> wrote:
> Hi,
>
> I used 'https://community.mellanox.com/docs/DOC-2721' and was under the
> impression I had followed all steps, but I somehow skipped over the
> /usr/lib/systemd/system/ (/lib/systemd/system/ on Ubuntu) ceph-xxx@.service
> files.
> I did update the /etc/security/limits.conf file for unlimited memlock, but
> as I am using systemd for start/stop control, that of course had no effect.
>
> After the alterations to the systemd service files, RDMA works! The
> 'RDMAStack RDMAStack!!! WARNING !!!' notice was legitimate after all :)
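>
> For reference, the change boils down to raising the locked-memory limit in
> the unit files themselves (or, equivalently, via a drop-in override); a
> minimal sketch, assuming the stock ceph-osd@.service and a hypothetical
> drop-in file name:
>
>     # /etc/systemd/system/ceph-osd@.service.d/rdma.conf  (hypothetical path)
>     [Service]
>     # allow the daemon to register (pin) large amounts of memory for RDMA
>     LimitMEMLOCK=infinity
>
> followed by 'systemctl daemon-reload' and a restart of the daemons.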
>
> I do encounter some oddities I would like to share.
> After upgrading the kernel (via apt update/upgrade) to 4.4.0-93-generic,
> ceph-osd crashes again; see the log snippet below.
>
>     -9> 2017-08-29 09:40:21.538709 7f7254dde700  2 -- 10.0.0.2:6801/2828 >>
> - conn(0xc56a78e800 :6801 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._stop
>     -8> 2017-08-29 09:40:21.538696 7f72565e1700  2 -- 10.0.0.2:6805/2828 >>
> 10.0.0.3:0/27835 conn(0xc56a630800 :6805 s=STATE_ACCEPTING_WAIT_SEQ pgs=3
> cs=1 l=1).handle_connect_msg accept write reply msg done
>     -7> 2017-08-29 09:40:21.538772 7f72565e1700  2 -- 10.0.0.2:6805/2828 >>
> 10.0.0.3:0/27835 conn(0xc56a630800 :6805 s=STATE_ACCEPTING_WAIT_SEQ pgs=3
> cs=1 l=1)._process_connection accept get newly_acked_seq 0
>     -6> 2017-08-29 09:40:21.539284 7f7254dde700  1 -- 10.0.0.2:6801/2828 >>
> - conn(0xc56a5ab000 :6801 s=STATE_ACCEPTING pgs=0 cs=0
> l=0)._process_connection sd=50 -
>     -5> 2017-08-29 09:40:21.539319 7f7254dde700  5 Infiniband recv_msg
> recevd: 6, 924, 2116118, 0, fe80000000000000001e0bffff4cff55
>     -4> 2017-08-29 09:40:21.539354 7f7254dde700 -1 RDMAConnectedSocketImpl
> activate failed to transition to RTR state: (22) Invalid argument
>     -3> 2017-08-29 09:40:21.541306 7f72555df700  1 -- 10.0.0.2:6800/2828 >>
> - conn(0xc56a5a8000 :6800 s=STATE_ACCEPTING pgs=0 cs=0
> l=0)._process_connection sd=52 -
>     -2> 2017-08-29 09:40:21.541355 7f72555df700  5 Infiniband recv_msg
> recevd: 6, 927, 11225430, 0, fe80000000000000001e0bffff4cff55
>     -1> 2017-08-29 09:40:21.541379 7f72555df700 -1 RDMAConnectedSocketImpl
> activate failed to transition to RTR state: (22) Invalid argument
>      0> 2017-08-29 09:40:21.544688 7f72555df700 -1
> /build/ceph-12.1.4/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: In
> function 'void RDMAConnectedSocketImpl::handle_connection()' thread
> 7f72555df700 time 2017-08-29 09:40:21.541394
> /build/ceph-12.1.4/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: 244:
> FAILED assert(!r)
>
>
> Reverting back to 4.4.0-92-generic solves it, though, so I'll stay with
> that for now.
> I will have a go with Linux 4.6.0.
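>
> (To keep the hosts on the working kernel in the meantime, the kernel image
> package can be pinned; a rough sketch, assuming the standard Ubuntu package
> naming:
>
>     # hold the known-good kernel so 'apt upgrade' does not replace it
>     sudo apt-mark hold linux-image-4.4.0-92-generic
>
> with 'apt-mark unhold' to resume kernel upgrades later.)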

Interesting, I have no idea about this one. Maybe others can help?

>
> Thanks again!
>
> Best regards,
> Jeroen Oldenhof
>
>
> Op 28-8-2017 om 16:12 schreef Haomai Wang:
>
>> Did you follow these instructions
>> (https://community.mellanox.com/docs/DOC-2693)?
>>
>> On Mon, Aug 28, 2017 at 6:40 AM, Jeroen Oldenhof <jeroen@xxxxxx> wrote:
>>>
>>> Hi All!
>>>
>>> I'm trying to run Ceph over RDMA, using a batch of Mellanox InfiniBand
>>> MT25408 20 Gbit (4x DDR) cards.
>>>
>>> RDMA is running, rping works between all hosts, and I've configured
>>> 10.0.0.x
>>> addressing on the ib0 interfaces.
>>>
>>> But when enabling RDMA in ceph.conf:
>>>
>>>    ms_type = async+rdma
>>>    ms_async_rdma_device_name = mlx4_0
>>>
>>> the OSD and MON daemons on all hosts crash:
>>>      -5> 2017-08-28 12:03:37.623110 7f2326c9de00  1  Processor -- start
>>>      -4> 2017-08-28 12:03:37.624232 7f23205fe700  1 Infiniband
>>> binding_port
>>> found active port 1
>>>      -3> 2017-08-28 12:03:37.624250 7f23205fe700  1 Infiniband init
>>> assigning: 1024 receive buffers
>>>      -2> 2017-08-28 12:03:37.624260 7f23205fe700  1 Infiniband init
>>> assigning: 1024 send buffers
>>>      -1> 2017-08-28 12:03:37.624262 7f23205fe700  1 Infiniband init
>>> device
>>> allow 4194303 completion entries
>>>       0> 2017-08-28 12:03:37.628379 7f23205fe700 -1
>>> /build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: In function 'int
>>> Infiniband::MemoryManager::Cluster::fill(uint32_t)' thread 7f23205fe700
>>> time
>>> 2017-08-28 12:03:37.624433
>>> /build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: 599: FAILED
>>> assert(m)
>>>
>>>
>>> As suggested in the thread
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-June/018943.html
>>> I also tried lower values for the receive and send buffers:
>>>      -4> 2017-08-28 12:36:46.997001 7fa28b270700  1 Infiniband
>>> binding_port
>>> found active port 1
>>>      -3> 2017-08-28 12:36:46.997026 7fa28b270700  1 Infiniband init
>>> assigning: 256 receive buffers
>>>      -2> 2017-08-28 12:36:46.997029 7fa28b270700  1 Infiniband init
>>> assigning: 256 send buffers
>>>      -1> 2017-08-28 12:36:46.997030 7fa28b270700  1 Infiniband init
>>> device
>>> allow 4194303 completion entries
>>>       0> 2017-08-28 12:36:47.001835 7fa28b270700 -1
>>> /build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: In function 'int
>>> Infiniband::MemoryManager::Cluster::fill(uint32_t)' thread 7fa28b270700
>>> time
>>> 2017-08-28 12:36:46.997231
>>> /build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: 599: FAILED
>>> assert(m)
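>>>
>>> For reference, the buffer counts were lowered in ceph.conf roughly as
>>> follows (a sketch; the option names here are assumed, not copied from my
>>> actual config):
>>>
>>>     [global]
>>>     # assumed option names for the RDMA receive/send buffer counts
>>>     ms_async_rdma_receive_buffers = 256
>>>     ms_async_rdma_send_buffers = 256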
>>>
>>>
>>>
>>>
>>> $ ceph -v
>>> ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous
>>> (rc)
>>>
>>> In the OSD logs I also see some of these:
>>>    -236> 2017-08-28 12:17:34.507315 7f7f4815ce00 -1 RDMAStack
>>> RDMAStack!!!
>>> WARNING !!! For RDMA to work properly user memlock (ulimit -l) must be
>>> big
>>> enough to allow large amount of registered memory. We recommend setting
>>> this
>>> parameter to infinity
>>>
>>> but the memlock limit has already been set to unlimited:
>>>
>>> $ ulimit -l
>>> unlimited
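>>>
>>> This was set via /etc/security/limits.conf, roughly along these lines (a
>>> sketch; the exact entries may differ):
>>>
>>>     # allow unlimited locked memory for all users
>>>     *    soft    memlock    unlimited
>>>     *    hard    memlock    unlimited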
>>>
>>> Any suggestions?
>>>
>>> Best regards,
>>> Jeroen Oldenhof
>>> The Netherlands
>>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


