On Wed, Aug 30, 2017 at 7:53 AM, Jeroen Oldenhof <jeroen@xxxxxx> wrote:
> Hi,
>
> I used 'https://community.mellanox.com/docs/DOC-2721', and was under the
> impression I had followed all the steps, but I somehow skipped over the
> /usr/lib/systemd/system/ (/lib/systemd/system/ in the case of Ubuntu)
> ceph-xxx@.service files.
> I did update the /etc/security/limits.conf file for unlimited memlock, but
> as I am using systemd for start/stop control, that of course had no effect.
>
> After the alterations in the systemd service files, RDMA works!! The
> 'RDMAStack RDMAStack!!! WARNING !!!' notice was legitimate after all :)
>
> I do encounter some oddities I would like to share.
> After upgrading (using apt update/upgrade) the kernel to 4.4.0-93-generic,
> ceph-osd barfs out again; see the log snippet below.
>
> -9> 2017-08-29 09:40:21.538709 7f7254dde700 2 -- 10.0.0.2:6801/2828 >> - conn(0xc56a78e800 :6801 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._stop
> -8> 2017-08-29 09:40:21.538696 7f72565e1700 2 -- 10.0.0.2:6805/2828 >> 10.0.0.3:0/27835 conn(0xc56a630800 :6805 s=STATE_ACCEPTING_WAIT_SEQ pgs=3 cs=1 l=1).handle_connect_msg accept write reply msg done
> -7> 2017-08-29 09:40:21.538772 7f72565e1700 2 -- 10.0.0.2:6805/2828 >> 10.0.0.3:0/27835 conn(0xc56a630800 :6805 s=STATE_ACCEPTING_WAIT_SEQ pgs=3 cs=1 l=1)._process_connection accept get newly_acked_seq 0
> -6> 2017-08-29 09:40:21.539284 7f7254dde700 1 -- 10.0.0.2:6801/2828 >> - conn(0xc56a5ab000 :6801 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._process_connection sd=50 -
> -5> 2017-08-29 09:40:21.539319 7f7254dde700 5 Infiniband recv_msg recevd: 6, 924, 2116118, 0, fe80000000000000001e0bffff4cff55
> -4> 2017-08-29 09:40:21.539354 7f7254dde700 -1 RDMAConnectedSocketImpl activate failed to transition to RTR state: (22) Invalid argument
> -3> 2017-08-29 09:40:21.541306 7f72555df700 1 -- 10.0.0.2:6800/2828 >> - conn(0xc56a5a8000 :6800 s=STATE_ACCEPTING pgs=0 cs=0 l=0)._process_connection sd=52 -
> -2> 2017-08-29 09:40:21.541355 7f72555df700 5 Infiniband recv_msg recevd: 6, 927, 11225430, 0, fe80000000000000001e0bffff4cff55
> -1> 2017-08-29 09:40:21.541379 7f72555df700 -1 RDMAConnectedSocketImpl activate failed to transition to RTR state: (22) Invalid argument
>  0> 2017-08-29 09:40:21.544688 7f72555df700 -1 /build/ceph-12.1.4/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: In function 'void RDMAConnectedSocketImpl::handle_connection()' thread 7f72555df700 time 2017-08-29 09:40:21.541394
> /build/ceph-12.1.4/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: 244: FAILED assert(!r)
>
> Reverting back to 4.4.0-92-generic solves it though, so I'll stay with that
> for now.
> I will have a go with Linux 4.6.0.

Interesting; I have no idea about this one. Maybe others can help?

> Thanks again!
>
> Best regards,
> Jeroen Oldenhof
>
>
> On 28-08-2017 at 16:12, Haomai Wang wrote:
>>
>> Did you follow these instructions
>> (https://community.mellanox.com/docs/DOC-2693)?
>>
>> On Mon, Aug 28, 2017 at 6:40 AM, Jeroen Oldenhof <jeroen@xxxxxx> wrote:
>>>
>>> Hi All!
>>>
>>> I'm trying to run Ceph over RDMA, using a batch of InfiniBand Mellanox
>>> MT25408 20 Gbit (4x DDR) cards.
>>>
>>> RDMA is running, rping works between all hosts, and I've configured
>>> 10.0.0.x addressing on the ib0 interfaces.
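(Side note for anyone reproducing this setup: the checks mentioned above can be done roughly as below. This is only a sketch; package names vary per distro (infiniband-diags, librdmacm-utils / rdmacm-utils), and 10.0.0.2 is just one host's ib0 address taken from the logs in this thread.)

    # list verbs devices and port state; the hca_id printed here (e.g. mlx4_0)
    # is the name that ms_async_rdma_device_name in ceph.conf refers to
    $ ibv_devinfo

    # quick RDMA connectivity check between two hosts
    server$ rping -s -a 10.0.0.2 -v
    client$ rping -c -a 10.0.0.2 -C 3 -v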
>>>
>>> But when enabling RDMA in ceph.conf:
>>>
>>> ms_type = async+rdma
>>> ms_async_rdma_device_name = mlx4_0
>>>
>>> OSD and MON on all hosts barf:
>>>
>>> -5> 2017-08-28 12:03:37.623110 7f2326c9de00 1 Processor -- start
>>> -4> 2017-08-28 12:03:37.624232 7f23205fe700 1 Infiniband binding_port found active port 1
>>> -3> 2017-08-28 12:03:37.624250 7f23205fe700 1 Infiniband init assigning: 1024 receive buffers
>>> -2> 2017-08-28 12:03:37.624260 7f23205fe700 1 Infiniband init assigning: 1024 send buffers
>>> -1> 2017-08-28 12:03:37.624262 7f23205fe700 1 Infiniband init device allow 4194303 completion entries
>>>  0> 2017-08-28 12:03:37.628379 7f23205fe700 -1 /build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: In function 'int Infiniband::MemoryManager::Cluster::fill(uint32_t)' thread 7f23205fe700 time 2017-08-28 12:03:37.624433
>>> /build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: 599: FAILED assert(m)
>>>
>>> As suggested in the thread
>>> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-June/018943.html
>>> I also tried lower values for the receive and send buffers:
>>>
>>> -4> 2017-08-28 12:36:46.997001 7fa28b270700 1 Infiniband binding_port found active port 1
>>> -3> 2017-08-28 12:36:46.997026 7fa28b270700 1 Infiniband init assigning: 256 receive buffers
>>> -2> 2017-08-28 12:36:46.997029 7fa28b270700 1 Infiniband init assigning: 256 send buffers
>>> -1> 2017-08-28 12:36:46.997030 7fa28b270700 1 Infiniband init device allow 4194303 completion entries
>>>  0> 2017-08-28 12:36:47.001835 7fa28b270700 -1 /build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: In function 'int Infiniband::MemoryManager::Cluster::fill(uint32_t)' thread 7fa28b270700 time 2017-08-28 12:36:46.997231
>>> /build/ceph-12.1.4/src/msg/async/rdma/Infiniband.cc: 599: FAILED assert(m)
>>>
>>> $ ceph -v
>>> ceph version 12.1.4 (a5f84b37668fc8e03165aaf5cbb380c78e4deba4) luminous (rc)
>>>
>>> In the OSD logs I also see some of these:
>>>
>>> -236> 2017-08-28 12:17:34.507315 7f7f4815ce00 -1 RDMAStack RDMAStack!!! WARNING !!! For RDMA to work properly user memlock (ulimit -l) must be big enough to allow large amount of registered memory. We recommend setting this parameter to infinity
>>>
>>> but memlock has already been set to unlimited:
>>>
>>> $ ulimit -l
>>> unlimited
>>>
>>> Any suggestions?
>>>
>>> Best regards,
>>> Jeroen Oldenhof
>>> The Netherlands
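To recap the two configuration pieces discussed in this thread (a sketch only, not a verified recipe): the RDMA messenger settings being tested would sit in ceph.conf roughly as below. The buffer-count option names (ms_async_rdma_send_buffers / ms_async_rdma_receive_buffers) are my assumption of what "lower values for the receive and send buffers" refers to, based on the Luminous-era option set.

    [global]
    ms_type = async+rdma
    ms_async_rdma_device_name = mlx4_0
    # assumed option names; lowered from the 1024 defaults seen in the first log
    ms_async_rdma_receive_buffers = 256
    ms_async_rdma_send_buffers = 256

And the memlock warning persists even though 'ulimit -l' reports unlimited because /etc/security/limits.conf applies to login sessions, not to daemons started by systemd. The fix Jeroen describes at the top of this thread amounts to raising the limit in the unit files themselves; a drop-in like the following is one way to do it without editing the packaged files under /lib/systemd/system/ directly (exact unit names, e.g. ceph-osd@.service and ceph-mon@.service, depend on your deployment):

    # /etc/systemd/system/ceph-osd@.service.d/rdma-memlock.conf
    [Service]
    LimitMEMLOCK=infinity

    $ sudo systemctl daemon-reload
    $ sudo systemctl restart ceph-osd.target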