Re: RDMA/RoCE enablement failed with (113) No route to host

I would be interested in learning about the performance increase it
offers compared to 10Gbit. I have the ConnectX-3 Pro, but I am not using
RDMA because support for it is not available by default.



sockperf ping-pong -i 192.168.2.13 -p 5001 -m 16384 -t 10 --pps=max

sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=81205; ReceivedMessages=81204
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=10.000 sec; SentMessages=80411; ReceivedMessages=80411
sockperf: ====> avg-lat= 61.638 (std-dev=7.525)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 61.638 usec
sockperf: Total 80411 observations; each percentile contains 804.11 observations
sockperf: ---> <MAX> observation = 1207.678
sockperf: ---> percentile  99.99 =  119.027
sockperf: ---> percentile  99.90 =   82.075
sockperf: ---> percentile  99.50 =   76.133
sockperf: ---> percentile  99.00 =   75.013
sockperf: ---> percentile  95.00 =   70.831
sockperf: ---> percentile  90.00 =   68.471
sockperf: ---> percentile  75.00 =   65.594
sockperf: ---> percentile  50.00 =   61.626
sockperf: ---> percentile  25.00 =   59.406
sockperf: ---> <MIN> observation =   40.527




[@c01 sbin]# sockperf ping-pong -i 192.168.10.112 -p 5001 -t 10
sockperf: == version #2.6 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on socket(s)

[ 0] IP = 192.168.10.112  PORT =  5001 # UDP
sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=431009; ReceivedMessages=431008
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=10.000 sec; SentMessages=426779; ReceivedMessages=426779
sockperf: ====> avg-lat= 11.660 (std-dev=1.102)
sockperf: # dropped messages = 0; # duplicated messages = 0; # out-of-order messages = 0
sockperf: Summary: Latency is 11.660 usec
sockperf: Total 426779 observations; each percentile contains 4267.79 observations
sockperf: ---> <MAX> observation =  272.374
sockperf: ---> percentile  99.99 =   37.709
sockperf: ---> percentile  99.90 =   20.410
sockperf: ---> percentile  99.50 =   17.167
sockperf: ---> percentile  99.00 =   15.751
sockperf: ---> percentile  95.00 =   12.853
sockperf: ---> percentile  90.00 =   12.317
sockperf: ---> percentile  75.00 =   11.884
sockperf: ---> percentile  50.00 =   11.452
sockperf: ---> percentile  25.00 =   11.188
sockperf: ---> <MIN> observation =    8.995
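
For a quick side-by-side of the two runs above, the summary lines can be
pulled out and compared. This is only a sketch: the helper name
summary_latency_usec is made up for illustration, and the two runs are not
strictly comparable, since the first one used -m 16384 and --pps=max while
the second used the default message size.

```python
import re

# Matches the summary line sockperf prints at the end of a ping-pong run.
SUMMARY_RE = re.compile(r"Summary: Latency is ([0-9.]+) usec")


def summary_latency_usec(output: str) -> float:
    """Return the average latency (usec) from sockperf ping-pong output."""
    match = SUMMARY_RE.search(output)
    if match is None:
        raise ValueError("no sockperf summary line found")
    return float(match.group(1))


# Summary lines copied from the two runs quoted above.
run_a = "sockperf: Summary: Latency is 61.638 usec"
run_b = "sockperf: Summary: Latency is 11.660 usec"

ratio = summary_latency_usec(run_a) / summary_latency_usec(run_b)
print(f"first run is {ratio:.2f}x the latency of the second")
```

The ratio comes out at roughly 5.3x, but given the different message
sizes it should be read as a rough indication only.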


-----Original Message-----
From: Michael Green [mailto:green@xxxxxxxxxxxxx] 
Sent: 19 December 2018 21:00
To: Roman Penyaev; Mohamad Gebai
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re:  RDMA/RoCE enablement failed with (113) No route to host

Thanks for the insights, Mohamad and Roman. Interesting read.

My interest in RDMA is purely from a testing perspective.

Still, I would be interested if somebody who has RDMA enabled and
running could share their ceph.conf.

My RDMA-related entries are taken from the Mellanox blog post here:
https://community.mellanox.com/s/article/bring-up-ceph-rdma---developer-s-guide
They used Luminous and built it from source. I'm running the binary
distribution of Mimic here.

ms_type = async+rdma
ms_cluster = async+rdma
ms_async_rdma_device_name = mlx5_0
ms_async_rdma_polling_us = 0
ms_async_rdma_local_gid=<node's_gid>
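
One way to look up candidate values for the <node's_gid> placeholder is to
read the GID table that the kernel exposes in sysfs under
/sys/class/infiniband/<device>/ports/<port>/gids/. The sketch below assumes
the mlx5_0 device from the config above and port 1; the helper name
list_gids is made up for illustration, and which GID index is the right one
for a given RoCE setup is not something this snippet decides.

```python
from pathlib import Path

# An all-zero entry means the slot in the GID table is unused.
ZERO_GID = "0000:0000:0000:0000:0000:0000:0000:0000"


def list_gids(device="mlx5_0", port=1, sysfs_root="/sys/class/infiniband"):
    """Return {index: gid} for the populated GID entries of one port.

    Returns an empty dict when the device or port is not present.
    """
    gid_dir = Path(sysfs_root) / device / "ports" / str(port) / "gids"
    if not gid_dir.is_dir():
        return {}
    gids = {}
    for entry in sorted(gid_dir.iterdir(), key=lambda p: int(p.name)):
        try:
            gid = entry.read_text().strip()
        except OSError:
            # Assumption: some kernels may refuse reads of unused entries.
            continue
        if gid and gid != ZERO_GID:
            gids[int(entry.name)] = gid
    return gids
```

Calling list_gids() on a node and printing the result gives the values that
could go into ms_async_rdma_local_gid; an empty result suggests the device
name or port number does not match the hardware.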


Or, if somebody with knowledge of the code could tell me when this
"RDMAConnectedSocketImpl" error is printed, that might also be helpful.

2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981 crush map has features 288514051259236352, adjusting msgr requires
2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981 crush map has features 288514051259236352, adjusting msgr requires
2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981 crush map has features 1009089991638532096, adjusting msgr requires
2018-12-19 21:45:32.757 7f52b8548140  0 mon.rio@-1(probing).osd e25981 crush map has features 288514051259236352, adjusting msgr requires
2018-12-19 21:45:33.138 7f52b8548140  0 mon.rio@-1(probing) e5  my rank is now 0 (was -1)
2018-12-19 21:45:33.141 7f529f3fe700 -1  RDMAConnectedSocketImpl activate failed to transition to RTR state: (113) No route to host
2018-12-19 21:45:33.142 7f529f3fe700 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: In function 'void RDMAConnectedSocketImpl::handle_connection()' thread 7f529f3fe700 time 2018-12-19 21:45:33.141972
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/13.2.2/rpm/el7/BUILD/ceph-13.2.2/src/msg/async/rdma/RDMAConnectedSocketImpl.cc: 224: FAILED assert(!r)
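
The "(113) No route to host" part of the log above is a plain errno value
and its text. A small sketch for decoding it, where the helper name
parse_errno and the regular expression are assumptions made for
illustration and are not part of Ceph:

```python
import errno
import re

# Matches a trailing "(<number>) <text>" pair as seen in the log line.
ERRNO_RE = re.compile(r"\((\d+)\) (.+)$")


def parse_errno(log_line: str):
    """Return (code, text, symbolic_name) or None if no errno is present."""
    match = ERRNO_RE.search(log_line.strip())
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode is platform specific; on Linux, 113 is EHOSTUNREACH.
    return code, match.group(2), errno.errorcode.get(code, "unknown")


line = ("2018-12-19 21:45:33.141 7f529f3fe700 -1  RDMAConnectedSocketImpl "
        "activate failed to transition to RTR state: (113) No route to host")
print(parse_errno(line))
```

This only identifies the error code. It does not say why the queue pair
could not move to the RTR state.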
--
Michael Green





	On Dec 19, 2018, at 5:21 AM, Roman Penyaev <rpenyaev@xxxxxxx> wrote:


	Well, I have been playing with the Ceph RDMA implementation for quite
	a while and it has unsolved problems, so I would say the status is
	"not completely broken", but "you can run it at your own risk
	and smile":
	
	1. On disconnect of a previously active (high write load) connection
	  there is a race that can lead to an OSD (or any receiver) crash:
	
	  https://github.com/ceph/ceph/pull/25447
	
	2. Recent QLogic hardware (qedr driver) does not support
	  IBV_EVENT_QP_LAST_WQE_REACHED, which the Ceph RDMA implementation
	  uses. The pull request from 1. also targets this
	  incompatibility.
	
	3. Under high write load with many connections there is a chance
	  that an OSD runs out of receive WRs, and the RDMA connection (QP)
	  on the sender side gets IBV_WC_RETRY_EXC_ERR and is disconnected.
	  This is a fundamental design problem, which has to be fixed at the
	  protocol level (e.g. by propagating backpressure to senders).
	
	4. Unfortunately, neither RDMA nor any other 0-latency network can
	  bring significant value, because the bottleneck is not the
	  network. For further reading on transport performance in Ceph,
	  please see:
	
	  https://www.spinics.net/lists/ceph-devel/msg43555.html
	
	  Problems described above have quite a big impact on overall
	  transport performance.
	
	--
	Roman
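
To illustrate what "propagate backpressure to senders" in point 3 could
look like, here is a toy credit-based flow-control sketch. It is not
Ceph's protocol and not what the pull request above implements; the class
names CreditReceiver and CreditSender are made up, and it models a single
sender only. The idea is that the sender holds one credit per receive
buffer the receiver has posted, and stops sending at zero credits instead
of overrunning the receiver.

```python
from collections import deque


class CreditReceiver:
    """Receiver with a fixed pool of pre-posted receive buffers."""

    def __init__(self, num_buffers):
        self.num_buffers = num_buffers
        self.queue = deque()

    def initial_credits(self):
        # One credit per posted receive buffer.
        return self.num_buffers

    def deliver(self, msg):
        # With correct credit accounting a message never arrives
        # when all buffers are occupied.
        assert len(self.queue) < self.num_buffers, "receive buffers exhausted"
        self.queue.append(msg)

    def consume(self):
        """Process one message, free its buffer and return one credit."""
        self.queue.popleft()
        return 1


class CreditSender:
    """Sender that only transmits while it holds credits."""

    def __init__(self, receiver):
        self.receiver = receiver
        self.credits = receiver.initial_credits()

    def try_send(self, msg):
        if self.credits == 0:
            return False  # backpressure: caller has to wait for credits
        self.credits -= 1
        self.receiver.deliver(msg)
        return True

    def add_credits(self, count):
        self.credits += count
```

With two buffers, two sends succeed and a third is refused until the
receiver consumes a message and the returned credit is passed to
add_credits(). With many senders per receiver, the credits would have to
be divided among the connections, which this sketch does not attempt.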
	




