Re: RDMA/RoCE enablement failed with (113) No route to host

I was informed today that the CEPH environment I’ve been working on is no longer available. Unfortunately this happened before I could try any of your suggestions, Roman. 

Thank you for all the attention and advice. 

--
Michael Green


On Dec 20, 2018, at 08:21, Roman Penyaev <rpenyaev@xxxxxxx> wrote:

On 2018-12-19 22:01, Marc Roos wrote:
I would be interested in learning about the performance increase it has
compared to 10Gbit. I got the ConnectX-3 Pro, but I am not using RDMA
because support is not available by default.

Not too much. The following is a comparison on latest master using the fio
engine that measures bare ceph messenger performance (no disk IO); a quick
sanity check of the numbers follows the two tables:
https://github.com/ceph/ceph/pull/24678


Mellanox MT27710 Family [ConnectX-4 Lx], 25 Gbit/s:


  bs       iodepth=8, async+posix              iodepth=8, async+rdma
           (IOPS / bandwidth / latency)        (IOPS / bandwidth / latency)
----    ---------------------------------    ----------------------------------
  4k    IOPS=30.0k  BW=121MiB/s   0.257ms    IOPS=47.9k  BW=187MiB/s   0.166ms
  8k    IOPS=30.8k  BW=240MiB/s   0.259ms    IOPS=46.3k  BW=362MiB/s   0.172ms
 16k    IOPS=25.1k  BW=392MiB/s   0.318ms    IOPS=45.2k  BW=706MiB/s   0.176ms
 32k    IOPS=23.1k  BW=722MiB/s   0.345ms    IOPS=37.5k  BW=1173MiB/s  0.212ms
 64k    IOPS=18.0k  BW=1187MiB/s  0.420ms    IOPS=41.0k  BW=2624MiB/s  0.189ms
128k    IOPS=12.1k  BW=1518MiB/s  0.657ms    IOPS=20.9k  BW=2613MiB/s  0.381ms
256k    IOPS=3530   BW=883MiB/s   2.265ms    IOPS=4624   BW=1156MiB/s  1.729ms
512k    IOPS=2084   BW=1042MiB/s  3.387ms    IOPS=2406   BW=1203MiB/s  3.32ms
  1m    IOPS=1119   BW=1119MiB/s  7.145ms    IOPS=1277   BW=1277MiB/s  6.26ms
  2m    IOPS=551    BW=1101MiB/s  14.51ms    IOPS=631    BW=1263MiB/s  12.66ms
  4m    IOPS=272    BW=1085MiB/s  29.45ms    IOPS=318    BW=1268MiB/s  25.17ms



  bs       iodepth=128, async+posix            iodepth=128, async+rdma
           (IOPS / bandwidth / latency)        (IOPS / bandwidth / latency)
----    ---------------------------------    ----------------------------------
  4k    IOPS=75.9k  BW=297MiB/s   1.683ms    IOPS=83.4k  BW=326MiB/s   1.535ms
  8k    IOPS=64.3k  BW=502MiB/s   1.989ms    IOPS=70.3k  BW=549MiB/s   1.819ms
 16k    IOPS=53.9k  BW=841MiB/s   2.376ms    IOPS=57.8k  BW=903MiB/s   2.214ms
 32k    IOPS=42.2k  BW=1318MiB/s  3.034ms    IOPS=59.4k  BW=1855MiB/s  2.154ms
 64k    IOPS=30.0k  BW=1934MiB/s  4.135ms    IOPS=42.3k  BW=2645MiB/s  3.023ms
128k    IOPS=18.1k  BW=2268MiB/s  7.052ms    IOPS=21.2k  BW=2651MiB/s  6.031ms
256k    IOPS=5186   BW=1294MiB/s  24.71ms    IOPS=5253   BW=1312MiB/s  24.39ms
512k    IOPS=2897   BW=1444MiB/s  44.19ms    IOPS=2944   BW=1469MiB/s  43.48ms
  1m    IOPS=1306   BW=1297MiB/s  97.98ms    IOPS=1421   BW=1415MiB/s  90.27ms
  2m    IOPS=612    BW=1199MiB/s  208.6ms    IOPS=862    BW=1705MiB/s  148.9ms
  4m    IOPS=316    BW=1235MiB/s  409.1ms    IOPS=416    BW=1664MiB/s  307.4ms

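A quick way to sanity-check how to read these tables (a minimal sketch, assuming
the third column is fio's mean completion latency): by Little's law the latency
should be roughly iodepth / IOPS, which matches the reported values.

# Little's law check: mean latency ~= iodepth / IOPS
def expected_lat_ms(iodepth, iops):
    return iodepth / iops * 1000.0

print(expected_lat_ms(8, 30.0e3))    # ~0.267 ms vs 0.257 ms reported (4k, posix, iodepth=8)
print(expected_lat_ms(128, 75.9e3))  # ~1.687 ms vs 1.683 ms reported (4k, posix, iodepth=128)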

1. As you can see, there is no big difference between posix and rdma.

2. Even though a 25 Gbit/s card is used, we barely reach 20 Gbit/s.  I also
   have results on 100 Gbit/s QLogic cards, and there is no difference,
   because the bottleneck is not the network.  This is especially visible
   on loads with a higher iodepth: bandwidth does not change significantly.
   So even if you increase the number of in-flight requests, you hit the
   limit of how fast those requests can be processed (see the conversion
   sketch after this list).

3. Keep in mind this is only messenger performance, so on real Ceph loads
   you will get less, because the whole IO stack is involved.

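For reference, a minimal sketch of the conversion behind point 2 (bandwidth
figures taken from the 128k, iodepth=128 rows above; 1 MiB = 2^20 bytes):

# Convert fio bandwidth (MiB/s) into link utilisation (Gbit/s)
def mib_s_to_gbit_s(mib_s):
    return mib_s * 2**20 * 8 / 1e9

print(mib_s_to_gbit_s(2268))  # async+posix 128k: ~19.0 Gbit/s
print(mib_s_to_gbit_s(2651))  # async+rdma  128k: ~22.2 Gbit/s
# i.e. even the rdma path tops out around 22 Gbit/s on a 25 Gbit/s link.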

--
Roman
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
