Re: What should be the expected latency of 10Gbit network connections

On 2018-01-23 08:27, Blair Bethwaite wrote:

Firstly, the OP's premise in asking "Or should there be a difference
of 10x?" is fundamentally incorrect. Greater bandwidth does not mean
lower latency, though lower latency almost always results in greater
bandwidth. Unfortunately, changing the speed of light remains a
difficult engineering challenge :-). However, you can do things like
add multiple links, overlap signals on the wire, and tweak
error-correction encodings, all to get more bits on the wire without
making the wire itself any faster. Take Mellanox 100Gb Ethernet: one
lane is 25Gb; to get 50Gb they mash two lanes together, and to get
100Gb, four lanes - the latency of transmitting a single bit is
more-or-less unchanged. Also note that with UDP/TCP pings or actual
Ceph traffic we're going via the kernel stack running on the CPU, so
the speed and power management of the CPU can make quite a difference.
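
To put back-of-the-envelope numbers on that (plain arithmetic, not
measurements from the hosts below): only the serialization time of a
frame scales with the link rate; propagation, switch forwarding, NIC
processing and the kernel stack are roughly constant. For example, the
wire time of a single 9000-byte jumbo frame at various rates:

$ for rate in 1 10 25 40 100; do
>   awk -v r=$rate 'BEGIN { printf "%3d GbE: %6.2f usec on the wire\n", r, 9000*8/(r*1000) }'
> done
  1 GbE:  72.00 usec on the wire
 10 GbE:   7.20 usec on the wire
 25 GbE:   2.88 usec on the wire
 40 GbE:   1.80 usec on the wire
100 GbE:   0.72 usec on the wire

A small packet is well under a microsecond on the wire at any of these
rates (a 64-byte frame is ~0.5 usec at 1GbE and ~0.05 usec at 10GbE),
so its RTT is dominated by the fixed per-packet costs - which is why a
10x faster link does not give a 10x lower ping.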

Example: 25GbE on a dual-port ConnectX-4 card in an LACP bond, RHEL 7 host.

$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.3 (Maipo)
$ ofed_info | head -1
MLNX_OFED_LINUX-4.0-1.0.1.0 (OFED-4.0-1.0.1):
$ grep 'model name' /proc/cpuinfo | uniq
model name      : Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
$ ibv_devinfo
hca_id: mlx5_1
        transport:                      InfiniBand (0)
        fw_ver:                         14.18.1000
        node_guid:                      ...
        sys_image_guid:                 ...
        vendor_id:                      0x02c9
        vendor_part_id:                 4117
        hw_ver:                         0x0
        board_id:                       MT_2420110034
...


$ sudo ping -M do -s 8972 -c 100000 -f ...
100000 packets transmitted, 100000 received, 0% packet loss, time 4652ms
rtt min/avg/max/mdev = 0.029/0.031/2.711/0.015 ms, ipg/ewma 0.046/0.031 ms

$ sudo ping -M do -s 3972 -c 100000 -f ...
100000 packets transmitted, 100000 received, 0% packet loss, time 3321ms
rtt min/avg/max/mdev = 0.019/0.022/0.364/0.003 ms, ipg/ewma 0.033/0.022 ms

$ sudo ping -M do -s 1972 -c 100000 -f ...
100000 packets transmitted, 100000 received, 0% packet loss, time 2818ms
rtt min/avg/max/mdev = 0.017/0.018/0.086/0.005 ms, ipg/ewma 0.028/0.021 ms

$ sudo ping -M do -s 472 -c 100000 -f ...
100000 packets transmitted, 100000 received, 0% packet loss, time 2498ms
rtt min/avg/max/mdev = 0.014/0.016/0.305/0.005 ms, ipg/ewma 0.024/0.017 ms

$ sudo ping -M do -c 100000 -f ...
100000 packets transmitted, 100000 received, 0% packet loss, time 2363ms
rtt min/avg/max/mdev = 0.014/0.015/0.322/0.006 ms, ipg/ewma 0.023/0.016 ms
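
A rough sanity check on those numbers (assuming each flow uses a single
25Gb/s member of the bond, and that the default ping payload is 56
bytes): the extra 8916 bytes of a jumbo ping cross the wire once in each
direction, which accounts for only part of the ~16 usec gap between the
jumbo and default-size averages - the rest is per-hop store-and-forward
serialization and copying the larger buffers through the kernel.

$ awk 'BEGIN { printf "extra round-trip wire time for 8916 more bytes at 25Gb/s: %.1f usec\n", 2*8916*8/25000 }'
extra round-trip wire time for 8916 more bytes at 25Gb/s: 5.7 usec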

On 22 January 2018 at 22:37, Nick Fisk <nick@xxxxxxxxxx> wrote:
Anyone with 25G ethernet willing to do the test? Would love to see what the
latency figures are for that.



From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
Maged Mokhtar
Sent: 22 January 2018 11:28
To: ceph-users@xxxxxxxxxxxxxx
Subject: Re: What should be the expected latency of 10Gbit network connections



On 2018-01-22 08:39, Wido den Hollander wrote:



On 01/20/2018 02:02 PM, Marc Roos wrote:

  If I test my connections with sockperf via a 1Gbit switch I get around
25 usec; when I test the 10Gbit connection via the switch I get around
12 usec. Is that normal, or should there be a difference of 10x?


No, that's normal.

Tests I did with 8k ping packets over different links:

1GbE:  0.800ms
10GbE: 0.200ms
40GbE: 0.150ms

Wido


sockperf ping-pong

sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=10.100 sec; SentMessages=432875;
ReceivedMessages=432874
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=10.000 sec; SentMessages=428640;
ReceivedMessages=428640
sockperf: ====> avg-lat= 11.609 (std-dev=1.684)
sockperf: # dropped messages = 0; # duplicated messages = 0; #
out-of-order messages = 0
sockperf: Summary: Latency is 11.609 usec
sockperf: Total 428640 observations; each percentile contains 4286.40
observations
sockperf: ---> <MAX> observation =  856.944
sockperf: ---> percentile  99.99 =   39.789
sockperf: ---> percentile  99.90 =   20.550
sockperf: ---> percentile  99.50 =   17.094
sockperf: ---> percentile  99.00 =   15.578
sockperf: ---> percentile  95.00 =   12.838
sockperf: ---> percentile  90.00 =   12.299
sockperf: ---> percentile  75.00 =   11.844
sockperf: ---> percentile  50.00 =   11.409
sockperf: ---> percentile  25.00 =   11.124
sockperf: ---> <MIN> observation =    8.888

sockperf: Warmup stage (sending a few dummy messages)...
sockperf: Starting test...
sockperf: Test end (interrupted by timer)
sockperf: Test ended
sockperf: [Total Run] RunTime=1.100 sec; SentMessages=22065;
ReceivedMessages=22064
sockperf: ========= Printing statistics for Server No: 0
sockperf: [Valid Duration] RunTime=1.000 sec; SentMessages=20056;
ReceivedMessages=20056
sockperf: ====> avg-lat= 24.861 (std-dev=1.774)
sockperf: # dropped messages = 0; # duplicated messages = 0; #
out-of-order messages = 0
sockperf: Summary: Latency is 24.861 usec
sockperf: Total 20056 observations; each percentile contains 200.56
observations
sockperf: ---> <MAX> observation =   77.158
sockperf: ---> percentile  99.99 =   54.285
sockperf: ---> percentile  99.90 =   37.864
sockperf: ---> percentile  99.50 =   34.406
sockperf: ---> percentile  99.00 =   33.337
sockperf: ---> percentile  95.00 =   27.497
sockperf: ---> percentile  90.00 =   26.072
sockperf: ---> percentile  75.00 =   24.618
sockperf: ---> percentile  50.00 =   24.443
sockperf: ---> percentile  25.00 =   24.361
sockperf: ---> <MIN> observation =   16.746
[root@c01 sbin]# sockperf ping-pong -i 192.168.0.12 -p 5001 -t 10
sockperf: == version #2.6 ==
sockperf[CLIENT] send on:sockperf: using recvfrom() to block on
socket(s)
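
For anyone wanting to reproduce this, the client command above needs a
matching sockperf server on the other host - something along these
lines (exact options may differ between sockperf versions, and note
that sockperf defaults to UDP; add --tcp on both ends to measure the
TCP path instead):

# on the server side (192.168.0.12 in the run above):
$ sockperf server -i 192.168.0.12 -p 5001

# on the client side:
$ sockperf ping-pong -i 192.168.0.12 -p 5001 -t 10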









I find the ping command with the flood option handy for measuring latency;
it gives min/max/average/std-deviation stats.

example:

ping  -c 100000 -f 10.0.1.12

Maged





The ping flood test will show the hardware/link-level latency. sockperf
will show the latency that user-space TCP socket applications will see, due
to kernel context switches, interrupts, transmit buffering, TCP acks, etc. So:
sockperf is the better measure of the latency Ceph clients will see.
The flood latency gives a better picture of the expected IOPS, which is the
inverse of latency at the link level. (At the application level, with
concurrency, IOPS is not tied to latency.)
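
As a rough illustration of that inverse relationship, using the figures
quoted above for a single fully-serialized stream (queue depth 1):

$ awk 'BEGIN { printf "11.6 usec RTT -> ~%d serial req/s;  15 usec RTT -> ~%d serial req/s\n", int(1e6/11.6), int(1e6/15) }'
11.6 usec RTT -> ~86206 serial req/s;  15 usec RTT -> ~66666 serial req/s

With concurrency the aggregate IOPS can go well beyond that
single-stream ceiling, which is the application-level caveat above.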

Maybe with SPDK/RDMA, Ceph latency will be close to link latency.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
