Re: Typical 10GbE latency

Wido,

Take the switch out of the path between the nodes and remeasure. ICMP echo requests are very low-priority traffic for switches and network stacks.

If you really want to know, place a network analyzer between the nodes and measure the latency from request packet to response packet. The ICMP timing reported by the ping application is not accurate in the sub-millisecond range and should only be used as a rough estimate.

You may also want to install the high-resolution timer patch, sometimes called HRT, in the kernel, which may give you different results.

ICMP traffic takes a different path than TCP traffic and should not be considered an indicator of a defect.

I believe the ping app calls the sendto system call (sorry, it's been a while since I last looked). System calls can take between 0.1 us and 0.2 us each. However, the ping application makes several of these calls and then waits for a signal from the kernel. Waiting for a signal means the ping application must be rescheduled before it can report the time, and rescheduling depends on many other factors in the OS: timers, card interrupts, other tasks with higher priorities. Reporting the time adds a few more system calls, and as the application loops to post the next ping request it makes a few more still, any of which may cause a task switch.
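The per-call cost can be ballparked with a sketch like the following (my own illustration, not ping's actual code). Note that Python's interpreter overhead inflates the figure well beyond the raw 0.1-0.2 us syscall cost, so treat the result as an upper bound:

```python
# Rough upper-bound estimate of per-system-call overhead.
# Assumes os.getuid() performs one real syscall per invocation
# (true for CPython on Linux); the measured time also includes
# Python's own loop and call overhead.
import os
import time

def estimate_syscall_overhead(iterations=100_000):
    # Time a tight loop of cheap syscalls and divide by the count.
    start = time.perf_counter()
    for _ in range(iterations):
        os.getuid()
    elapsed = time.perf_counter() - start
    return elapsed / iterations  # seconds per call

if __name__ == "__main__":
    per_call = estimate_syscall_overhead()
    print(f"~{per_call * 1e6:.2f} us per os.getuid() call")
```

Even this crude number shows why a handful of syscalls plus a reschedule can dominate sub-millisecond measurements.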

For these reasons, the ping application is not a good representation of network performance: its timing is affected both by scheduling inside the application and by traffic shaping performed at the switch and in the TCP stacks.
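A more representative number comes from timing a request/response over TCP itself. A minimal sketch (my own, with hypothetical helper names; shown over loopback here, but in practice run the server half on one node and the client half on the other):

```python
# Measure request/response round-trip time over TCP instead of ICMP.
# Demonstrated over loopback; for a real test, bind the server on one
# node and point measure_rtt() at its address from the other node.
import socket
import threading
import time

def echo_server(listen_sock):
    # Accept one connection and echo everything back until EOF.
    conn, _ = listen_sock.accept()
    with conn:
        while True:
            data = conn.recv(65536)
            if not data:
                break
            conn.sendall(data)

def measure_rtt(host, port, payload_size=8192, count=10):
    # Send a payload and time how long it takes to get it all back.
    rtts = []
    payload = b"x" * payload_size
    with socket.create_connection((host, port)) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
        for _ in range(count):
            start = time.perf_counter()
            s.sendall(payload)
            received = 0
            while received < payload_size:
                received += len(s.recv(65536))
            rtts.append(time.perf_counter() - start)
    return rtts

if __name__ == "__main__":
    server = socket.create_server(("127.0.0.1", 0))
    port = server.getsockname()[1]
    threading.Thread(target=echo_server, args=(server,), daemon=True).start()
    rtts = measure_rtt("127.0.0.1", port)
    avg = sum(rtts) / len(rtts)
    print(f"min/avg/max = {min(rtts)*1e3:.3f}/{avg*1e3:.3f}/{max(rtts)*1e3:.3f} ms")
```

This still includes scheduling and interpreter overhead, but it at least exercises the same TCP path that Ceph traffic takes, unlike ICMP.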

cheers,
gary


On Fri, Nov 7, 2014 at 4:32 PM, Łukasz Jagiełło <jagiello.lukasz@xxxxxxxxx> wrote:
Hi,

rtt min/avg/max/mdev = 0.070/0.177/0.272/0.049 ms

04:00.0 Ethernet controller: Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection (rev 01)

at both hosts and Arista 7050S-64 between.

Both hosts were part of active ceph cluster.


On Thu, Nov 6, 2014 at 5:18 AM, Wido den Hollander <wido@xxxxxxxx> wrote:
Hello,

While working at a customer site I've run into 10GbE latency that seems
high to me.

I have access to a couple of Ceph clusters and I ran a simple ping test:

$ ping -s 8192 -c 100 -n <ip>

Two results I got:

rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms

Both of these environments are running Intel 82599ES 10Gbit cards in
LACP, one with Extreme Networks switches and the other with Arista.

Now, on an environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
seeing:

rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms

As you can see, the Cisco Nexus network has higher latency than the
other setups.

You would say the switches are to blame, but we also tried a direct
TwinAx connection, and that didn't help.

This setup also uses the Intel 82599ES cards, so the cards don't seem to
be the problem.

The MTU is set to 9000 on all these networks and cards.

I was wondering, others with a Ceph cluster running on 10GbE, could you
perform a simple network latency test like this? I'd like to compare the
results.

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



--
Łukasz Jagiełło
lukasz<at>jagiello<dot>org



