On 11/06/2014 02:58 PM, Luis Periquito wrote:
> What is the COPP?
>

Nothing special, default settings. 200 ICMP packets/second. But we also
tested with a direct TwinAx cable between two hosts, so no switch
involved. That did not improve the latency.

So this seems to be a kernel/driver issue somewhere, but I can't think
of anything. The systems I have access to have no special tuning and
get much better latency.
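One thing that might be worth checking on both hosts (just a guess, I
haven't verified it on this exact setup) is the interrupt coalescing of
the ixgbe driver and the CPU frequency governor, since both can add
tens of microseconds to small-packet round trips. Assuming the
interface is called eth0 (adjust to the real name), something like:

$ ethtool -c eth0             # show current interrupt coalescing settings
$ ethtool -C eth0 rx-usecs 0  # on ixgbe, 0 should disable interrupt throttling
$ cat /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor
$ cpupower frequency-set -g performance   # keep cores from clocking down

Disabling interrupt throttling trades CPU (more interrupts) for lower
latency, so it would be worth re-running the ping test after each change
to see which one, if any, accounts for the 40% difference.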
Wido

> On Thu, Nov 6, 2014 at 1:53 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>
>> On 11/06/2014 02:38 PM, Luis Periquito wrote:
>>> Hi Wido,
>>>
>>> What is the full topology? Are you using a north-south or an east-west
>>> design? So far I've seen east-west being slightly slower. What are the
>>> fabric modes you have configured? How is everything connected? Also you
>>> have no information on the OS - if I remember correctly there were a
>>> lot of improvements in the latest kernels...
>>
>> The Nexus 3000s are connected with 40Gbit to the Nexus 7000. There are
>> two 7000 units and 8 3000s spread out over 4 racks.
>>
>> But the test I did was with two hosts connected to the same Nexus 3000
>> switch using TwinAx cabling of 3m.
>>
>> The tests were performed with Ubuntu 14.04 (3.13) and RHEL 7 (3.10), but
>> that didn't make a difference.
>>
>>> And what about the bandwidth?
>>
>> Just fine, no problems getting 10Gbit through the NICs.
>>
>>> The values you present don't seem awfully high, and the deviation seems
>>> low.
>>
>> No, they don't seem high, but they are about 40% higher than the values
>> I see in other environments. 40% is a lot.
>>
>> This Ceph cluster is SSD-only, so the lower the latency, the more IOps
>> the system can do.
>>
>> Wido
>>
>>> On Thu, Nov 6, 2014 at 1:18 PM, Wido den Hollander <wido@xxxxxxxx> wrote:
>>>
>>>> Hello,
>>>>
>>>> While working at a customer I've run into 10GbE latency which seems
>>>> high to me.
>>>>
>>>> I have access to a couple of Ceph clusters and I ran a simple ping test:
>>>>
>>>> $ ping -s 8192 -c 100 -n <ip>
>>>>
>>>> Two results I got:
>>>>
>>>> rtt min/avg/max/mdev = 0.080/0.131/0.235/0.039 ms
>>>> rtt min/avg/max/mdev = 0.128/0.168/0.226/0.023 ms
>>>>
>>>> Both these environments are running with Intel 82599ES 10Gbit cards in
>>>> LACP. One with Extreme Networks switches, the other with Arista.
>>>>
>>>> Now, on an environment with Cisco Nexus 3000 and Nexus 7000 switches I'm
>>>> seeing:
>>>>
>>>> rtt min/avg/max/mdev = 0.160/0.244/0.298/0.029 ms
>>>>
>>>> As you can see, the Cisco Nexus network has higher latency than the
>>>> other setups.
>>>>
>>>> You would say the switches are to blame, but we also tried with a
>>>> direct TwinAx connection, and that didn't help.
>>>>
>>>> This setup also uses the Intel 82599ES cards, so the cards don't seem to
>>>> be the problem.
>>>>
>>>> The MTU is set to 9000 on all these networks and cards.
>>>>
>>>> I was wondering, could others with a Ceph cluster running on 10GbE
>>>> perform a simple network latency test like this? I'd like to compare
>>>> the results.
>>>>
>>>> --
>>>> Wido den Hollander
>>>> 42on B.V.
>>>> Ceph trainer and consultant
>>>>
>>>> Phone: +31 (0)20 700 9902
>>>> Skype: contact42on
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@xxxxxxxxxxxxxx
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>
>> --
>> Wido den Hollander
>> 42on B.V.
>> Ceph trainer and consultant
>>
>> Phone: +31 (0)20 700 9902
>> Skype: contact42on

--
Wido den Hollander
42on B.V.
Ceph trainer and consultant

Phone: +31 (0)20 700 9902
Skype: contact42on
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com