Flow control may well just be masking the real problem. Did your throughput improve? Also, does that mean flow control is now enabled on all ports on the switch? If I understand correctly, such "global pause" flow control means that switch ports with links to upstream network devices will also be paused whenever the switch is trying to pass packets from those ports down to a congested host.
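If you do keep it, it's worth confirming what the hosts themselves negotiated. On the Linux side the per-port pause settings can be inspected and, if needed, changed with ethtool (interface name is just a placeholder):

    ethtool -a eth0              # show current pause (flow control) parameters
    ethtool -A eth0 rx on tx on  # enable RX/TX pause on this NIC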
Are your ring buffers tuned up as high as possible (`ethtool -g <ifname>`)?
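For example (interface name and ring sizes are placeholders; the maximums depend on the NIC/driver):

    ethtool -g eth0                  # show current and maximum RX/TX ring sizes
    ethtool -G eth0 rx 4096 tx 4096  # raise them towards the reported maximums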
On 11 September 2017 at 23:09, Andreas Herrmann <andreas@xxxxxxxx> wrote:
Hi,
flow control was active on the NIC but not on the switch.
Enabling flow control in both directions on the switch solved the problem:
flowcontrol receive on
flowcontrol send on
Port       Send FlowControl  Receive FlowControl RxPause       TxPause
           admin    oper     admin    oper
---------- -------- -------- -------- -------- ------------- -------------
Et17/1     on       on       on       on       0             64500
Et17/2     on       on       on       on       0             33746
Et17/3     on       on       on       on       0             17126
Et18/1     on       on       on       on       0             36948
Et18/2     on       on       on       on       0             39628
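For completeness, the NIC side can be checked the same way with ethtool (interface name is a placeholder; which pause counters exist depends on the driver):

    ethtool -a enp3s0f0                  # pause (flow control) settings on the NIC
    ethtool -S enp3s0f0 | grep -i pause  # pause frame counters, if the driver exposes them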
Regards,
Andreas
On 08.09.2017 13:57, Andreas Herrmann wrote:
> Hello,
>
> I have a fresh Proxmox installation on 5 servers (Supermicro X10SRW-F, Xeon
> E5-1660 v4, 128 GB RAM), each with 8 Samsung SM863 960GB SSDs connected to an
> LSI 9300-8i (SAS3008) controller and used as OSDs for Ceph (12.1.2).
>
> The servers are connected to two Arista DCS-7060CX-32S switches. I'm using an
> MLAG bond (bond mode LACP, xmit_hash_policy layer3+4, MTU 9000); a sketch of
> the bond configuration follows after the list:
> * backend network for Ceph: cluster network & public network
> Mellanox ConnectX-4 Lx dual-port 25 GBit/s
> * frontend network: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ dual-port
>
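> The bond is set up roughly like this in /etc/network/interfaces (a sketch only;
> interface names and the miimon value are placeholders):
>
>   auto bond0
>   iface bond0 inet manual
>       bond-slaves enp131s0f0 enp131s0f1
>       bond-mode 802.3ad
>       bond-xmit-hash-policy layer3+4
>       bond-miimon 100
>       mtu 9000
>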
> Ceph is pretty much a default installation with size=3.
>
> My problem:
> I'm running a dd (dd if=/dev/urandom of=urandom.0 bs=10M count=1024) in a test
> virtual machine (the only one running in the cluster), which writes at around
> 210 MB/s. I get output drops on all switch ports. The drop rate is between
> 0.1 % and 0.9 %; 0.9 % is reached when writing into Ceph at about 1300 MB/s.
>
> First I suspected a problem with the Mellanox cards and moved the Ceph traffic
> to the Intel cards, but the problem persisted.
>
> I tried quite a lot and nothing helped (rough commands for some of these steps
> are sketched after the list):
> * changed the MTU from 9000 to 1500
> * changed bond_xmit_hash_policy from layer3+4 to layer2+3
> * deactivated the bond and just used a single link
> * disabled offloading
> * disabled power management in BIOS
> * perf-bias 0
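> The MTU, offload, and hash-policy changes above were roughly along these lines
> (interface names are placeholders; the exact offload flags depend on the driver):
>
>   ip link set dev bond0 mtu 1500
>   ethtool -K enp131s0f0 tso off gso off gro off
>   echo layer2+3 > /sys/class/net/bond0/bonding/xmit_hash_policy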
>
> I analyzed the traffic via tcpdump and saw the following kinds of "errors"
> (a capture sketch follows the list):
> * TCP Previous segment not captured
> * TCP Out-of-Order
> * TCP Retransmission
> * TCP Fast Retransmission
> * TCP Dup ACK
> * TCP ACKed unseen segment
> * TCP Window Update
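> For reference, that analysis boils down to something like this (interface and file
> names are placeholders; the labels above are Wireshark's TCP analysis flags):
>
>   tcpdump -i bond0 -s 128 -w ceph.pcap
>   tshark -r ceph.pcap -Y tcp.analysis.flags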
>
> Is that behaviour normal for Ceph, or does anyone have an idea how to solve the
> problem with the output drops on the switch side?
>
> With iperf I can reach the full 50 GBit/s on the bond with zero output drops.
>
> Andreas
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
Cheers,
~Blairo