Re: Router congestion (?) caused by b2 upload-file

Linux Advanced Routing and Traffic Control


 



On 3 Nov 2020, at 22:32, Rich Brown wrote:

On Nov 3, 2020, at 3:15 PM, Thomas Rosenstein <thomas.rosenstein@xxxxxxxxxxxxxxxx> wrote:

Hi all,

I have multiple routers which connect to multiple upstream providers. I have noticed a high latency shift in ICMP (and generally all connections) when I run b2 upload-file --threads 40, and I can reproduce this.

What options do I have to analyze why this happens?

General Info:

Routers are connected to each other with 10G Mellanox Connect-X cards via 10G SFP+ DAC cables through a 10G switch from fs.com.
Latency is generally around 0.18 ms between all routers (4).
Throughput is 9.4 Gbit/s with 0 retransmissions when tested with iperf3.
2 of the 4 routers are connected upstream with a 1G connection (separate port, same network card).
All routers have the full internet routing tables, i.e. 80k entries for IPv6 and 830k entries for IPv4.
Conntrack is disabled (-j NOTRACK)
Kernel 5.4.60
2x Xeon X5670 @ 2.93 GHz
96 GB RAM
No Swap
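(For context on the conntrack bypass: that is typically done with raw-table NOTRACK rules roughly like the sketch below. This is an assumption about the setup — the actual rules may match more specific traffic.)

```shell
# The raw table is evaluated before conntrack; -j NOTRACK skips
# connection tracking entirely for matching packets.
iptables  -t raw -A PREROUTING -j NOTRACK
iptables  -t raw -A OUTPUT     -j NOTRACK
ip6tables -t raw -A PREROUTING -j NOTRACK
ip6tables -t raw -A OUTPUT     -j NOTRACK
```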

During high latency:

Latency on the routers carrying the traffic flow increases to 12 - 20 ms on all interfaces; moving the stream (by disabling the BGP session) moves the high latency with it.
iperf3 performance plummets to 300 - 400 Mbit/s.
CPU load (user / system) is around 0.1%.
RAM usage is around 3 - 4 GB.
The if_packets count is stable (around 8000 pkt/s more).


With b2 upload-file at 10 threads I can achieve 60 MB/s consistently; with 40 threads the performance drops to 8 MB/s.

I do not believe that 40 TCP streams should be any problem for a machine of that size.
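(One way to observe this while reproducing it is to run a latency probe alongside the upload and watch the qdisc and NIC counters on the suspected egress interface. A sketch — the interface name, target IP, bucket, and file names below are placeholders:)

```shell
# Latency probe in the background, with timestamps (target IP is a placeholder)
ping -D 10.0.0.1 > ping.log &

# Start the reproducer (bucket and file names are placeholders)
b2 upload-file --threads 40 myBucket bigfile.bin bigfile.bin &

# Watch queue backlog and drops on the suspected bottleneck interface
watch -n 1 'tc -s qdisc show dev eth0'

# NIC-level counters can show drops the qdisc never sees
ethtool -S eth0 | grep -i -E 'drop|discard'
```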

Thanks for any ideas, help, or pointers, and let me know if there is anything additional I can verify / check / provide!

These are the classic symptoms of bufferbloat. I note two indicators: latency jumps 50-100x when uploading, and bandwidth decreases (likely because ACKs are slow to return). I'm not saying it *is* bufferbloat, but it could be useful to rule it out.

I see you have kernel 5.4, so the fq_codel and cake qdiscs should be available. I don't understand enough about your configuration, but you'll want to enable one of those qdiscs "in front of" the bottleneck link so the qdisc can control the queueing. Those qdiscs keep stats showing how much data is queued, how often packets are dropped or marked, etc.
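(For example, on the 1G upstream port — the interface name and shaping rate below are placeholders for your setup — something along these lines:)

```shell
# Replace the root qdisc on the upstream port with fq_codel
tc qdisc replace dev eth1 root fq_codel

# Or use cake, shaped slightly below the 1G line rate so the queue
# builds here (where the qdisc controls it) rather than upstream
tc qdisc replace dev eth1 root cake bandwidth 950mbit

# Inspect the stats: backlog, drops, ECN marks, per-flow counts
tc -s qdisc show dev eth1
```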

I would be curious to hear how this works out.


I have downgraded one of the routers from 5.4.60 to the stock CentOS 7 kernel 3.10.0-514.26.2, and now the issue is GONE.
This must be connected to something introduced after that kernel!


Thomas

Rich


