On 27.12.2018 19:23, Grant Taylor wrote:
On 12/27/18 10:15 AM, Grzegorz Gwóźdź wrote:
This solution worked for a few years in several networks, but in one
network, for the last few weeks, the mechanism has been clogging during peak hours.
Okay. It sounds to me like the methodology works well enough. But it
might have scaling problems.
The previous system was 2 x 6-core Xeon (24 threads).
Now it's a Threadripper 2990 (64 threads).
No core is loaded over 40% (at most 25% software interrupts on a core, and
only because the NIC has 8 queues per interface, so each interface can be
serviced by only 8 cores).
A few ksoftirqd threads sit at about 0.7% CPU each.
How can I find the choke point if no parameter indicates it?
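For reference, a few counters that usually reveal a per-queue or per-CPU
choke point (assuming the interface is eth0, as elsewhere in the thread):

# one interrupt line per RX/TX queue; very uneven counts mean uneven queue load
grep eth0 /proc/interrupts
# per-CPU softirq backlog: 2nd column is packets dropped because the backlog
# was full, 3rd column is time_squeeze (NAPI budget ran out)
cat /proc/net/softnet_stat
# per-CPU %soft over time (from the sysstat package)
mpstat -P ALL 1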
Pings to all local hosts grow to hundreds of ms (even to hosts without
any traffic) and throughput drops.
Ouch.
The only solution is:
tc qdisc del root dev eth0
That doesn't seem like a solution. Maybe a workaround, if you're lucky.
If I immediately add the rules again, the problem starts again immediately.
That sounds like the workaround doesn't even work.
But after some time, even though traffic is higher, I can load the queues
again and everything works until the next attack.
I'm thinking that "attack" might be the proper word.
I'm wondering if this is a number-of-packets-per-second issue rather than a
bytes-per-second issue.
Specifically, whether the "attack" consists of considerably more, smaller
packets than normal. I'm guessing normal traffic is fewer, bigger packets.
Take a packet capture during normal traffic periods and a separate
packet capture during attack traffic periods.
Then open each of the captures in Wireshark and pull up the Packet
Lengths report from the Statistics menu. I'm guessing that you will
see a significant difference between the two captures.
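If the GUI is awkward at these traffic volumes, the same Packet Lengths
statistic is available from the command line; the capture file names here
are just placeholders:

# header-only captures keep file sizes manageable at ~100 kpps
tcpdump -i eth0 -s 96 -w normal.pcap
tcpdump -i eth0 -s 96 -w attack.pcap
# tshark's plen,tree statistic is the same report as Statistics -> Packet Lengths
tshark -r normal.pcap -q -z plen,tree
tshark -r attack.pcap -q -z plen,tree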
I put a separate machine on a mirrored interface just for logging traffic -
statistically there is no difference, either from the LAN or from the Internet.
After clogging, the number of small packets rises (mostly SYN packets) and
the overall number of packets drops, but just before clogging I cannot see
anything strange. But it's over 1 Gbps and 100 kpps.
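A quick way to quantify the SYN rise from a saved capture (the filter is the
usual SYN-set, ACK-clear idiom; attack.pcap is a placeholder name):

# total TCP packets vs. bare connection attempts in the capture
tcpdump -n -r attack.pcap tcp 2>/dev/null | wc -l
tcpdump -n -r attack.pcap 'tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn' 2>/dev/null | wc -l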
I'm almost sure the cause is in the incoming packets, but it's hard to find.
Here are 10-second statistics under "attack" - roughly the same as normal:
==================================================================================================================
Packet Lengths:
Topic / Item         Count     Average   Min val   Max val   Rate (ms)   Percent   Burst rate   Burst start
------------------------------------------------------------------------------------------------------------------
Packet Lengths       3416240   866,96    60        1518      341,6367    100%      443,1200     3,735
 0-19                0         -         -         -         0,0000      0,00%     -            -
 20-39               0         -         -         -         0,0000      0,00%     -            -
 40-79               987398    65,82     60        79        98,7435     28,90%    129,1300     2,865
 80-159              323407    103,61    80        159       32,3419     9,47%     48,5500      1,905
 160-319             98202     218,05    160       319       9,8206      2,87%     12,1800      3,360
 320-639             72602     467,25    320       639       7,2605      2,13%     16,1800      4,680
 640-1279            57417     969,00    640       1279      5,7419      1,68%     7,6000       1,510
 1280-2559           1877214   1466,15   1280      1518      187,7284    54,95%    257,1100     3,705
 2560-5119           0         -         -         -         0,0000      0,00%     -            -
 5120 and greater    0         -         -         -         0,0000      0,00%     -            -
------------------------------------------------------------------------------------------------------------------
The statistics above show about 300 kpps because they cover the whole
interface, IN + OUT, for both containers.
The default queue for unclassified packets is 2 Gbps, so everything that
doesn't have a defined filter falls into it.
Smaller or bigger packets - every packet should fall into its designated
queue if it is a normal packet, but maybe it's a maliciously prepared frame
that breaks the rules and algorithms.
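The actual hierarchy isn't shown in the thread, but a minimal sketch of the
kind of setup being described (an HTB root with a default class for
unclassified traffic and SFQ on the leaves; all handles, rates and addresses
below are made up) would be:

# root HTB; anything not matched by a filter goes to class 1:999
tc qdisc add dev eth0 root handle 1: htb default 999
tc class add dev eth0 parent 1: classid 1:999 htb rate 2gbit
tc qdisc add dev eth0 parent 1:999 handle 999: sfq perturb 10
# per-client class plus the filter that steers traffic into it
tc class add dev eth0 parent 1: classid 1:10 htb rate 20mbit ceil 30mbit
tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dst 10.0.0.10/32 flowid 1:10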
I don't think it is a hardware issue, because this system runs in an LXC
container, and another container on the same NIC (doing the same work for
other clients) works fine.
I think containers and VMs are good for some things. I don't think
that they are (more specifically their overhead is) good for high
throughput traffic. Particularly large numbers of small packets per
second (PPS). High PPS with small packets requires quite a bit of
optimization. I also think it's actually rare outside of specific
situations. What I've seen more frequently is fewer (by one or more
orders of magnitude) packets that are larger (by one or more orders of
magnitude). Overall the amount of data is roughly the same. But
/how/ it's done can cause considerable load on equipment. Especially
equipment that is not optimized for high PPS, much less additional
overhead like containers or VMs.
It's LXC, so there is almost no overhead on network traffic.
I estimate this machine could handle over 40 Gbps.
There are 2 LXC containers on this machine. One handles 1.7 Gbps and works
fine; the second has 1.2 Gbps and has problems.
Both work on the same physical interface.
I've moved one container to another machine - no difference - the one that
was clogging still clogged.
Load on the system is low and there is no hardware problem; all the
hardware has been replaced, and on the new hardware I've installed a fresh
system (Ubuntu 18.04).
No dropped packets in interface statistics. dmesg is clear.
What messages were you seeing in dmesg before?
I mean there is nothing new in dmesg
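One thing worth double-checking is that "no dropped packets" also holds for
the lower-level counters, which the plain interface statistics don't show
(eth0 assumed):

# NIC/driver counters: look for anything named drop, missed or discard
ethtool -S eth0 | grep -iE 'drop|miss|discard|err'
# kernel per-interface counters, including missed and overrun
ip -s -s link show dev eth0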
As a result, the conntrack table grows until it overflows (if I don't
delete the qdisc).
I even sniffed all the traffic and tried to analyze it, but it's hard
since it's over 1 Gbps (on a 10 Gb interface).
The connection tracking table overflowing tells me one of two things:
either you are truly dealing with a high PPS condition, or you don't
have enough memory in the system and the size of the conntrack table
is restricted.
I once took a system that comfortably ran with ~512 MB of memory up to
4 GB to allow the conntrack table to be large enough for what the
system was doing. (I think the conntrack table was a fixed percentage
of memory in that kernel. Maybe it's tunable now.)
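It is tunable on current kernels; the knobs, with example values only, are:

# current usage vs. limit
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# raise the limit and the hash table (commonly max/4) if memory allows
sysctl -w net.netfilter.nf_conntrack_max=1048576
echo 262144 > /sys/module/nf_conntrack/parameters/hashsize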
The conntrack table overflows because clients keep trying to establish
connections but never get a reply (the clogged TC blocks them). It fills
up over about 10 minutes and then overflows.
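(One way to confirm that it really is half-open entries piling up, assuming
conntrack-tools is installed:)

# total tracked connections
conntrack -C
# entries stuck mid-handshake
conntrack -L -p tcp --state SYN_SENT 2>/dev/null | wc -l
conntrack -L -p tcp --state SYN_RECV 2>/dev/null | wc -l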
What can I check?
Check the Packet Lengths graph as suggested above.
If your problem is indeed high PPS, you might also be having problems
outside of your Linux machine. It's quite possible that there is
enough traffic / PPS that things like switches and / or wireless
access points are also being negatively affected. It's possible that
they are your choke point and not the actual Linux system.
But deleting the qdisc causes traffic to grow, since clients have no limits.
The choke point is certainly in my Linux box.
And if it were a NIC issue, it would get worse after deleting the qdisc
(more traffic).
Where to look for a cause?
I think you need to get a good understanding of what your two typical
traffic patterns are: normal and attack. That includes whether this is
legitimate traffic or if it is someone conducting an attack and the
network is buckling under the stress.
You might also consider changing out network cards. I've been around
people who like to pooh-pooh some Realtek cards and other non-Intel /
non-Broadcom NICs. Admittedly, some of the better NICs have more CPU
/ memory / I/O on them to handle more traffic.
I use a 10 Gb Mellanox NIC. I can try an Intel one, but Mellanox has always
performed better for me.
Are there any "hacks" in TC that would allow me to look into its guts?
It looks like it's changing state to "clogged", but
tc -s class ls dev eth0
looks completely normal (only the number of SFQ queues grows - they are
created dynamically for every connection, and more and more connections
are opened but never closed).
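The closest thing to looking into the guts without patching the kernel is
the full statistics output; the drops, overlimits, requeues and backlog
counters are what should show where packets stall (eth0 assumed):

# -d adds the qdisc parameters; watch backlog and requeues grow
tc -s -d qdisc show dev eth0
# per-class dropped / overlimits counters
tc -s class show dev eth0
tc -s filter show dev eth0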
If I don't delete the qdisc, it recovers from that state after about 15
minutes (but until then traffic is crushed to 1/3, conntrack overflows, etc.).
At the same time the system itself responds pretty well, there is no sign
of anything going wrong, and a similar system in the other container works OK...
GG