On 27.12.2018 19:23, Grant Taylor wrote:
On 12/27/18 10:15 AM, Grzegorz Gwóźdź wrote:
This solution worked for a few years in several networks, but in one
network, for the last few weeks, the mechanism has been clogging during peak hours.
Okay. It sounds to me like the methodology works well enough. But it
might have scaling problems.
The previous system was 2 x 6-core Xeon (24 threads).
Now it's a Threadripper 2990 (64 threads).
No core is loaded over 40% (at most 25% software interrupts on a core, and
only because the NIC has 8 queues per interface, so each interface can be
serviced by only 8 cores).
A few ksoftirqd threads sit at about 0.7% CPU each.
How can I find the choke point if no parameter indicates it?
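For reference, a few counters that usually reveal a per-queue or per-CPU
choke point (assuming the interface is eth0, as elsewhere in the thread):

# one interrupt line per RX/TX queue; very uneven counts mean uneven queue load
grep eth0 /proc/interrupts
# per-CPU softirq backlog: 2nd column is packets dropped because the backlog
# was full, 3rd column is time_squeeze (NAPI budget ran out)
cat /proc/net/softnet_stat
# per-CPU %soft over time (from the sysstat package)
mpstat -P ALL 1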
Pings to all local hosts grow to hundreds of ms (even to hosts without
any traffic) and throughput drops.
Ouch.
The only solution is:
tc qdisc del root dev eth0
That doesn't seem like a solution. Maybe a workaround, if you're lucky.
If I immediately add the rules again, the problem starts again immediately.
That sounds like the workaround doesn't even work.
But after some time, even though traffic is higher, I can load the queues
again and everything works until the next attack.
I'm thinking that "attack" might be the proper word.
I'm wondering if this is a number-of-packets-per-second issue rather than a
bytes-per-second issue.
Specifically, whether the "attack" consists of considerably more, smaller
packets than normal. I'm guessing normal traffic is fewer, bigger packets.
Take a packet capture during normal traffic periods and a separate
packet capture during attack traffic periods.
Then open each of the captures in Wireshark and pull up the Packet
Lengths report from the Statistics menu. I'm guessing that you will
see a significant difference between the two captures.
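If the GUI is awkward at these traffic volumes, the same Packet Lengths
statistic is available from the command line; the capture file names here
are just placeholders:

# header-only captures keep file sizes manageable at ~100 kpps
tcpdump -i eth0 -s 96 -w normal.pcap
tcpdump -i eth0 -s 96 -w attack.pcap
# tshark's plen,tree statistic is the same report as Statistics -> Packet Lengths
tshark -r normal.pcap -q -z plen,tree
tshark -r attack.pcap -q -z plen,tree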
I put a separate machine on a mirrored interface just for logging traffic -
statistically there is no difference, either from the LAN or from the Internet.
After clogging, the number of small packets rises (mostly SYN packets) and
the overall number of packets drops, but just before clogging I cannot see
anything strange. But it's over 1 Gbps and 100 kpps.
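A quick way to quantify the SYN rise from a saved capture (the filter is the
usual SYN-set, ACK-clear idiom; attack.pcap is a placeholder name):

# total TCP packets vs. bare connection attempts in the capture
tcpdump -n -r attack.pcap tcp 2>/dev/null | wc -l
tcpdump -n -r attack.pcap 'tcp[tcpflags] & (tcp-syn|tcp-ack) == tcp-syn' 2>/dev/null | wc -l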
I'm almost sure the cause is in the incoming packets, but it's hard to find.
Here are 10-second statistics under "attack" - roughly the same as normal:
==================================================================================================================
Packet Lengths:
Topic / Item         Count     Average   Min val   Max val   Rate (ms)   Percent   Burst rate   Burst start
------------------------------------------------------------------------------------------------------------------
Packet Lengths       3416240   866,96    60        1518      341,6367    100%      443,1200     3,735
 0-19                0         -         -         -         0,0000      0,00%     -            -
 20-39               0         -         -         -         0,0000      0,00%     -            -
 40-79               987398    65,82     60        79        98,7435     28,90%    129,1300     2,865
 80-159              323407    103,61    80        159       32,3419     9,47%     48,5500      1,905
 160-319             98202     218,05    160       319       9,8206      2,87%     12,1800      3,360
 320-639             72602     467,25    320       639       7,2605      2,13%     16,1800      4,680
 640-1279            57417     969,00    640       1279      5,7419      1,68%     7,6000       1,510
 1280-2559           1877214   1466,15   1280      1518      187,7284    54,95%    257,1100     3,705
 2560-5119           0         -         -         -         0,0000      0,00%     -            -
 5120 and greater    0         -         -         -         0,0000      0,00%     -            -
------------------------------------------------------------------------------------------------------------------
The statistics above show about 300 kpps because they cover the whole
interface, IN + OUT, for both containers.
The default queue for unclassified packets is 2 Gbps, so everything that
doesn't have a defined filter falls into it.
Smaller or bigger packets - every packet should fall into its designated
queue if it is a normal packet, but maybe it's a maliciously prepared frame
that breaks the rules and algorithms.
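The actual hierarchy isn't shown in the thread, but a minimal sketch of the
kind of setup being described (an HTB root with a default class for
unclassified traffic and SFQ on the leaves; all handles, rates and addresses
below are made up) would be:

# root HTB; anything not matched by a filter goes to class 1:999
tc qdisc add dev eth0 root handle 1: htb default 999
tc class add dev eth0 parent 1: classid 1:999 htb rate 2gbit
tc qdisc add dev eth0 parent 1:999 handle 999: sfq perturb 10
# per-client class plus the filter that steers traffic into it
tc class add dev eth0 parent 1: classid 1:10 htb rate 20mbit ceil 30mbit
tc qdisc add dev eth0 parent 1:10 handle 10: sfq perturb 10
tc filter add dev eth0 parent 1: protocol ip prio 1 u32 match ip dst 10.0.0.10/32 flowid 1:10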
I don't think it is a hardware issue, because this system runs in an LXC
container, and another container on the same NIC (doing the same work for
other clients) works fine.
I think containers and VMs are good for some things. I don't think
that they are (more specifically their overhead is) good for high
throughput traffic. Particularly large numbers of small packets per
second (PPS). High PPS with small packets requires quite a bit of
optimization. I also think it's actually rare outside of specific
situations. What I've seen more frequently is fewer (by one or more
orders of magnitude) packets that are larger (by one or more orders of
magnitude). Overall the amount of data is roughly the same. But
/how/ it's done can cause considerable load on equipment. Especially
equipment that is not optimized for high PPS, much less additional
overhead like containers or VMs.
It's LXC, so there is almost no overhead on network traffic.
I estimate this machine could handle over 40 Gbps.
There are 2 LXC containers on this machine. One handles 1.7 Gbps and works
fine; the second has 1.2 Gbps and has problems.
Both work on the same physical interface.
I've moved one container to another machine - no difference - the one that
was clogging still clogged.
Load on the system is low and there is no hardware problem; all the
hardware has been replaced, and on the new hardware I've installed a fresh
system (Ubuntu 18.04).
No dropped packets in interface statistics. dmesg is clear.
What messages were you seeing in dmesg before?
I mean there is nothing new in dmesg
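One thing worth double-checking is that "no dropped packets" also holds for
the lower-level counters, which the plain interface statistics don't show
(eth0 assumed):

# NIC/driver counters: look for anything named drop, missed or discard
ethtool -S eth0 | grep -iE 'drop|miss|discard|err'
# kernel per-interface counters, including missed and overrun
ip -s -s link show dev eth0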
As a result, the conntrack table grows until it overflows (if I don't
delete the qdisc).
I even sniffed all the traffic and tried to analyze it, but it's hard
since it's over 1 Gbps (on a 10 Gb interface).
The connection tracking table overflowing tells me one of two things:
either you are truly dealing with a high PPS condition, or you don't
have enough memory in the system and the size of the conntrack table
is restricted.
I once took a system that comfortably ran with ~512 MB of memory up to
4 GB to allow the conntrack table to be large enough for what the
system was doing. (I think the conntrack table was a fixed percentage
of memory in that kernel. Maybe it's tunable now.)
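It is tunable on current kernels; the knobs, with example values only, are:

# current usage vs. limit
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max
# raise the limit and the hash table (commonly max/4) if memory allows
sysctl -w net.netfilter.nf_conntrack_max=1048576
echo 262144 > /sys/module/nf_conntrack/parameters/hashsize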
The conntrack table overflows because clients keep trying to establish
connections but never get a reply (the clogged TC blocks them). It fills
up over about 10 minutes and then overflows.
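(One way to confirm that it really is half-open entries piling up, assuming
conntrack-tools is installed:)

# total tracked connections
conntrack -C
# entries stuck mid-handshake
conntrack -L -p tcp --state SYN_SENT 2>/dev/null | wc -l
conntrack -L -p tcp --state SYN_RECV 2>/dev/null | wc -l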
What can I check?
Check the Packet Lengths graph as suggested above.
If your problem is indeed high PPS, you might also be having problems
outside of your Linux machine. It's quite possible that there is
enough traffic / PPS that things like switches and / or wireless
access points are also being negatively affected. It's possible that
they are your choke point and not the actual Linux system.
But deleting the qdisc causes traffic to grow, since clients have no limits.
The choke point is certainly in my Linux box.
And if it were a NIC issue, it would get worse after deleting the qdisc
(more traffic).
Where to look for a cause?
I think you need to get a good understanding of what your two typical
traffic patterns are: normal and attack. That includes whether this is
legitimate traffic or if it is someone conducting an attack and the
network is buckling under the stress.
You might also consider changing out network cards. I've been around
people who like to pooh-pooh some Realtek cards and other non-Intel /
non-Broadcom NICs. Admittedly, some of the better NICs have more CPU
/ memory / I/O on them to handle more traffic.
I use a 10 Gb Mellanox NIC. I can try an Intel one, but Mellanox has always
performed better for me.
Are there any "hacks" in TC that would allow me to look into its guts?
It looks like it's changing state to "clogged", but
tc -s class ls dev eth0
looks completely normal (only the number of SFQ queues grows - they are
created dynamically for every connection, and more and more connections
are opened but never closed).
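The closest thing to looking into the guts without patching the kernel is
the full statistics output; the drops, overlimits, requeues and backlog
counters are what should show where packets stall (eth0 assumed):

# -d adds the qdisc parameters; watch backlog and requeues grow
tc -s -d qdisc show dev eth0
# per-class dropped / overlimits counters
tc -s class show dev eth0
tc -s filter show dev eth0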
If I don't delete the qdisc, it recovers from that state after about 15
minutes (but until then traffic is crushed to 1/3, conntrack overflows, etc.).
At the same time the system itself responds pretty well, there is no sign
of anything going wrong, and a similar system in the other container works OK...
GG