I'm using tc with HFSC to limit client bandwidth.
There are about 2500 clients on the local network, in the 192.168.0.0/16 range.
For performance I'm using hash filters.
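For context, a setup of this kind usually looks roughly like the sketch below. It is not the actual rule set from this network: the device name, the rate, the class numbering and the choice to hash on the last octet of the destination address are all assumptions made for illustration, and default-class handling is left out. The script only prints the tc commands instead of running them.

```shell
#!/bin/sh
# Dry-run sketch: prints tc commands rather than executing them.
# DEV, the rate and the sample client are illustrative assumptions.
DEV=${DEV:-eth0}

# Hypothetical helper: echo the command so it can be reviewed first.
emit() { echo "tc $*"; }

hash_filter_sketch() {
    # Root HFSC qdisc (no default class in this sketch).
    emit qdisc add dev "$DEV" root handle 1: hfsc
    # Root u32 filter; it lives in the default hash table 800:.
    emit filter add dev "$DEV" parent 1:0 prio 5 protocol ip u32
    # Hash table 2: with 256 buckets.
    emit filter add dev "$DEV" parent 1:0 prio 5 handle 2: protocol ip u32 divisor 256
    # Pick the bucket from the last octet of the destination IP
    # (destination address sits at offset 16 in the IP header).
    emit filter add dev "$DEV" parent 1:0 prio 5 protocol ip u32 ht 800:: \
        match ip dst 192.168.0.0/16 hashkey mask 0x000000ff at 16 link 2:
}

client_rule() {
    # $1 = client IP, $2 = class minor id (hex), $3 = rate
    ip=$1; minor=$2; rate=$3
    # Bucket number is the last octet written in hex.
    bucket=$(printf '%x' "${ip##*.}")
    emit class add dev "$DEV" parent 1: classid "1:$minor" hfsc sc rate "$rate" ul rate "$rate"
    emit filter add dev "$DEV" parent 1:0 prio 5 protocol ip u32 ht "2:$bucket:" \
        match ip dst "$ip" flowid "1:$minor"
}

hash_filter_sketch
client_rule 192.168.0.123 7b 10mbit
```

With this layout a packet for 192.168.0.123 is looked up only in bucket 7b of table 2: rather than being compared against every client rule. Hashing on the last octet alone means hosts from different /24 subnets share buckets; a real rule set for a /16 may well use a second table level.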
This setup has worked for a few years in several networks, but in one
network, for the last few weeks, the mechanism clogs during peak hours.
Ping times to all local hosts grow to hundreds of ms (even to hosts without any
traffic) and throughput drops.
The only solution is:
tc qdisc del root dev eth0
If I re-add the rules right away, the problem comes back immediately.
But after some time, even though traffic is higher, I can load the queues again and
everything works until the next attack.
I don't think it is a hardware issue, because this system runs in an LXC
container, and on the same NIC in another container (doing the same work
for other clients) everything works fine.
System load is low and there is no hardware problem: all the hardware has
been replaced, and I installed a fresh system (Ubuntu 18.04) on the new hardware.
There are no dropped packets in the interface statistics, and dmesg is clean.
As a result, the conntrack table grows until it overflows (if I don't delete the qdisc).
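One way to watch how close the table gets to its limit during an episode is to read the kernel's counters directly. This is a minimal sketch, assuming the nf_conntrack module is loaded so that the usual sysctl files exist; the function name and its optional directory argument are made up for the example.

```shell
#!/bin/sh
# Print conntrack table usage as "count / max (percent)".
# The optional directory argument exists only so the function can be
# pointed at sample data; by default it reads the real sysctl directory.
conntrack_usage() {
    dir=${1:-/proc/sys/net/netfilter}
    count=$(cat "$dir/nf_conntrack_count" 2>/dev/null) || return 1
    max=$(cat "$dir/nf_conntrack_max" 2>/dev/null) || return 1
    echo "conntrack: $count / $max ($((count * 100 / max))%)"
}

# Falls back to a message if the sysctl files are not present.
conntrack_usage || echo "conntrack sysctls not available"
```

Running it in a loop (for example once per second) while the qdisc is loaded would show whether the table fills gradually or jumps when the latency starts to climb.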
I even captured all the traffic and tried to analyze it, but that is hard since
it's over 1 Gbps (on a 10 Gb interface).
What can I check?
Where to look for a cause?
GG