On 25/9/2024 12:40 AM, Stephen Hemminger wrote:
On Tue, 24 Sep 2024 15:46:17 +0200
Eric Dumazet <edumazet@xxxxxxxxxx> wrote:
On Tue, Sep 24, 2024 at 3:33 PM Stephen Hemminger
<stephen@xxxxxxxxxxxxxxxxxx> wrote:
On Tue, 24 Sep 2024 17:09:06 +0800
yushengjin <yushengjin@xxxxxxxxxxxxx> wrote:
When running wrk tests, the CPU usage of the test machine reached 100%.
When forwarding through a bridge under heavy network load, ebt_do_table in
the kernel ebtables module can become abnormally loaded, leading to
excessive soft interrupts and sometimes even CPU soft lockups.
Analysis showed that the ebtables code has not been optimized for a long
time and still uses a read-write lock, whereas the arp/ip/ip6 tables were
optimized long ago, after the read-write lock was identified as a
performance bottleneck.
Ref link: https://lore.kernel.org/lkml/20090428092411.5331c4a1@nehalam/
So I followed the arp/ip/ip6 tables approach to optimize away the
read-write lock in ebtables.c.
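(For context, this is roughly the lockless pattern the referenced
arp/ip/ip6 change ended up with: the packet path bumps a per-CPU sequence
counter instead of taking a global read lock, and the table-replace path
waits until every CPU has left the old ruleset. A minimal sketch, assuming
the xt_write_recseq helpers from x_tables carry over to ebtables; the
function name ebt_do_table_percpu and the elided rule walk are
illustrative, not the actual patch:

/* illustrative only, loosely modelled on net/ipv4/netfilter/ip_tables.c */
#include <linux/netfilter/x_tables.h>
#include <linux/netfilter_bridge/ebtables.h>

static unsigned int
ebt_do_table_percpu(struct sk_buff *skb, const struct nf_hook_state *state,
		    struct ebt_table *table)
{
	unsigned int verdict = EBT_ACCEPT;
	unsigned int addend;

	local_bh_disable();			/* stay on this CPU, no softirq reentry */
	addend = xt_write_recseq_begin();	/* mark this CPU as inside the ruleset */

	/*
	 * ... walk the chain in table->private and update the per-CPU
	 * packet/byte counters, as ebt_do_table() does today, but
	 * without taking table->lock ...
	 */

	xt_write_recseq_end(addend);		/* mark this CPU as done */
	local_bh_enable();

	return verdict;
}
)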
What about doing RCU instead, faster and safer.
Safer ? How so ?
Stephen, we have used this stuff already in other netfilter components
since 2011
No performance issue at all.
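(A minimal sketch of the RCU scheme being discussed, modelled on how other
netfilter components swap their rule blobs: readers dereference the current
table info under rcu_read_lock(), and the replace path publishes a new blob
and calls synchronize_rcu() before freeing the old one. The struct and
function names here, ebt_priv, ebt_lookup_rcu and ebt_replace_rcu, are
illustrative, not existing ebtables code:

#include <linux/rcupdate.h>
#include <linux/skbuff.h>
#include <linux/vmalloc.h>
#include <linux/netfilter_bridge/ebtables.h>

struct ebt_priv {
	struct ebt_table_info __rcu *info;	/* current rule blob */
};

/* packet path: lockless read side */
static unsigned int ebt_lookup_rcu(struct ebt_priv *priv, struct sk_buff *skb)
{
	const struct ebt_table_info *info;
	unsigned int verdict = EBT_ACCEPT;

	rcu_read_lock();
	info = rcu_dereference(priv->info);
	/* ... evaluate the rules in 'info' against skb ... */
	rcu_read_unlock();

	return verdict;
}

/* control path: replace the ruleset, writers serialized by a mutex */
static void ebt_replace_rcu(struct ebt_priv *priv, struct ebt_table_info *new)
{
	struct ebt_table_info *old;

	old = rcu_dereference_protected(priv->info, 1);	/* caller holds the mutex */
	rcu_assign_pointer(priv->info, new);
	synchronize_rcu();	/* wait for packet-path readers on the old blob */
	vfree(old);
}

Either way, the replace path still has to collect the packet/byte counters
from the old ruleset before freeing it; that bookkeeping is the same whether
RCU or a per-CPU seqcount is used.)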
I was thinking that lockdep and analysis tools do a better job looking at RCU.
Most likely, the number of users of ebtables was small enough that nobody looked
hard at it until now.
Even though there are few users of ebtables, there are still serious issues.
Here is data from an arm Kunpeng-920 (96 CPUs) machine. When I run only the
wrk tests, the system's softirq rapidly climbs to about 25%:
02:50:07 PM  CPU  %usr  %nice  %sys  %iowait  %irq  %soft  %steal  %guest  %gnice  %idle
02:50:25 PM  all  0.00   0.00  0.05     0.00  0.72  23.20    0.00    0.00    0.00  76.03
02:50:26 PM  all  0.00   0.00  0.08     0.00  0.72  24.53    0.00    0.00    0.00  74.67
02:50:27 PM  all  0.01   0.00  0.13     0.00  0.75  24.89    0.00    0.00    0.00  74.23
If ebtables queries, updates, and other operations are executed continuously
at the same time, softirq climbs further to about 50%:
02:52:23 PM  all  0.00   0.00  1.18     0.00  0.54  48.91    0.00    0.00    0.00  49.36
02:52:24 PM  all  0.00   0.00  1.19     0.00  0.43  48.23    0.00    0.00    0.00  50.15
02:52:25 PM  all  0.00   0.00  1.20     0.00  0.50  48.29    0.00    0.00    0.00  50.01
More seriously, a soft lockup may occur:
Message from syslogd@localhost at Sep 25 14:52:22 ...
kernel:watchdog: BUG: soft lockup - CPU#88 stuck for 23s! [ebtables:3896]
So I think the soft lockup is even more unbearable than the performance loss.