On Mon, Sep 23, 2024 at 12:06 PM yushengjin <yushengjin@xxxxxxxxxxxxx> wrote: > > > 在 23/9/2024 下午5:29, Eric Dumazet 写道: > > On Mon, Sep 23, 2024 at 11:16 AM yushengjin <yushengjin@xxxxxxxxxxxxx> wrote: > >> When conducting WRK testing, the CPU usage rate of the testing machine was > >> 100%. forwarding through a bridge, if the network load is too high, it may > >> cause abnormal load on the ebt_do_table of the kernel ebtable module, leading > >> to excessive soft interrupts and sometimes even directly causing CPU soft > >> deadlocks. > >> > >> After analysis, it was found that the code of ebtables had not been optimized > >> for a long time, and the read-write locks inside still existed. However, other > >> arp/ip/ip6 tables had already been optimized a lot, and performance bottlenecks > >> in read-write locks had been discovered a long time ago. > >> > >> Ref link: https://lore.kernel.org/lkml/20090428092411.5331c4a1@nehalam/ > >> > >> So I referred to arp/ip/ip6 modification methods to optimize the read-write > >> lock in ebtables.c. > >> > >> test method: > >> 1) Test machine creates bridge : > >> ``` bash > >> brctl addbr br-a > >> brctl addbr br-b > >> brctl addif br-a enp1s0f0 enp1s0f1 > >> brctl addif br-b enp130s0f0 enp130s0f1 > >> ifconfig br-a up > >> ifconfig br-b up > >> ``` > >> 2) Testing with another machine: > >> ``` bash > >> ulimit -n 2048 > >> ./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://4.4.4.2:80/4k.html & > >> ./wrk -t48 -c2000 -d6000 -R10000 -s request.lua http://5.5.5.2:80/4k.html & > >> ``` > >> > >> Signed-off-by: yushengjin <yushengjin@xxxxxxxxxxxxx> > >> --- > >> include/linux/netfilter_bridge/ebtables.h | 47 +++++++- > >> net/bridge/netfilter/ebtables.c | 132 ++++++++++++++++------ > >> 2 files changed, 145 insertions(+), 34 deletions(-) > >> > >> diff --git a/include/linux/netfilter_bridge/ebtables.h b/include/linux/netfilter_bridge/ebtables.h > >> index fd533552a062..dd52dea20fb8 100644 > >> --- a/include/linux/netfilter_bridge/ebtables.h > >> +++ b/include/linux/netfilter_bridge/ebtables.h > >> @@ -93,7 +93,6 @@ struct ebt_table { > >> char name[EBT_TABLE_MAXNAMELEN]; > >> struct ebt_replace_kernel *table; > >> unsigned int valid_hooks; > >> - rwlock_t lock; > >> /* the data used by the kernel */ > >> struct ebt_table_info *private; > >> struct nf_hook_ops *ops; > >> @@ -124,4 +123,50 @@ static inline bool ebt_invalid_target(int target) > >> > >> int ebt_register_template(const struct ebt_table *t, int(*table_init)(struct net *net)); > >> void ebt_unregister_template(const struct ebt_table *t); > >> + > >> +/** > >> + * ebt_recseq - recursive seqcount for netfilter use > >> + * > >> + * Packet processing changes the seqcount only if no recursion happened > >> + * get_counters() can use read_seqcount_begin()/read_seqcount_retry(), > >> + * because we use the normal seqcount convention : > >> + * Low order bit set to 1 if a writer is active. > >> + */ > >> +DECLARE_PER_CPU(seqcount_t, ebt_recseq); > >> + > >> +/** > >> + * ebt_write_recseq_begin - start of a write section > >> + * > >> + * Begin packet processing : all readers must wait the end > >> + * 1) Must be called with preemption disabled > >> + * 2) softirqs must be disabled too (or we should use this_cpu_add()) > >> + * Returns : > >> + * 1 if no recursion on this cpu > >> + * 0 if recursion detected > >> + */ > >> +static inline unsigned int ebt_write_recseq_begin(void) > >> +{ > >> + unsigned int addend; > >> + > >> + addend = (__this_cpu_read(ebt_recseq.sequence) + 1) & 1; > >> + > >> + __this_cpu_add(ebt_recseq.sequence, addend); > >> + smp_mb(); > >> + > >> + return addend; > >> +} > >> + > >> +/** > >> + * ebt_write_recseq_end - end of a write section > >> + * @addend: return value from previous ebt_write_recseq_begin() > >> + * > >> + * End packet processing : all readers can proceed > >> + * 1) Must be called with preemption disabled > >> + * 2) softirqs must be disabled too (or we should use this_cpu_add()) > >> + */ > >> +static inline void ebt_write_recseq_end(unsigned int addend) > >> +{ > >> + smp_wmb(); > >> + __this_cpu_add(ebt_recseq.sequence, addend); > >> +} > > Why not reusing xt_recseq, xt_write_recseq_begin(), xt_write_recseq_end(), > > instead of copy/pasting them ? > > > > This was added in > > > > commit 7f5c6d4f665bb57a19a34ce1fb16cc708c04f219 netfilter: get rid > > of atomic ops in fast path > They used different seqcounts, I'm worried it might have an impact. > > > > If this is an include mess, just move them in a separate include file. > > Can i copy ebt_write_recseq_begin(), ebt_write_recseq_endend to > include/linux/netfilter/x_tables.h ? > > Or add a parameter in xt_write_recseq_begin() , xt_write_recseq_end() to > clarify whether it is xt_recseq or ebt_recseq. I think you can reuse xt_recseq, xt_write_recseq_begin(), xt_write_recseq_end() directly. We use xt_recseq from three different users already. If you think you need a separate variable, please elaborate. We prefer to reuse code, obviously.