4.14-stable review patch. If anyone has any objections, please let me know. ------------------ From: Cong Wang <xiyou.wangcong@xxxxxxxxx> commit efbf78973978b0d25af59bc26c8013a942af6e64 upstream. Both Eric and Paolo noticed the rcu_barrier() we use in tcf_block_put_ext() could be a performance bottleneck when we have a lot of tc classes. Paolo provided the following to demonstrate the issue: tc qdisc add dev lo root htb for I in `seq 1 1000`; do tc class add dev lo parent 1: classid 1:$I htb rate 100kbit tc qdisc add dev lo parent 1:$I handle $((I + 1)): htb for J in `seq 1 10`; do tc filter add dev lo parent $((I + 1)): u32 match ip src 1.1.1.$J done done time tc qdisc del dev root real 0m54.764s user 0m0.023s sys 0m0.000s The rcu_barrier() there is to ensure we free the block after all chains are gone, that is, to queue tcf_block_put_final() at the tail of workqueue. We can achieve this ordering requirement by refcnt'ing tcf block instead, that is, the tcf block is freed only when the last chain in this block is gone. This also simplifies the code. Paolo reported after this patch we get: real 0m0.017s user 0m0.000s sys 0m0.017s Tested-by: Paolo Abeni <pabeni@xxxxxxxxxx> Cc: Eric Dumazet <edumazet@xxxxxxxxxx> Cc: Jiri Pirko <jiri@xxxxxxxxxxxx> Cc: Jamal Hadi Salim <jhs@xxxxxxxxxxxx> Signed-off-by: Cong Wang <xiyou.wangcong@xxxxxxxxx> Signed-off-by: David S. Miller <davem@xxxxxxxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- include/net/sch_generic.h | 1 - net/sched/cls_api.c | 29 +++++++++-------------------- 2 files changed, 9 insertions(+), 21 deletions(-) --- a/include/net/sch_generic.h +++ b/include/net/sch_generic.h @@ -273,7 +273,6 @@ struct tcf_chain { struct tcf_block { struct list_head chain_list; - struct work_struct work; }; static inline void qdisc_cb_private_validate(const struct sk_buff *skb, int sz) --- a/net/sched/cls_api.c +++ b/net/sched/cls_api.c @@ -211,8 +211,12 @@ static void tcf_chain_flush(struct tcf_c static void tcf_chain_destroy(struct tcf_chain *chain) { + struct tcf_block *block = chain->block; + list_del(&chain->list); kfree(chain); + if (list_empty(&block->chain_list)) + kfree(block); } static void tcf_chain_hold(struct tcf_chain *chain) @@ -276,26 +280,12 @@ err_chain_create: } EXPORT_SYMBOL(tcf_block_get); -static void tcf_block_put_final(struct work_struct *work) -{ - struct tcf_block *block = container_of(work, struct tcf_block, work); - struct tcf_chain *chain, *tmp; - - rtnl_lock(); - - /* At this point, all the chains should have refcnt == 1. */ - list_for_each_entry_safe(chain, tmp, &block->chain_list, list) - tcf_chain_put(chain); - rtnl_unlock(); - kfree(block); -} - /* XXX: Standalone actions are not allowed to jump to any chain, and bound * actions should be all removed after flushing. */ void tcf_block_put(struct tcf_block *block) { - struct tcf_chain *chain; + struct tcf_chain *chain, *tmp; if (!block) return; @@ -310,12 +300,11 @@ void tcf_block_put(struct tcf_block *blo list_for_each_entry(chain, &block->chain_list, list) tcf_chain_flush(chain); - INIT_WORK(&block->work, tcf_block_put_final); - /* Wait for RCU callbacks to release the reference count and make - * sure their works have been queued before this. + /* At this point, all the chains should have refcnt >= 1. Block will be + * freed after all chains are gone. */ - rcu_barrier(); - tcf_queue_work(&block->work); + list_for_each_entry_safe(chain, tmp, &block->chain_list, list) + tcf_chain_put(chain); } EXPORT_SYMBOL(tcf_block_put);