Currently, whenever act_ct tries to match a packet against the flow
table, it will also try to refresh the offload. That is, at the end of
tcf_ct_flow_table_lookup() it will call flow_offload_refresh(). The
problem is that flow_offload_refresh() will try to offload entries that
are already offloaded, leading to expensive and useless work.

Before this patch, with a simple iperf3 test on OVS + TC
(hw_offload=true) + CT, with the test running entirely in sw, the
profile looks like:

  - 39,81% tcf_classify
     - fl_classify
        - 37,09% tcf_action_exec
           + 33,18% tcf_mirred_act
           - 2,69% tcf_ct_act
              - 2,39% tcf_ct_flow_table_lookup
                 - 1,67% queue_work_on
                    - 1,52% __queue_work
                         1,20% try_to_wake_up
           + 0,80% tcf_pedit_act
        + 2,28% fl_mask_lookup

The patch here aborts the add operation if the entry is already present
in hw. With this patch applied:

  - 43,94% tcf_classify
     - fl_classify
        - 39,64% tcf_action_exec
           + 38,00% tcf_mirred_act
           - 1,04% tcf_ct_act
                0,63% tcf_ct_flow_table_lookup
        + 3,19% fl_mask_lookup

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx>
---
 net/netfilter/nf_flow_table_offload.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable,
 {
 	struct flow_offload_work *offload;
 
+	if (test_bit(NF_FLOW_HW, &flow->flags))
+		return;
+
 	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
 	if (!offload)
 		return;
-- 
2.35.3
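
For context, the call path that generates the redundant work described
above is roughly the following. This is a paraphrased sketch of
flow_offload_refresh() in net/netfilter/nf_flow_table_core.c, simplified
and not the exact source (the timeout handling is elided):

	/* act_ct: tcf_ct_flow_table_lookup() ends a successful lookup by
	 * refreshing the matched entry:
	 *
	 *	flow_offload_refresh(nf_ft, flow);
	 */
	void flow_offload_refresh(struct nf_flowtable *flow_table,
				  struct flow_offload *flow)
	{
		/* ... refresh flow->timeout (elided) ... */

		if (likely(!nf_flowtable_hw_offload(flow_table)))
			return;

		/* Reached even when the entry is already in hw: this
		 * allocates a flow_offload_work and queues it, which is the
		 * queue_work_on()/__queue_work() cost in the first profile.
		 */
		nf_flow_offload_add(flow_table, flow);
	}

With the test_bit(NF_FLOW_HW, &flow->flags) check added by this patch,
nf_flow_offload_add() bails out before allocating and queueing the work,
so the per-packet refresh cost for already offloaded entries drops to a
single bit test, which is why queue_work_on()/__queue_work() disappear
from the second profile.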