On Tue, Jun 28, 2022 at 04:13:05PM +0300, Oz Shlomo wrote: > Hi Marcelo, > > > On 6/27/2022 8:06 PM, Marcelo Ricardo Leitner wrote: > > Hi Oz, > > > > On Mon, Jun 27, 2022 at 06:19:54PM +0300, Oz Shlomo wrote: > > > Hi Marcelo, > > > > > > On 6/27/2022 5:38 PM, Marcelo Ricardo Leitner wrote: > > > > Currently, whenever act_ct tries to match a packet against the flow > > > > table, it will also try to refresh the offload. That is, at the end > > > > of tcf_ct_flow_table_lookup() it will call flow_offload_refresh(). > > > > > > > > The problem is that flow_offload_refresh() will try to offload entries > > > > that are actually already offloaded, leading to expensive and useless > > > > work. Before this patch, with a simple iperf3 test on OVS + TC > > > > > > Packets of offloaded connections are expected to process in hardware. > > > As such, it is not expected to receive packets in software from offloaded > > > connections. > > > > > > However, hardware offload may fail due to various reasons (e.g. size limits, > > > insertion rate throttling etc.). > > > The "refresh" mechanism is the enabler for offload retries. > > > > Thanks for the quick review. > > > > Right. I don't mean to break this mechanism. Act_ct can also be used > > in semi/pure sw datapath, and then the premise of packets being > > expected to be handled in hw is not valid anymore. I can provide a > > more detailed use case if you need. > > It is clear that the refresh design introduces some overhead when act_ct is > used in a pure sw datapath. Cool. > > > > > > > > > > > > > (hw_offload=true) + CT test entirely in sw, it looks like: > > > > > > > > - 39,81% tcf_classify > > > > - fl_classify > > > > - 37,09% tcf_action_exec > > > > + 33,18% tcf_mirred_act > > > > - 2,69% tcf_ct_act > > > > - 2,39% tcf_ct_flow_table_lookup > > > > - 1,67% queue_work_on > > > > - 1,52% __queue_work > > > > 1,20% try_to_wake_up > > > > + 0,80% tcf_pedit_act > > > > + 2,28% fl_mask_lookup > > > > > > > > The patch here aborts the add operation if the entry is already present > > > > in hw. With this patch, then: > > > > > > > > - 43,94% tcf_classify > > > > - fl_classify > > > > - 39,64% tcf_action_exec > > > > + 38,00% tcf_mirred_act > > > > - 1,04% tcf_ct_act > > > > 0,63% tcf_ct_flow_table_lookup > > > > + 3,19% fl_mask_lookup > > > > > > > > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx> > > > > --- > > > > net/netfilter/nf_flow_table_offload.c | 3 +++ > > > > 1 file changed, 3 insertions(+) > > > > > > > > diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c > > > > index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644 > > > > --- a/net/netfilter/nf_flow_table_offload.c > > > > +++ b/net/netfilter/nf_flow_table_offload.c > > > > @@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable, > > > > { > > > > struct flow_offload_work *offload; > > > > + if (test_bit(NF_FLOW_HW, &flow->flags)) > > > > + return; > > > > + > > > > > > This change will make the refresh call obsolete as the NF_FLOW_HW bit is set > > > on the first flow offload attempt. > > > > Oh oh.. I was quite sure it was getting cleared when the entry was > > removed from HW, but not. > > > > So instead of the if() above, what about: > > + if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &flow->ct->status)) > > I think this will set the IPS_HW_OFFLOAD_BIT prematurely. > Currently this bit is set only when the the flow has been successfully > offloaded. Indeed. It was my cat that typed _and_set in there. ;-] (joking) sorry. > > > > > AFAICT it will keep trying while the entry is not present in the flow > > table, and stop while it is there. Once the entry is aged from HW, it > > is also removed from the flow table, so this part should be okay. > > But if the offload failed for some reason like you said above, and the > > entry is left on the flow table, it won't retry until it ages out from > > the flow table. > > But then we will never have the chance to re-install it in hardware while > the connection is still active. Unless it goes idle, yes. > > > > > If you expect that this situation can be easily triggered, we may need > > to add a rate limit instead then. Even for these connections that > > failed to offload, this "busy retrying" is expensive and may backfire > > in such situation. > > Perhaps we can refresh only if the flow_block callbacks list is not empty. It may not be empty even for sw datapath. If you have packets coming from the wire towards a veth/virtio, for example. It will likely having a matching act_ct with the same zone number on both directions. And/or if a zone is shared, as the the flow table then is also shared. > > > > > > > > > > > offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE); > > > > if (!offload) > > > > return;