On Tue, Jun 28, 2022 at 06:34:26PM +0300, Oz Shlomo wrote: > > > On 6/28/2022 6:25 PM, Marcelo Ricardo Leitner wrote: > > On Tue, Jun 28, 2022 at 04:13:05PM +0300, Oz Shlomo wrote: > > > Hi Marcelo, > > > > > > > > > On 6/27/2022 8:06 PM, Marcelo Ricardo Leitner wrote: > > > > Hi Oz, > > > > > > > > On Mon, Jun 27, 2022 at 06:19:54PM +0300, Oz Shlomo wrote: > > > > > Hi Marcelo, > > > > > > > > > > On 6/27/2022 5:38 PM, Marcelo Ricardo Leitner wrote: > > > > > > Currently, whenever act_ct tries to match a packet against the flow > > > > > > table, it will also try to refresh the offload. That is, at the end > > > > > > of tcf_ct_flow_table_lookup() it will call flow_offload_refresh(). > > > > > > > > > > > > The problem is that flow_offload_refresh() will try to offload entries > > > > > > that are actually already offloaded, leading to expensive and useless > > > > > > work. Before this patch, with a simple iperf3 test on OVS + TC > > > > > > > > > > Packets of offloaded connections are expected to process in hardware. > > > > > As such, it is not expected to receive packets in software from offloaded > > > > > connections. > > > > > > > > > > However, hardware offload may fail due to various reasons (e.g. size limits, > > > > > insertion rate throttling etc.). > > > > > The "refresh" mechanism is the enabler for offload retries. > > > > > > > > Thanks for the quick review. > > > > > > > > Right. I don't mean to break this mechanism. Act_ct can also be used > > > > in semi/pure sw datapath, and then the premise of packets being > > > > expected to be handled in hw is not valid anymore. I can provide a > > > > more detailed use case if you need. > > > > > > It is clear that the refresh design introduces some overhead when act_ct is > > > used in a pure sw datapath. > > > > Cool. > > > > > > > > > > > > > > > > > > > > > > > > > (hw_offload=true) + CT test entirely in sw, it looks like: > > > > > > > > > > > > - 39,81% tcf_classify > > > > > > - fl_classify > > > > > > - 37,09% tcf_action_exec > > > > > > + 33,18% tcf_mirred_act > > > > > > - 2,69% tcf_ct_act > > > > > > - 2,39% tcf_ct_flow_table_lookup > > > > > > - 1,67% queue_work_on > > > > > > - 1,52% __queue_work > > > > > > 1,20% try_to_wake_up > > > > > > + 0,80% tcf_pedit_act > > > > > > + 2,28% fl_mask_lookup > > > > > > > > > > > > The patch here aborts the add operation if the entry is already present > > > > > > in hw. With this patch, then: > > > > > > > > > > > > - 43,94% tcf_classify > > > > > > - fl_classify > > > > > > - 39,64% tcf_action_exec > > > > > > + 38,00% tcf_mirred_act > > > > > > - 1,04% tcf_ct_act > > > > > > 0,63% tcf_ct_flow_table_lookup > > > > > > + 3,19% fl_mask_lookup > > > > > > > > > > > > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx> > > > > > > --- > > > > > > net/netfilter/nf_flow_table_offload.c | 3 +++ > > > > > > 1 file changed, 3 insertions(+) > > > > > > > > > > > > diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c > > > > > > index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644 > > > > > > --- a/net/netfilter/nf_flow_table_offload.c > > > > > > +++ b/net/netfilter/nf_flow_table_offload.c > > > > > > @@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable, > > > > > > { > > > > > > struct flow_offload_work *offload; > > > > > > + if (test_bit(NF_FLOW_HW, &flow->flags)) > > > > > > + return; > > > > > > + > > > > > > > > > > This change will make the refresh call obsolete as the NF_FLOW_HW bit is set > > > > > on the first flow offload attempt. > > > > > > > > Oh oh.. I was quite sure it was getting cleared when the entry was > > > > removed from HW, but not. > > > > > > > > So instead of the if() above, what about: > > > > + if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &flow->ct->status)) > > > > > > I think this will set the IPS_HW_OFFLOAD_BIT prematurely. > > > Currently this bit is set only when the the flow has been successfully > > > offloaded. > > > > Indeed. It was my cat that typed _and_set in there. ;-] (joking) sorry. > > > > > > > > > > > > > AFAICT it will keep trying while the entry is not present in the flow > > > > table, and stop while it is there. Once the entry is aged from HW, it > > > > is also removed from the flow table, so this part should be okay. > > > > But if the offload failed for some reason like you said above, and the > > > > entry is left on the flow table, it won't retry until it ages out from > > > > the flow table. > > > > > > But then we will never have the chance to re-install it in hardware while > > > the connection is still active. > > > > Unless it goes idle, yes. > > > > > > > > > > > > > If you expect that this situation can be easily triggered, we may need > > > > to add a rate limit instead then. Even for these connections that > > > > failed to offload, this "busy retrying" is expensive and may backfire > > > > in such situation. > > > > > > Perhaps we can refresh only if the flow_block callbacks list is not empty. > > > > It may not be empty even for sw datapath. If you have packets coming > > from the wire towards a veth/virtio, for example. It will likely > > having a matching act_ct with the same zone number on both directions. > > And/or if a zone is shared, as the the flow table then is also shared. > > Hmm, I was thinking that you are targeting the use case of deployments with > no ct offload supporting hardware. Ahh. > > CT action is usually proceeded by a goto action. So, a filter with ct and > goto action list will offload even if the last chain forwards to a > veth/virtio device. Right, but there's the outbound traffic. Traffic coming from a container that is using veth, and going out to the wire. And also traffic from one container to another in the same host, that can also trigger this, if at least one of them is not annotated to use the offloading. These will be using ingress tc filters on veth interfaces, which won't offload. > > > > > > > > > > > > > > > > > > > > > > > > offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE); > > > > > > if (!offload) > > > > > > return;