Re: [PATCH nf-next] flow_table: do not try to add already offloaded entries

Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx> · Tue, 28 Jun 2022 12:25:02 -0300

On Tue, Jun 28, 2022 at 04:13:05PM +0300, Oz Shlomo wrote:
> Hi Marcelo,
> 
> 
> On 6/27/2022 8:06 PM, Marcelo Ricardo Leitner wrote:
> > Hi Oz,
> > 
> > On Mon, Jun 27, 2022 at 06:19:54PM +0300, Oz Shlomo wrote:
> > > Hi Marcelo,
> > > 
> > > On 6/27/2022 5:38 PM, Marcelo Ricardo Leitner wrote:
> > > > Currently, whenever act_ct tries to match a packet against the flow
> > > > table, it will also try to refresh the offload. That is, at the end
> > > > of tcf_ct_flow_table_lookup() it will call flow_offload_refresh().
> > > > 
> > > > The problem is that flow_offload_refresh() will try to offload entries
> > > > that are actually already offloaded, leading to expensive and useless
> > > > work. Before this patch, with a simple iperf3 test on OVS + TC
> > > 
> > > Packets of offloaded connections are expected to process in hardware.
> > > As such, it is not expected to receive packets in software from offloaded
> > > connections.
> > > 
> > > However, hardware offload may fail due to various reasons (e.g. size limits,
> > > insertion rate throttling etc.).
> > > The "refresh" mechanism is the enabler for offload retries.
> > 
> > Thanks for the quick review.
> > 
> > Right. I don't mean to break this mechanism. Act_ct can also be used
> > in semi/pure sw datapath, and then the premise of packets being
> > expected to be handled in hw is not valid anymore. I can provide a
> > more detailed use case if you need.
> 
> It is clear that the refresh design introduces some overhead when act_ct is
> used in a pure sw datapath.

Cool.

> 
> > 
> > > 
> > > 
> > > > (hw_offload=true) + CT test entirely in sw, it looks like:
> > > > 
> > > > - 39,81% tcf_classify
> > > >      - fl_classify
> > > >         - 37,09% tcf_action_exec
> > > >            + 33,18% tcf_mirred_act
> > > >            - 2,69% tcf_ct_act
> > > >               - 2,39% tcf_ct_flow_table_lookup
> > > >                  - 1,67% queue_work_on
> > > >                     - 1,52% __queue_work
> > > >                          1,20% try_to_wake_up
> > > >            + 0,80% tcf_pedit_act
> > > >         + 2,28% fl_mask_lookup
> > > > 
> > > > The patch here aborts the add operation if the entry is already present
> > > > in hw. With this patch, then:
> > > > 
> > > > - 43,94% tcf_classify
> > > >      - fl_classify
> > > >         - 39,64% tcf_action_exec
> > > >            + 38,00% tcf_mirred_act
> > > >            - 1,04% tcf_ct_act
> > > >                 0,63% tcf_ct_flow_table_lookup
> > > >         + 3,19% fl_mask_lookup
> > > > 
> > > > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx>
> > > > ---
> > > >    net/netfilter/nf_flow_table_offload.c | 3 +++
> > > >    1 file changed, 3 insertions(+)
> > > > 
> > > > diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> > > > index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644
> > > > --- a/net/netfilter/nf_flow_table_offload.c
> > > > +++ b/net/netfilter/nf_flow_table_offload.c
> > > > @@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable,
> > > >    {
> > > >    	struct flow_offload_work *offload;
> > > > +	if (test_bit(NF_FLOW_HW, &flow->flags))
> > > > +		return;
> > > > +
> > > 
> > > This change will make the refresh call obsolete as the NF_FLOW_HW bit is set
> > > on the first flow offload attempt.
> > 
> > Oh oh.. I was quite sure it was getting cleared when the entry was
> > removed from HW, but not.
> > 
> > So instead of the if() above, what about:
> > +       if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &flow->ct->status))
> 
> I think this will set the IPS_HW_OFFLOAD_BIT prematurely.
> Currently this bit is set only when the the flow has been successfully
> offloaded.

Indeed. It was my cat that typed _and_set in there. ;-] (joking) sorry.

> 
> > 
> > AFAICT it will keep trying while the entry is not present in the flow
> > table, and stop while it is there. Once the entry is aged from HW, it
> > is also removed from the flow table, so this part should be okay.
> > But if the offload failed for some reason like you said above, and the
> > entry is left on the flow table, it won't retry until it ages out from
> > the flow table.
> 
> But then we will never have the chance to re-install it in hardware while
> the connection is still active.

Unless it goes idle, yes.

> 
> > 
> > If you expect that this situation can be easily triggered, we may need
> > to add a rate limit instead then. Even for these connections that
> > failed to offload, this "busy retrying" is expensive and may backfire
> > in such situation.
> 
> Perhaps we can refresh only if the flow_block callbacks list is not empty.

It may not be empty even for sw datapath. If you have packets coming
from the wire towards a veth/virtio, for example. It will likely
having a matching act_ct with the same zone number on both directions.
And/or if a zone is shared, as the the flow table then is also shared.

> 
> 
> > 
> > > 
> > > >    	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
> > > >    	if (!offload)
> > > >    		return;