Re: [PATCH nf-next] flow_table: do not try to add already offloaded entries

Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx> · Mon, 27 Jun 2022 14:06:45 -0300

Hi Oz,

On Mon, Jun 27, 2022 at 06:19:54PM +0300, Oz Shlomo wrote:
> Hi Marcelo,
> 
> On 6/27/2022 5:38 PM, Marcelo Ricardo Leitner wrote:
> > Currently, whenever act_ct tries to match a packet against the flow
> > table, it will also try to refresh the offload. That is, at the end
> > of tcf_ct_flow_table_lookup() it will call flow_offload_refresh().
> > 
> > The problem is that flow_offload_refresh() will try to offload entries
> > that are actually already offloaded, leading to expensive and useless
> > work. Before this patch, with a simple iperf3 test on OVS + TC
> 
> Packets of offloaded connections are expected to process in hardware.
> As such, it is not expected to receive packets in software from offloaded
> connections.
> 
> However, hardware offload may fail due to various reasons (e.g. size limits,
> insertion rate throttling etc.).
> The "refresh" mechanism is the enabler for offload retries.

Thanks for the quick review.

Right. I don't mean to break this mechanism. Act_ct can also be used
in semi/pure sw datapath, and then the premise of packets being
expected to be handled in hw is not valid anymore. I can provide a
more detailed use case if you need.

> 
> 
> > (hw_offload=true) + CT test entirely in sw, it looks like:
> > 
> > - 39,81% tcf_classify
> >     - fl_classify
> >        - 37,09% tcf_action_exec
> >           + 33,18% tcf_mirred_act
> >           - 2,69% tcf_ct_act
> >              - 2,39% tcf_ct_flow_table_lookup
> >                 - 1,67% queue_work_on
> >                    - 1,52% __queue_work
> >                         1,20% try_to_wake_up
> >           + 0,80% tcf_pedit_act
> >        + 2,28% fl_mask_lookup
> > 
> > The patch here aborts the add operation if the entry is already present
> > in hw. With this patch, then:
> > 
> > - 43,94% tcf_classify
> >     - fl_classify
> >        - 39,64% tcf_action_exec
> >           + 38,00% tcf_mirred_act
> >           - 1,04% tcf_ct_act
> >                0,63% tcf_ct_flow_table_lookup
> >        + 3,19% fl_mask_lookup
> > 
> > Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx>
> > ---
> >   net/netfilter/nf_flow_table_offload.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
> > index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644
> > --- a/net/netfilter/nf_flow_table_offload.c
> > +++ b/net/netfilter/nf_flow_table_offload.c
> > @@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable,
> >   {
> >   	struct flow_offload_work *offload;
> > +	if (test_bit(NF_FLOW_HW, &flow->flags))
> > +		return;
> > +
> 
> This change will make the refresh call obsolete as the NF_FLOW_HW bit is set
> on the first flow offload attempt.

Oh oh.. I was quite sure it was getting cleared when the entry was
removed from HW, but not.

So instead of the if() above, what about:
+       if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &flow->ct->status))

AFAICT it will keep trying while the entry is not present in the flow
table, and stop while it is there. Once the entry is aged from HW, it
is also removed from the flow table, so this part should be okay. 
But if the offload failed for some reason like you said above, and the
entry is left on the flow table, it won't retry until it ages out from
the flow table.

If you expect that this situation can be easily triggered, we may need
to add a rate limit instead then. Even for these connections that
failed to offload, this "busy retrying" is expensive and may backfire
in such situation.

> 
> >   	offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
> >   	if (!offload)
> >   		return;