On 6/28/2022 6:25 PM, Marcelo Ricardo Leitner wrote:
On Tue, Jun 28, 2022 at 04:13:05PM +0300, Oz Shlomo wrote:
Hi Marcelo,
On 6/27/2022 8:06 PM, Marcelo Ricardo Leitner wrote:
Hi Oz,
On Mon, Jun 27, 2022 at 06:19:54PM +0300, Oz Shlomo wrote:
Hi Marcelo,
On 6/27/2022 5:38 PM, Marcelo Ricardo Leitner wrote:
Currently, whenever act_ct tries to match a packet against the flow
table, it will also try to refresh the offload. That is, at the end
of tcf_ct_flow_table_lookup() it will call flow_offload_refresh().
The problem is that flow_offload_refresh() will try to offload entries
that are actually already offloaded, leading to expensive and useless
work. Before this patch, with a simple iperf3 test on OVS + TC
Packets of offloaded connections are expected to process in hardware.
As such, it is not expected to receive packets in software from offloaded
connections.
However, hardware offload may fail due to various reasons (e.g. size limits,
insertion rate throttling etc.).
The "refresh" mechanism is the enabler for offload retries.
Thanks for the quick review.
Right. I don't mean to break this mechanism. Act_ct can also be used
in semi/pure sw datapath, and then the premise of packets being
expected to be handled in hw is not valid anymore. I can provide a
more detailed use case if you need.
It is clear that the refresh design introduces some overhead when act_ct is
used in a pure sw datapath.
Cool.
(hw_offload=true) + CT test entirely in sw, it looks like:
- 39,81% tcf_classify
- fl_classify
- 37,09% tcf_action_exec
+ 33,18% tcf_mirred_act
- 2,69% tcf_ct_act
- 2,39% tcf_ct_flow_table_lookup
- 1,67% queue_work_on
- 1,52% __queue_work
1,20% try_to_wake_up
+ 0,80% tcf_pedit_act
+ 2,28% fl_mask_lookup
The patch here aborts the add operation if the entry is already present
in hw. With this patch, then:
- 43,94% tcf_classify
- fl_classify
- 39,64% tcf_action_exec
+ 38,00% tcf_mirred_act
- 1,04% tcf_ct_act
0,63% tcf_ct_flow_table_lookup
+ 3,19% fl_mask_lookup
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@xxxxxxxxx>
---
net/netfilter/nf_flow_table_offload.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/net/netfilter/nf_flow_table_offload.c b/net/netfilter/nf_flow_table_offload.c
index 11b6e19420920bc8efda9877af0dab5311c8a096..9a8fc61581400b4e13aa356972d366892bb71b9b 100644
--- a/net/netfilter/nf_flow_table_offload.c
+++ b/net/netfilter/nf_flow_table_offload.c
@@ -1026,6 +1026,9 @@ void nf_flow_offload_add(struct nf_flowtable *flowtable,
{
struct flow_offload_work *offload;
+ if (test_bit(NF_FLOW_HW, &flow->flags))
+ return;
+
This change will make the refresh call obsolete as the NF_FLOW_HW bit is set
on the first flow offload attempt.
Oh oh.. I was quite sure it was getting cleared when the entry was
removed from HW, but not.
So instead of the if() above, what about:
+ if (test_and_set_bit(IPS_HW_OFFLOAD_BIT, &flow->ct->status))
I think this will set the IPS_HW_OFFLOAD_BIT prematurely.
Currently this bit is set only when the the flow has been successfully
offloaded.
Indeed. It was my cat that typed _and_set in there. ;-] (joking) sorry.
AFAICT it will keep trying while the entry is not present in the flow
table, and stop while it is there. Once the entry is aged from HW, it
is also removed from the flow table, so this part should be okay.
But if the offload failed for some reason like you said above, and the
entry is left on the flow table, it won't retry until it ages out from
the flow table.
But then we will never have the chance to re-install it in hardware while
the connection is still active.
Unless it goes idle, yes.
If you expect that this situation can be easily triggered, we may need
to add a rate limit instead then. Even for these connections that
failed to offload, this "busy retrying" is expensive and may backfire
in such situation.
Perhaps we can refresh only if the flow_block callbacks list is not empty.
It may not be empty even for sw datapath. If you have packets coming
from the wire towards a veth/virtio, for example. It will likely
having a matching act_ct with the same zone number on both directions.
And/or if a zone is shared, as the the flow table then is also shared.
Hmm, I was thinking that you are targeting the use case of deployments
with no ct offload supporting hardware.
CT action is usually proceeded by a goto action. So, a filter with ct
and goto action list will offload even if the last chain forwards to a
veth/virtio device.
offload = nf_flow_offload_work_alloc(flowtable, flow, FLOW_CLS_REPLACE);
if (!offload)
return;