Re: [PATCH nf-next 00/19] netfilter: nftables: dscp modification offload

Boris Sukholitko <boris.sukholitko@xxxxxxxxxxxx> · Mon, 8 May 2023 16:38:06 +0300

On Sun, May 07, 2023 at 07:37:58PM +0200, Florian Westphal wrote:
> Boris Sukholitko <boris.sukholitko@xxxxxxxxxxxx> wrote:
> > On Wed, May 3, 2023 at 9:46 PM Florian Westphal <fw@xxxxxxxxx> wrote:
> > >
> > > Boris Sukholitko <boris.sukholitko@xxxxxxxxxxxx> wrote:
> > [... snip to non working offload ...]
> > 
> > > > table inet filter {
> > > >         flowtable f1 {
> > > >                 hook ingress priority filter
> > > >                 devices = { veth0, veth1 }
> > > >         }
> > > >
> > > >         chain forward {
> > > >                 type filter hook forward priority filter; policy accept;
> > > >                 ip dscp set cs3 offload
> > > >                 ip protocol { tcp, udp, gre } flow add @f1
> > > >                 ct state established,related accept
> > > >         }
> > > > }
> > 
> > [...]
> > 
> > >
> > > I wish you would have reported this before you started to work on
> > > this, because this is not a bug, this is expected behaviour.
> > >
> > > Once you offload, the ruleset is bypassed, this is by design.
> > 
> > From the rules UI perspective it seems possible to accelerate
> > forward chain handling with the statements such as dscp modification there.
> > 
> > Isn't it better to modify the packets according to the bypassed
> > ruleset thus making the behaviour more consistent?
> 
> The behaviour is consistent.  Once flow is offloaded, ruleset is
> bypassed.  Its easy to not offload those flows that need the ruleset.
> 
> > > Lets not make the software offload more complex as it already is.
> > 
> > Could you please tell which parts of software offload are too complex?
> > It's not too bad from what I've seen :)
> > 
> > This patch series adds 56 lines of code in the new nf_conntrack.ext.c
> > file. 20 of them (nf_flow_offload_apply_payload) are used in
> > the software fast path. Is it too high of a price?
> 
> 56 lines of code *now*.
> 
> Next someone wants to call into sets/maps for named counters that
> they need.  Then someone wants limit or quota to work.  Then they want fib
> for RPF.  Then xfrm policy matching to augment acccounting.
> This will go on until we get to the point where removing "fast" path
> turns into a performance optimization.

OK. May I assume that you are concerned with the eventual performance impact
on the software fast path (i.e. nf_flow_offload_ip_hook)?

Obviously the performance of the fast path is very important to our
customers. Otherwise they would not be requiring dscp fast path
modification. :)

One of the things we've thought about regarding the fast path
performance is rewriting nf_flow_offload_ip_hook to work with
nf_flowtable->flow_block instead of flow_offload_tuple.

We hope that iterating over flow_action_entry list similar to what the
hardware acceleration does, will be more efficient also in software.

Nice side-effect of such optimization would be that the amount of
feature bloat (such as dscp modification!) will not affect your typical
connection unless the user actually uses them.

For example, for dscp payload modification we'll generate
FLOW_ACTION_MANGLE entry. This entry will appear on flow_block's of
the only connections which require it. Others will be uneffected.

Would you be ok with such direction (with performance tests of
course)? 

Thanks,
Boris.

> 
> Existing rule hw offload via netdev:ingress makes it clear
> what rules are offloaded and to which device and it augments
> flowtable feature regardless if thats handled by software fastpath,
> software fallback/slowpath or by hardware offload.
> 
> > > If you want to apply dscp payload modification, do not use flowtable
> > > offload or hook those parts at netdev:ingress, it will be called before the
> > > software offload pipeline.
> > >
> > 
> > The problem is that our customers need to apply dscp modification in
> > more complex scenarios, e.g. after NAT.
> > Therefore I am not sure that ingress chain is enough for them.
> 
> I don't understand why this would have to occur after nat, but
> netdev:egress exists as well.
Attachment:
smime.p7s

Description: S/MIME Cryptographic Signature