Boris Sukholitko <boris.sukholitko@xxxxxxxxxxxx> wrote: > On Sun, May 07, 2023 at 07:37:58PM +0200, Florian Westphal wrote: > > Boris Sukholitko <boris.sukholitko@xxxxxxxxxxxx> wrote: > > > On Wed, May 3, 2023 at 9:46 PM Florian Westphal <fw@xxxxxxxxx> wrote: > > > > > > > > Boris Sukholitko <boris.sukholitko@xxxxxxxxxxxx> wrote: > > > [... snip to non working offload ...] > > > > > > > > table inet filter { > > > > > flowtable f1 { > > > > > hook ingress priority filter > > > > > devices = { veth0, veth1 } > > > > > } > > > > > > > > > > chain forward { > > > > > type filter hook forward priority filter; policy accept; > > > > > ip dscp set cs3 offload > > > > > ip protocol { tcp, udp, gre } flow add @f1 > > > > > ct state established,related accept > > > > > } > > > > > } > > > > > > [...] > > > > > > > > > > > I wish you would have reported this before you started to work on > > > > this, because this is not a bug, this is expected behaviour. > > > > > > > > Once you offload, the ruleset is bypassed, this is by design. > > > > > > From the rules UI perspective it seems possible to accelerate > > > forward chain handling with the statements such as dscp modification there. > > > > > > Isn't it better to modify the packets according to the bypassed > > > ruleset thus making the behaviour more consistent? > > > > The behaviour is consistent. Once flow is offloaded, ruleset is > > bypassed. Its easy to not offload those flows that need the ruleset. > > > > > > Lets not make the software offload more complex as it already is. > > > > > > Could you please tell which parts of software offload are too complex? > > > It's not too bad from what I've seen :) > > > > > > This patch series adds 56 lines of code in the new nf_conntrack.ext.c > > > file. 20 of them (nf_flow_offload_apply_payload) are used in > > > the software fast path. Is it too high of a price? > > > > 56 lines of code *now*. > > > > Next someone wants to call into sets/maps for named counters that > > they need. Then someone wants limit or quota to work. Then they want fib > > for RPF. Then xfrm policy matching to augment acccounting. > > This will go on until we get to the point where removing "fast" path > > turns into a performance optimization. > > OK. May I assume that you are concerned with the eventual performance impact > on the software fast path (i.e. nf_flow_offload_ip_hook)? Yes, but I also dislike the concept, see below. > Obviously the performance of the fast path is very important to our > customers. Otherwise they would not be requiring dscp fast path > modification. :) > > One of the things we've thought about regarding the fast path > performance is rewriting nf_flow_offload_ip_hook to work with > nf_flowtable->flow_block instead of flow_offload_tuple. Sorry, I should have expanded on my reservations towards this concept. Let me explain. Lets consider your original example first: ---------- table inet filter { flowtable f1 { hook ingress priority filter devices = { veth0, veth1 } } chain forward { type filter hook forward priority filter; policy accept; ip dscp set cs3 ip protocol { tcp, udp, gre } flow add ct state established,related accept } } ---------- This has a clearly defined meaning in all possible combinations. Software: 1. It defines a bypass for veth0 <-> veth1 2. the way this specific ruleset is defined, all of tcp/udp/gre will attempt to offload 3. once offload has happened, entire inet:forward may be bypassed 4. User ruleset needs to cope with packets being moved back to software: fragmented packets, tcp fin/rst, hw timeouts and so on. 5. User can control via 'offload' keyword if HW offload should be attempted or not Hardware: even 'nf_flow_offload_ip_hook' may be bypassed. But nothing changes compared to 'no hw offload' case from a conceptual point of view. Lets now consider existing netdev:ingress/egress in this same picture: (Example from Pablo): ------ table inet filter { flowtable f1 { hook ingress priority filter devices = { veth0, veth1 } } chain ingress { type filter hook ingress device veth0 priority filter; policy accept; flags offload; ip dscp set cs3 } chain forward { type filter hook forward priority filter; policy accept; meta l4proto { tcp, udp, gre } flow add @f1 ct state established,related accept } } Again, this has defined meaning in all combinations: With HW offload: veth0 will be told to mangle dscp. This happens in all cases and for every matching packet, regardless if a flowtable exists or not. Same would happen for 'egress', just that it would happen at xmit time rather at receive time. Again, its not relevant if there is active flowtable or not, or if data path is offloaded to hardware, to software, handled by fallback or entirely without flowtables being present. Its also clear that this is tied to 'veth0', other devices will not be involved and not do such mangling. Now lets look at your proposal: ---------------- table inet filter { flowtable f1 { hook ingress priority filter devices = { veth0, veth1 } } chain forward { type filter hook forward priority filter; policy accept; ip dscp set cs3 offload ip protocol { tcp, udp, gre } flow add ct state established,related accept } } ---------------- This means that software flowtable offload shall do a 'ip dscp set cs3'. What if the flowtable is offloaded to hardware entirely, without software fallback? What if the devices listed in the flowtable definition can handle flow offload, but no payload mangling? Does the 'offload' mean that the rule is only active for software path? Only for hardware path? both? How can I tell if its offloaded to hardware for one device but not for the other? Or will that be disallowed? What if someone adds another rule after or before 'ip dscp', but without the 'offload' keyword? Now ordering becomes an issue. Users now need to consider different control flows: jump exceptions ip dscp set cs3 offload chain exceptions { ip daddr 1.2.3.4 accept } This won't work as expected, because offloaded flows will not pass through 'forward' chain but somehow a few selected rules will be run anyway. TL;DR: I think that for HW offload its paramount to make it crystal clear as to which device is responsible to handle such rules. The existing netdev:ingress/egress hooks provide the needed chain/rules/expression:device mapping. User can easily tell if HW is responsible or SW by looking for 'offload' flag presence. I don't think mixing software and hardware offload contexts as proposed is a good idea, both from user frontend syntax, clarity and error reporting (e.g. if hw rejects offload request) point of view. I also believe that allowing payload mangling from *software* offload path sets a precedence for essentially allowing all other expressions again which completely negates the flowtable concept. I still think that dscp mangling should be done via netdev:ingress/egress hooks, I don't see why this has to be bolted into flowtable sw offload.