On Fri, May 13, 2022 at 11:09:51AM +0200, Felix Fietkau wrote:
> On 13.05.22 10:15, Pablo Neira Ayuso wrote:
> > On Fri, May 13, 2022 at 10:03:13AM +0200, Felix Fietkau wrote:
> > > On 13.05.22 09:49, Pablo Neira Ayuso wrote:
> > > > Hi,
> > > >
> > > > On Tue, May 10, 2022 at 10:27:39PM +0200, Felix Fietkau wrote:
> > > > > In many cases, it's not easily possible for user space to know
> > > > > which devices properly support hardware offload.
> > > >
> > > > Then, is it a matter of extending the netlink interface to expose
> > > > this feature? Probably add a FLOW_BLOCK_PROBE or similar which
> > > > allows consulting whether this feature is available?
> > > >
> > > > > Even if a device supports hardware flow offload, it is not
> > > > > guaranteed that it will actually be able to handle the flows for
> > > > > which hardware offload is requested.
> > > >
> > > > When might this happen?
> > >
> > > I think there are many possible reasons: The flow might be using
> > > features not supported by the offload driver. Maybe it doesn't have
> > > any space left in the offload table. I'm sure there are many other
> > > possible reasons it could fail.
> >
> > This fallback to the software flowtable path for partial scenarios
> > already exists.
>
> I know. All I meant was to point out that hardware offload is not
> guaranteed in one place, so I don't think bailing out with an error
> because flow block bind didn't work for one of the flowtable devices
> is justified.
>
> > > > > Ignoring errors on the FLOW_BLOCK_BIND makes it a lot easier to
> > > > > set up configurations that use hardware offload where possible
> > > > > and gracefully fall back to software offload for everything
> > > > > else.
> > > >
> > > > I understand this might be useful from the userspace perspective,
> > > > because forcing the user to re-try is silly.
> > > >
> > > > However, on the other hand, the user should have some way to know
> > > > from the control plane that the feature (hardware offload) that
> > > > they request is not available for their setup.
> > >
> > > In my opinion, most users of this API probably don't care and just
> > > want to have offload on a best effort basis.
> >
> > OK, but if the setup does not support hardware offload at all, why
> > should the control plane accept this? I think the user should know in
> > the first place that not one single flow is going to be offloaded to
> > hardware.
>
> It makes for a much cleaner configuration if you can just create a
> single hw-offload enabled flowtable containing multiple devices, some
> of which support hardware offload and some of which don't.

This scenario, mixing devices that support hw offload with devices that
do not, makes sense.

> > > Assuming that is the case, wouldn't it be better if we simply had
> > > an API that indicates which flowtable members hardware offload was
> > > actually enabled for?
> >
> > What are you proposing?
> >
> > I think it would be good to expose through the netlink interface what
> > the device can actually do according to the existing supported
> > flowtable software datapath features.
>
> In addition to the NFTA_FLOWTABLE_HOOK_DEVS array, the netlink API
> could also return another array, e.g. NFTA_FLOWTABLE_HOOK_OFFLOAD_DEVS,
> which indicates the devices for which hw offload is enabled.
>
> What I really don't like about the current state of the flowtable
> offload API is the (in my opinion completely unnecessary) complexity
> that is required for the simple use case of enabling hw/sw flow
> offloading on a best effort basis for all devices.
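Regarding the NFTA_FLOWTABLE_HOOK_OFFLOAD_DEVS array suggested above,
an untested sketch of how a second device list could be dumped next to
the existing NFTA_FLOWTABLE_HOOK_DEVS one follows. Note that both the
attribute and the per-hook 'offloaded' flag are hypothetical, they do
not exist in the current tree:

	/* Untested sketch: dump the subset of flowtable hooks for which
	 * hw offload is active as a second device list.
	 * NFTA_FLOWTABLE_HOOK_OFFLOAD_DEVS and hook->offloaded are
	 * hypothetical and only illustrate the proposed uAPI.
	 */
	static int flowtable_dump_offload_devs(struct sk_buff *skb,
					       struct nft_flowtable *flowtable)
	{
		struct nft_hook *hook;
		struct nlattr *nest;

		nest = nla_nest_start_noflag(skb, NFTA_FLOWTABLE_HOOK_OFFLOAD_DEVS);
		if (!nest)
			return -EMSGSIZE;

		list_for_each_entry_rcu(hook, &flowtable->hook_list, list) {
			if (!hook->offloaded)	/* hypothetical per-hook flag */
				continue;
			if (nla_put_string(skb, NFTA_DEVICE_NAME,
					   hook->ops.dev->name))
				goto nla_put_failure;
		}
		nla_nest_end(skb, nest);
		return 0;

	nla_put_failure:
		nla_nest_cancel(skb, nest);
		return -EMSGSIZE;
	}

Userspace could then compare the two lists to see which flowtable
members silently fell back to the software datapath.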
> What I like even less is the number of implementation details that it
> has to consider.
>
> For example: Let's assume we have a machine with several devices, some
> of which support hw offload, some of which don't. We have a mix of
> VLANs and bridges in there as well, maybe even PPPoE.
> Now the admin of that machine wants to enable best-effort hardware +
> software flow offloading for that configuration.
> Now he (or a piece of user space software dealing with the config) has
> to do these things:
> - figure out which devices could support hw offload, and create a
>   separate flow table for them
> - be aware of which of these devices are actually used by looking at
>   the stack of bridges, vlans, dsa devices, etc.
> - if an error occurs, test the devices individually just to see which
>   one actually failed, and leave it out of the flowtable
> - for sw offload, be aware that there is limited support for
>   offloading decap of vlans/pppoe: count the number of decaps and
>   figure out the right input device to add based on the behavior of
>   nft_dev_path_info, so that the 'indev' it selects matches the device
>   you put in the flow table.
>
> So I'm asking you: Am I getting any of this completely wrong? Do you
> consider it to be a reasonable trade-off to force the admin (or an
> intermediate user space layer) to jump through these hoops for such a
> simple use case, just because somebody might want more fine-grained
> control?
>
> I consider this patch to be a first step towards making simple use
> cases easier to configure. I'd also be fine with adding a flag to make
> the fallback behavior opt-in, even though I think it would make a much
> better default.
>
> Eventually I'd also like to add a flag that makes it unnecessary to
> even specify the devices in the flow table, by making the code
> auto-create hooks for devices with active flows, just like I did in my
> xtables target.
> You correctly pointed out to me in the past that this comes at the
> cost of a few packets of delay before offloading kicks in, but I'm
> still wondering: who actually cares about that?
>
> If I'm completely off-base with this, please let me know. I'm simply
> trying to make sense of all of this...

Maybe only fail if _none_ of the selected devices supports hardware
offload, i.e. instead of silently accepting all devices, count the
number of devices for which a block has been set up; if it is == 0,
then bail out with EOPNOTSUPP.
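Roughly like the following untested sketch. flowtable_bind_all() and
flowtable_dev_bind() are illustrative stand-ins for the real per-device
FLOW_BLOCK_BIND setup in the flowtable hook registration path, not
existing functions:

	/* Untested sketch of the "fail only if no device binds" idea.
	 * flowtable_dev_bind() stands in for the real FLOW_BLOCK_BIND
	 * setup for a single device.
	 */
	static int flowtable_bind_all(struct nft_flowtable *flowtable)
	{
		struct nft_hook *hook;
		unsigned int bound = 0;
		int err;

		list_for_each_entry(hook, &flowtable->hook_list, list) {
			err = flowtable_dev_bind(flowtable, hook->ops.dev);
			if (err < 0)
				continue;	/* best effort: skip this device */
			bound++;
		}

		/* Reject the flowtable only when hardware offload could
		 * not be set up on a single one of the selected devices.
		 */
		if (bound == 0)
			return -EOPNOTSUPP;

		return 0;
	}

That would keep the best-effort behavior for mixed setups while still
reporting an error for a flowtable where no device at all can be
offloaded.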