On Wed, Jun 19, 2024 at 04:48:35PM +0200, Pablo Neira Ayuso wrote: > On Wed, Jun 19, 2024 at 01:27:06PM +0200, Phil Sutter wrote: > > On Tue, Jun 18, 2024 at 01:10:21AM +0200, Pablo Neira Ayuso wrote: > > > Hi Phil, > > > > > > On Fri, May 17, 2024 at 03:06:08PM +0200, Phil Sutter wrote: > > > > Changes since v1: > > > > - New patch 6 adding notifications for updated hooks. > > > > - New patch 7 adding the requested torture test. > > > > > > > > Currently, netdev-family chains and flowtables expect their interfaces > > > > to exist at creation time. In practice, this bites users of virtual > > > > interfaces if these happen to be created after the nftables service > > > > starts up and loads the stored ruleset. > > > > > > > > Vice-versa, if an interface disappears at run-time (via module unloading > > > > or 'ip link del'), it also disappears from the ruleset, along with the > > > > chain and its rules which binds to it. This is at least problematic for > > > > setups which store the running ruleset during system shutdown. > > > > > > I'd suggest that you place your patch 2/7 to modify the existing > > > behaviour in first place. > > > > I did not to avoid "dead" entries in the hook list (with ops.dev gone, > > there's nothing left identifying the hook). But it's only temporary and > > maybe cleans up the diffs, will give it a go. > > OK. > > > > > This series attempts to solve these problems by effectively making > > > > netdev hooks name-based: If no matching interface is found at hook > > > > creation time, it will be inactive until a matching interface appears. > > > > If a bound interface is renamed, a matching inactive hook is searched > > > > for it. > > > > > > > > Ruleset dumps will stabilize in that regard. To still provide > > > > information about which existing interfaces a chain/flowtable currently > > > > binds to, new netlink attributes *_ACT_DEVS are introduced which are > > > > filled from the active hooks only. > > > > > > Currently, NFTA_HOOK_DEVS already represents the netdevice that are > > > active. If one of these devices goes aways, then it is removed from > > > the basechain and it does not show up in NFTA_HOOK_DEVS anymore. > > > > > > There are netlink notifications that need to fit into NLMSG_GOODSIZE, > > > but this adds yet another netlink array attribute. > > > > Hmm. I could introduce NFTA_HOOK_INACTIVE_DEVS which contains only those > > entries missing in NFTA_HOOK_DEVS. This shouldn't bloat the dumps too > > much (apart from the added overhead) and won't change old user space > > behaviour. > > Not sure. What does NFTA_HOOK_INACTIVE_DEVS contains? Could you > provide an example? Again, if this array gets again too large, there > could be issues with NLMSG_GOODSIZE again for notifications. I assumed your intention was for NFTA_HOOK_DEVS to not change semantically, i.e. remain to contain only devices which are present at time of the dump. Then I could introduce INACTIVE_DEVS to contain those we lost meanwhile. As an example: 1) add netdev chain for devices eth0, eth1, eth2 2) list ruleset: - HOOK_DEVS = { eth0, eth1, eth2 } - INACTIVE_DEVS = {} 3) ip link del eth1 4) list ruleset: - HOOK_DEVS = { eth0, eth2 } - INACTIVE_DEVS = { eth1 } This avoids duplicate entries in both lists and thus avoids overhead. This would fix for the interfaces missing in dumps problem. Wildcards would appear as-is in either HOOK_DEVS (if there's at least one matching interface) or INACTIVE_DEVS (if there is none). The actual list of active interfaces would require a GETDEVICE call. > I would just display these active devices (in the list of devices that > are attached to this basechain) via the new command _GETDEV that we > are discussing below? These netdevices that match the pattern come and > go, I guess user only wants to make sure they are actually registered > to this hook for diagnostics, showing an exact match, ie. tap0, or > inexact match, ie. tap* should be should when listing the ruleset IMO. OK, let's see if I can sum this up correctly: 1) NFTA_HOOK_DEVS is changed to always reflect what the user specified 2) Interfaces being removed or added trigger NEWDEV/DELDEV notifications 3) Active hooks are dumped by GETDEV netlink request 4) NEWDEV/DELDEV netlink requests/responses added to cover for oversized chains/flowtables You're saying we have to use (4) for wildcard interfaces, too. Is this to keep them away from NFTA_HOOK_DEVS? Because in theory 1-3 are sufficient for wildcards, too. > > > I think we cannot escape adding new commands such as: > > > > > > NFT_MSG_NEWDEVICE > > > NFT_MSG_GETDEVICE > > > NFT_MSG_DELDEVICE > > > > > > to populate the basechain/flowtable, those can be used only if the > > > "subscription" mecanism is used, so older kernels still rely in > > > NFTA_HOOK_DEVS (older-older kernels actually already deal with > > > NFTA_HOOK_DEV only...). > > > > Why can't they co-exist? I.e., NFTA_HOOK_DEV{,S} continues to behave as > > before and NEWDEV/DELDEV modify the list of existing chains/flowtables. > > An explicit GETDEV to dump currently active interfaces seems reasonable, > > especially with wildcard specifiers in mind. > > yes, NFTA_HOOK_DEVS and these new commands need to coexist, there is > no other way to retain backward compatibility. > > > > NFT_MSG_NEWDEVICE can provide a flag to specify this is a wildcard > > > or exact matching. > > > > I had the same mechanism we use for wildcard ifname matching in mind > > which is to specify a name length which either includes or excludes the > > terminating NUL-char. Using strncmp() is sufficient then. > > That's fine. > > > Or do you think more advanced (e.g. "enp*s0") wildcards are useful? > > I prefer future does not bring "enp*s0", but let's agree on an API > that can be extended to accomodate new requirements, that's my only > comment in this regard. > > > > There is also a need for a netlink attribute to specify if this is > > > adding a device to a chain or flowtable. > > > > Since one has to specify the specific chain or flowtable to add to > > anyway, one could just add NFTA_DEV_CHAIN and NFTA_DEV_FLOWTABLE > > attributes, which are mutually exclusive and together with > > NFTA_DEV_TABLE identify the object to manipulate. > > Right. > > > > With these new commands, the NFT_NETDEVICE_MAX cap can also go away in > > > newer kernels with a command like this: > > > > > > nft add device flowtable ip x y { a, b, c } > > > nft add device chain ip x y { a, b, c } > > > > > > which expands to one one NFT_MSG_* message for each item. > > > > > > Then, the nested notation will need to detect what to use depending on > > > the user input: > > > > > > table netdev x { > > > chain y { > > > type filter hook ingress devices = { tap* } priority 0 > > > } > > > } > > > > > > this triggers the new commands path, same if the list of devices goes > > > over NFT_NETDEVICE_MAX (256). > > > > This is for user space to remain compatible to older kernels, right? > > Yes. > > > > This can also help incrementally report on new devices that match on a > > > given "subscription pattern" via new NFT_MSG_NEWDEVICE command for > > > monitoring purpose. > > > > Yes, adding NEWDEV/DELDEV notifications is a nice approach! > > This will also help to remove the existing NFT_NETDEVICE_MAX limit. > > > > Maybe a command like: > > > > > > nft list device chain ip x y > > > > > > could be used to list the existing devices that are "active" as per > > > your definition, so: > > > > > > nft list ruleset > > > > > > does not show the listing of "active" devices that expand to tap*, > > > because this is only informational (not really required to restore a > > > ruleset), I guess the user only wants this to inspect to make sure > > > what devices are registered to the given tap* pattern. > > > > ACK. > > > > > There is also another case that would need to be handled: > > > > > > - chain A with device tap0 > > > - chain B with wildcard device tap* > > > > > > I would expect a "exclude" clause for the wildcard case will come > > > sooner or later to define a different policy for a specify chain. > > > The new specific command approach would be extensible in that sense. > > > > As a first implementation, I would just forbid such combinations. > > Assuming "tap*" is just "tap" with length 3 and "tap0" is "tap0\0" with > > length 5, modifying the duplicate hook check in > > nf_tables_parse_netdev_hooks() to perform a strncmp(namea, nameb, > > min(lena, lenb)) should suffice. > > That is fine to ensure a given basechain does not have both tap0 and tap* Ah, I missed that nf_tables_parse_netdev_hooks() merely searches the current list for duplicate entries, not all chains'/flowtables' ones. Hmm. I can create multiple netdev chains attaching to the same interfaces, but only a single flowtable. Is this intentional? > > Another option avoiding maintenance of an exclusion list would be to > > just add a new interface to the first chain/flowtable with matching hook > > in the list. This should work since 'nft list ruleset' does not sort the > > ruleset, so 'add chain t b; add chain t a' is a meaningful difference to > > 'add chain t a; add chain t b'. > > > > The exclusion list approach either means tracking a list of active > > hooks' matching names in each wildcard hook or some preference when > > searching a hook for a new interface. All this may become increasingly > > complicated by stuff like one hook for 'eth*' and another for 'et*'. > > > > > > This series is also prep work for a simple wildcard interface binding > > > > similar to the wildcard interface matching in meta expression. It should > > > > suffice to turn struct nft_hook::ops into an array of all matching > > > > interfaces, but the respective code does not exist yet. > > > > > > I think this series is all about this wildcard interface matching, > > > which is not coming explicitly. > > > > My motivation for this series is an open ticket complaining about the > > inability to define a flowtable for an interface which is created by > > NetworkManager (which depends on nftables service for startup). > > > > A related (but not complained so far) issue is in reverse direction: If > > the interface vanishes before nftables service saves the ruleset upon > > shutdown, the dump will be incomplete. > > > > I consider wildcard interface hooks merely a (nice) side-effect from > > fixing the above. Which should not require too much additional work. > > OK. > > Thanks. >