On Tue, Jun 18, 2024 at 01:10:21AM +0200, Pablo Neira Ayuso wrote: > Hi Phil, > > On Fri, May 17, 2024 at 03:06:08PM +0200, Phil Sutter wrote: > > Changes since v1: > > - New patch 6 adding notifications for updated hooks. > > - New patch 7 adding the requested torture test. > > > > Currently, netdev-family chains and flowtables expect their interfaces > > to exist at creation time. In practice, this bites users of virtual > > interfaces if these happen to be created after the nftables service > > starts up and loads the stored ruleset. > > > > Vice-versa, if an interface disappears at run-time (via module unloading > > or 'ip link del'), it also disappears from the ruleset, along with the > > chain and its rules which binds to it. This is at least problematic for > > setups which store the running ruleset during system shutdown. > > I'd suggest that you place your patch 2/7 to modify the existing > behaviour in first place. I did not to avoid "dead" entries in the hook list (with ops.dev gone, there's nothing left identifying the hook). But it's only temporary and maybe cleans up the diffs, will give it a go. > > This series attempts to solve these problems by effectively making > > netdev hooks name-based: If no matching interface is found at hook > > creation time, it will be inactive until a matching interface appears. > > If a bound interface is renamed, a matching inactive hook is searched > > for it. > > > > Ruleset dumps will stabilize in that regard. To still provide > > information about which existing interfaces a chain/flowtable currently > > binds to, new netlink attributes *_ACT_DEVS are introduced which are > > filled from the active hooks only. > > Currently, NFTA_HOOK_DEVS already represents the netdevice that are > active. If one of these devices goes aways, then it is removed from > the basechain and it does not show up in NFTA_HOOK_DEVS anymore. > > There are netlink notifications that need to fit into NLMSG_GOODSIZE, > but this adds yet another netlink array attribute. Hmm. I could introduce NFTA_HOOK_INACTIVE_DEVS which contains only those entries missing in NFTA_HOOK_DEVS. This shouldn't bloat the dumps too much (apart from the added overhead) and won't change old user space behaviour. > I think we cannot escape adding new commands such as: > > NFT_MSG_NEWDEVICE > NFT_MSG_GETDEVICE > NFT_MSG_DELDEVICE > > to populate the basechain/flowtable, those can be used only if the > "subscription" mecanism is used, so older kernels still rely in > NFTA_HOOK_DEVS (older-older kernels actually already deal with > NFTA_HOOK_DEV only...). Why can't they co-exist? I.e., NFTA_HOOK_DEV{,S} continues to behave as before and NEWDEV/DELDEV modify the list of existing chains/flowtables. An explicit GETDEV to dump currently active interfaces seems reasonable, especially with wildcard specifiers in mind. > NFT_MSG_NEWDEVICE can provide a flag to specify this is a wildcard > or exact matching. I had the same mechanism we use for wildcard ifname matching in mind which is to specify a name length which either includes or excludes the terminating NUL-char. Using strncmp() is sufficient then. Or do you think more advanced (e.g. "enp*s0") wildcards are useful? > There is also a need for a netlink attribute to specify if this is > adding a device to a chain or flowtable. Since one has to specify the specific chain or flowtable to add to anyway, one could just add NFTA_DEV_CHAIN and NFTA_DEV_FLOWTABLE attributes, which are mutually exclusive and together with NFTA_DEV_TABLE identify the object to manipulate. > With these new commands, the NFT_NETDEVICE_MAX cap can also go away in > newer kernels with a command like this: > > nft add device flowtable ip x y { a, b, c } > nft add device chain ip x y { a, b, c } > > which expands to one one NFT_MSG_* message for each item. > > Then, the nested notation will need to detect what to use depending on > the user input: > > table netdev x { > chain y { > type filter hook ingress devices = { tap* } priority 0 > } > } > > this triggers the new commands path, same if the list of devices goes > over NFT_NETDEVICE_MAX (256). This is for user space to remain compatible to older kernels, right? > This can also help incrementally report on new devices that match on a > given "subscription pattern" via new NFT_MSG_NEWDEVICE command for > monitoring purpose. Yes, adding NEWDEV/DELDEV notifications is a nice approach! > Maybe a command like: > > nft list device chain ip x y > > could be used to list the existing devices that are "active" as per > your definition, so: > > nft list ruleset > > does not show the listing of "active" devices that expand to tap*, > because this is only informational (not really required to restore a > ruleset), I guess the user only wants this to inspect to make sure > what devices are registered to the given tap* pattern. ACK. > There is also another case that would need to be handled: > > - chain A with device tap0 > - chain B with wildcard device tap* > > I would expect a "exclude" clause for the wildcard case will come > sooner or later to define a different policy for a specify chain. > The new specific command approach would be extensible in that sense. As a first implementation, I would just forbid such combinations. Assuming "tap*" is just "tap" with length 3 and "tap0" is "tap0\0" with length 5, modifying the duplicate hook check in nf_tables_parse_netdev_hooks() to perform a strncmp(namea, nameb, min(lena, lenb)) should suffice. Another option avoiding maintenance of an exclusion list would be to just add a new interface to the first chain/flowtable with matching hook in the list. This should work since 'nft list ruleset' does not sort the ruleset, so 'add chain t b; add chain t a' is a meaningful difference to 'add chain t a; add chain t b'. The exclusion list approach either means tracking a list of active hooks' matching names in each wildcard hook or some preference when searching a hook for a new interface. All this may become increasingly complicated by stuff like one hook for 'eth*' and another for 'et*'. > > This series is also prep work for a simple wildcard interface binding > > similar to the wildcard interface matching in meta expression. It should > > suffice to turn struct nft_hook::ops into an array of all matching > > interfaces, but the respective code does not exist yet. > > I think this series is all about this wildcard interface matching, > which is not coming explicitly. My motivation for this series is an open ticket complaining about the inability to define a flowtable for an interface which is created by NetworkManager (which depends on nftables service for startup). A related (but not complained so far) issue is in reverse direction: If the interface vanishes before nftables service saves the ruleset upon shutdown, the dump will be incomplete. I consider wildcard interface hooks merely a (nice) side-effect from fixing the above. Which should not require too much additional work. Thanks for your review, Phil