Re: [PATCH v2 0/7] Dynamic hook interface binding

Phil Sutter <phil@xxxxxx> · Wed, 19 Jun 2024 17:59:32 +0200

On Wed, Jun 19, 2024 at 04:48:35PM +0200, Pablo Neira Ayuso wrote:
> On Wed, Jun 19, 2024 at 01:27:06PM +0200, Phil Sutter wrote:
> > On Tue, Jun 18, 2024 at 01:10:21AM +0200, Pablo Neira Ayuso wrote:
> > > Hi Phil,
> > > 
> > > On Fri, May 17, 2024 at 03:06:08PM +0200, Phil Sutter wrote:
> > > > Changes since v1:
> > > > - New patch 6 adding notifications for updated hooks.
> > > > - New patch 7 adding the requested torture test.
> > > > 
> > > > Currently, netdev-family chains and flowtables expect their interfaces
> > > > to exist at creation time. In practice, this bites users of virtual
> > > > interfaces if these happen to be created after the nftables service
> > > > starts up and loads the stored ruleset.
> > > > 
> > > > Vice-versa, if an interface disappears at run-time (via module unloading
> > > > or 'ip link del'), it also disappears from the ruleset, along with the
> > > > chain and its rules which binds to it. This is at least problematic for
> > > > setups which store the running ruleset during system shutdown.
> > > 
> > > I'd suggest that you place your patch 2/7 to modify the existing
> > > behaviour in first place.
> > 
> > I did not to avoid "dead" entries in the hook list (with ops.dev gone,
> > there's nothing left identifying the hook). But it's only temporary and
> > maybe cleans up the diffs, will give it a go.
> 
> OK.
> 
> > > > This series attempts to solve these problems by effectively making
> > > > netdev hooks name-based: If no matching interface is found at hook
> > > > creation time, it will be inactive until a matching interface appears.
> > > > If a bound interface is renamed, a matching inactive hook is searched
> > > > for it.
> > > > 
> > > > Ruleset dumps will stabilize in that regard. To still provide
> > > > information about which existing interfaces a chain/flowtable currently
> > > > binds to, new netlink attributes *_ACT_DEVS are introduced which are
> > > > filled from the active hooks only.
> > > 
> > > Currently, NFTA_HOOK_DEVS already represents the netdevice that are
> > > active. If one of these devices goes aways, then it is removed from
> > > the basechain and it does not show up in NFTA_HOOK_DEVS anymore.
> > > 
> > > There are netlink notifications that need to fit into NLMSG_GOODSIZE,
> > > but this adds yet another netlink array attribute.
> > 
> > Hmm. I could introduce NFTA_HOOK_INACTIVE_DEVS which contains only those
> > entries missing in NFTA_HOOK_DEVS. This shouldn't bloat the dumps too
> > much (apart from the added overhead) and won't change old user space
> > behaviour.
> 
> Not sure. What does NFTA_HOOK_INACTIVE_DEVS contains? Could you
> provide an example? Again, if this array gets again too large, there
> could be issues with NLMSG_GOODSIZE again for notifications.

I assumed your intention was for NFTA_HOOK_DEVS to not change
semantically, i.e. remain to contain only devices which are present at
time of the dump. Then I could introduce INACTIVE_DEVS to contain those
we lost meanwhile. As an example:

1) add netdev chain for devices eth0, eth1, eth2
2) list ruleset:
   - HOOK_DEVS = { eth0, eth1, eth2 }
   - INACTIVE_DEVS = {}
3) ip link del eth1
4) list ruleset:
   - HOOK_DEVS = { eth0, eth2 }
   - INACTIVE_DEVS = { eth1 }

This avoids duplicate entries in both lists and thus avoids overhead.
This would fix for the interfaces missing in dumps problem.

Wildcards would appear as-is in either HOOK_DEVS (if there's at least
one matching interface) or INACTIVE_DEVS (if there is none). The actual
list of active interfaces would require a GETDEVICE call.

> I would just display these active devices (in the list of devices that
> are attached to this basechain) via the new command _GETDEV that we
> are discussing below? These netdevices that match the pattern come and
> go, I guess user only wants to make sure they are actually registered
> to this hook for diagnostics, showing an exact match, ie. tap0, or
> inexact match, ie. tap* should be should when listing the ruleset IMO.

OK, let's see if I can sum this up correctly:

1) NFTA_HOOK_DEVS is changed to always reflect what the user specified
2) Interfaces being removed or added trigger NEWDEV/DELDEV notifications
3) Active hooks are dumped by GETDEV netlink request
4) NEWDEV/DELDEV netlink requests/responses added to cover for oversized
chains/flowtables

You're saying we have to use (4) for wildcard interfaces, too. Is this
to keep them away from NFTA_HOOK_DEVS? Because in theory 1-3 are
sufficient for wildcards, too.

> > > I think we cannot escape adding new commands such as:
> > > 
> > > NFT_MSG_NEWDEVICE
> > > NFT_MSG_GETDEVICE
> > > NFT_MSG_DELDEVICE
> > > 
> > > to populate the basechain/flowtable, those can be used only if the
> > > "subscription" mecanism is used, so older kernels still rely in
> > > NFTA_HOOK_DEVS (older-older kernels actually already deal with
> > > NFTA_HOOK_DEV only...).
> > 
> > Why can't they co-exist? I.e., NFTA_HOOK_DEV{,S} continues to behave as
> > before and NEWDEV/DELDEV modify the list of existing chains/flowtables.
> > An explicit GETDEV to dump currently active interfaces seems reasonable,
> > especially with wildcard specifiers in mind.
> 
> yes, NFTA_HOOK_DEVS and these new commands need to coexist, there is
> no other way to retain backward compatibility.
> 
> > > NFT_MSG_NEWDEVICE can provide a flag to specify this is a wildcard
> > > or exact matching.
> > 
> > I had the same mechanism we use for wildcard ifname matching in mind
> > which is to specify a name length which either includes or excludes the
> > terminating NUL-char. Using strncmp() is sufficient then.
> 
> That's fine.
> 
> > Or do you think more advanced (e.g. "enp*s0") wildcards are useful?
> 
> I prefer future does not bring "enp*s0", but let's agree on an API
> that can be extended to accomodate new requirements, that's my only
> comment in this regard.
> 
> > > There is also a need for a netlink attribute to specify if this is
> > > adding a device to a chain or flowtable.
> > 
> > Since one has to specify the specific chain or flowtable to add to
> > anyway, one could just add NFTA_DEV_CHAIN and NFTA_DEV_FLOWTABLE
> > attributes, which are mutually exclusive and together with
> > NFTA_DEV_TABLE identify the object to manipulate.
> 
> Right.
> 
> > > With these new commands, the NFT_NETDEVICE_MAX cap can also go away in
> > > newer kernels with a command like this:
> > > 
> > >         nft add device flowtable ip x y { a, b, c }
> > >         nft add device chain ip x y { a, b, c }
> > > 
> > > which expands to one one NFT_MSG_* message for each item.
> > > 
> > > Then, the nested notation will need to detect what to use depending on
> > > the user input:
> > > 
> > >         table netdev x {
> > >                 chain y {
> > >                         type filter hook ingress devices = { tap* } priority 0
> > >                 }
> > >         }
> > > 
> > > this triggers the new commands path, same if the list of devices goes
> > > over NFT_NETDEVICE_MAX (256).
> > 
> > This is for user space to remain compatible to older kernels, right?
> 
> Yes.
> 
> > > This can also help incrementally report on new devices that match on a
> > > given "subscription pattern" via new NFT_MSG_NEWDEVICE command for
> > > monitoring purpose.
> > 
> > Yes, adding NEWDEV/DELDEV notifications is a nice approach!
> 
> This will also help to remove the existing NFT_NETDEVICE_MAX limit.
> 
> > > Maybe a command like:
> > > 
> > >         nft list device chain ip x y
> > > 
> > > could be used to list the existing devices that are "active" as per
> > > your definition, so:
> > > 
> > >         nft list ruleset
> > > 
> > > does not show the listing of "active" devices that expand to tap*,
> > > because this is only informational (not really required to restore a
> > > ruleset), I guess the user only wants this to inspect to make sure
> > > what devices are registered to the given tap* pattern.
> > 
> > ACK.
> > 
> > > There is also another case that would need to be handled:
> > > 
> > >         - chain A with device tap0
> > >         - chain B with wildcard device tap*
> > > 
> > > I would expect a "exclude" clause for the wildcard case will come
> > > sooner or later to define a different policy for a specify chain.
> > > The new specific command approach would be extensible in that sense.
> > 
> > As a first implementation, I would just forbid such combinations.
> > Assuming "tap*" is just "tap" with length 3 and "tap0" is "tap0\0" with
> > length 5, modifying the duplicate hook check in
> > nf_tables_parse_netdev_hooks() to perform a strncmp(namea, nameb,
> > min(lena, lenb)) should suffice.
> 
> That is fine to ensure a given basechain does not have both tap0 and tap*

Ah, I missed that nf_tables_parse_netdev_hooks() merely searches the
current list for duplicate entries, not all chains'/flowtables' ones.

Hmm. I can create multiple netdev chains attaching to the same
interfaces, but only a single flowtable. Is this intentional?

> > Another option avoiding maintenance of an exclusion list would be to
> > just add a new interface to the first chain/flowtable with matching hook
> > in the list. This should work since 'nft list ruleset' does not sort the
> > ruleset, so 'add chain t b; add chain t a' is a meaningful difference to
> > 'add chain t a; add chain t b'.
> > 
> > The exclusion list approach either means tracking a list of active
> > hooks' matching names in each wildcard hook or some preference when
> > searching a hook for a new interface. All this may become increasingly
> > complicated by stuff like one hook for 'eth*' and another for 'et*'.
> > 
> > > > This series is also prep work for a simple wildcard interface binding
> > > > similar to the wildcard interface matching in meta expression. It should
> > > > suffice to turn struct nft_hook::ops into an array of all matching
> > > > interfaces, but the respective code does not exist yet.
> > > 
> > > I think this series is all about this wildcard interface matching,
> > > which is not coming explicitly.
> > 
> > My motivation for this series is an open ticket complaining about the
> > inability to define a flowtable for an interface which is created by
> > NetworkManager (which depends on nftables service for startup).
> > 
> > A related (but not complained so far) issue is in reverse direction: If
> > the interface vanishes before nftables service saves the ruleset upon
> > shutdown, the dump will be incomplete.
> >
> > I consider wildcard interface hooks merely a (nice) side-effect from
> > fixing the above. Which should not require too much additional work.
> 
> OK.
> 
> Thanks.
>