On Mon, Feb 01, 2021 at 01:24:55PM +0100, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> wrote:
> > A userspace daemon like firewalld might need to monitor for netlink
> > updates to detect its ruleset removal by the (global) flush ruleset
> > command to ensure ruleset persistence. This adds extra complexity from
> > userspace and, for some little time, the firewall policy is not in
> > place.
> >
> > This patch adds the NFT_MSG_SETOWNER netlink command which allows a
> > userspace program to own the table that creates in exclusivity.
> >
> > Tables that are owned...
> >
> > - can only be updated and removed by the owner, non-owners hit EPERM if
> >   they try to update it or remove it.
> > - are destroyed when the owner send the NFT_MSG_UNSETOWNER command,
> >   or the netlink socket is closed or the process is gone (implicit
> >   netlink socket closure).
> > - are skipped by the global flush ruleset command.
> > - are listed in the global ruleset.
> >
> > The userspace process that sends the new NFT_MSG_SETOWNER command need
> > to leave open the netlink socket.
> >
> > The NFTA_TABLE_OWNER netlink attribute specifies the netlink port ID to
> > identify the owner.
>
> At least for systemd use case, there would be a need to allow
> add/removal of set elements from other user.

Then, probably a flag for this? Such a flag would work like this?

- Allow for set element updates (from any process, no ownership).
- nft flush ruleset skips flushing the set.
- nft flush set x y flushes the content of this set.

The table owner would set such a flag. Would this work for the
scenario you describe below?

> At the moment, table is created by systemd-networkd which will update
> the masquerade set.
>
> In case systemd-nspawn is used and configured to expose container
> services via dnat that will need to add the translation map:
>
> add table ip io.systemd.nat
> add chain ip io.systemd.nat prerouting { type nat hook prerouting priority dstnat + 1; policy accept; }
> [..]
> # new generation 2 by process 1378 (systemd-network)
> add element ip io.systemd.nat masq_saddr { 192.168.159.192/28 }
> # new generation 3 by process 1378 (systemd-network)
> add element ip io.systemd.nat map_port_ipport { tcp . 2222 : 192.168.159.201 . 22 }
> # new generation 4 by process 1512 (systemd-nspawn)
>
> > +struct nft_owner {
> > +	struct list_head list;
> > +	possible_net_t net;
> > +	u32 nlpid;
> > +};
>
> I don't see why this is needed.
> Isn't it enough to record the nlpid in the table and set a flag that the table is
> owned by that pid?

I'll have a look.

> > +		    nft_active_genmask(table, genmask)) {
> > +			if (nlpid && table->nlpid && table->nlpid != nlpid)
> > +				return ERR_PTR(-EPERM);
> > +
>
> i.e., (table->flags & OWNED) && table->nlpid != nlpid)?
>
> On netlink sk destruction the owner flag could be cleared or table
> could be auto-zapped.

Default behaviour right now is: table is released if owner is gone.

It should be possible to add a flag to leave the ruleset in place
(owner flag would be cleared from NETLINK_RELEASE event path).
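
For illustration only, the flag-based check and the two release behaviours
discussed above could be modelled in plain userspace C roughly as follows.
This is a sketch under assumptions, not the patch: struct sketch_table,
SKETCH_TABLE_F_OWNED, sketch_table_access() and sketch_table_release() are
hypothetical names, and the struct is a simplified stand-in for the kernel's
table object.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag name; the thread only writes "OWNED". */
#define SKETCH_TABLE_F_OWNED 0x1u

/* Simplified stand-in for the table; not the kernel's struct nft_table. */
struct sketch_table {
	uint32_t flags;
	uint32_t nlpid;	/* owner's netlink port ID, meaningful only if OWNED is set */
};

/* Returns 0 if the requester may update or remove the table, -EPERM otherwise. */
int sketch_table_access(const struct sketch_table *t, uint32_t nlpid)
{
	if ((t->flags & SKETCH_TABLE_F_OWNED) && t->nlpid != nlpid)
		return -EPERM;
	return 0;
}

/*
 * Models the owner's socket going away. Returns true if the table should be
 * destroyed (the current default). With keep set, ownership is cleared
 * instead and the table stays in place.
 */
bool sketch_table_release(struct sketch_table *t, uint32_t nlpid, bool keep)
{
	if (!(t->flags & SKETCH_TABLE_F_OWNED) || t->nlpid != nlpid)
		return false;	/* not owned by the socket being released */
	if (!keep)
		return true;
	t->flags &= ~SKETCH_TABLE_F_OWNED;
	t->nlpid = 0;
	return false;
}
```

With keep set, a table released by its owner becomes an ordinary unowned
table, so any later requester passes the access check.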