This patch series enables libvirt to use nftables rules rather than iptables *when setting up virtual networks* (it does *not* add nftables support to the nwfilter driver). It accomplishes this by abstracting several iptables functions (from viriptables.[ch] called by the virtual network driver into a rudimentary "virNetfilter API" (in virnetfilter.[ch], having the virtual network driver call the virNetFilter API rather than calling the existing iptables functions directly, and then finally adding an equivalent virNftables backend that can be used instead of iptables (selected manually via a network.conf setting, or automatically if iptables isn't found on the host). A first look at the result may have you thinking that it's filled with a lot of bad decisions. While I would agree with that in many cases, I think that overall they are the "least bad" decisions, or at least "bad within acceptable limits / no worse than something else", and point out that it's been done in a way that minimizes (actually eliminates) the need for immediate changes to nwfilter (the other consumer of iptables, which *also* needs to be updated to use native nftables), and makes it much easier to change our mind about the details in the future. When I first started on this (long, protracted, repeatedly interrupted for extended periods - many of these patches are > a year old) task, I considered doing an all-at-once complete replacement of iptables with nftables, since all the Linux distros we support have had nftables for several years, and I'm pretty sure nobody has it disabled (not even sure if it's possible to disable nftables while still enabling iptables, since they both use xtables in the kernel). But due to libvirt's use of "-t mangle -j CHECKSUM --checksum-fill" (see commit fd5b15ff all the way back in July 2010 for details) which has no equivalent in nftables rules (and we don't *want* it to!!), and the desire to be able to easily switch back to iptables in case of an unforeseen regression, we decided that both iptables and nftables need to be supported (for now), with the default (for now) remaining as iptables. Just allowing for dual backends complicated matters, since it means that we have to have a config file, a setting, detection of which backends are available, and of course some sort of concept of an abstracted frontend that can use either backend based on the config setting (and/or auto-detection). Combining that with the fact that it would just be "too big" of a project to switch over nwfilter's iptables usage at the same time means that we have to keep around a lot of existing code for compatibility's sake rather than just wiping it all away and starting over. So, what I've ended up with is: 1) a network.conf file (didn't exist before) with a single setting "firewall_backend". If unset, the network driver tries to use iptables on the backend, and if that's missing, then tries to use nftables. 2) a new (internal-only, so transient!) virNetFilterXXX API that is used by the network driver in place of the iptablesXXX API, and calls either iptablesXXX or: 3) a virNftablesXXX API that exactly replicates the filtering rules of the existing iptablesXXX API (except in the custom "libvirt" base table rather than the system "filter" and "nat" tables). This means that: 4) when the nftables backend is used, the rules added are *exactly the same* (functionally speaking) as we currently add for iptables (except they are in the "libvirt" table). We had spent some time in IRC discussing different ways of using new functionality available in nftables to make a more efficient/performant implemention of the desired filtering, and there are some really great possibilities that need to be explored, but in the end there were too many details up in the air, and I decided that it would be more "accomplishable" (coined a new word there!) to first replicate existing behavior with nftables, but do it inside a framework that makes it easy to modify the details in the future (in particular making it painless to switch back and forth between builds with differing filter models at runtime) - this way we'll be able to separate the infrastructure work from the details of the rules (which we can then more easily work on and experiment with). (This implies that the main objective right now is "get rid of iptables dependencies", not "make the filtering faster and more efficient"). Notable features of this patchset: * allows switching between iptables/nftables backends without rebooting or restarting networks/guests. Because the commands required to remove a network's filter rules are now saved in the network status XML, each time libvirtd (or virtnetworkd) is restarted, it will execute exactly the commands needed to remove the filter rules that had been added by the previous libvirtd/virtnetworkd (rather than just making a guess, as we've always done up until now), and then add new rules using the current backend+binary's set of rules (while also saving the info needed for future removal of these new rules back into the network's status XML). * firewall_backend can be explicitly set in (new) /etc/libvirt/network.conf, but if it's not explicitly set, libvirt will default to the iptables backend if the iptables binary is found, and otherwise fall back to nftables as long as the nft binary is found; otherwise the first attempt to start a network will fail with an appropriate error. Things that seem ugly / that I would like to clean up / that I think are just fine as they are: * virFirewall does *not* provide a backend-agnostic interface [this is fine] * We need to maintain a backward-compatible API for virFirewall so that we don't have to touch nwfilter code. Trying to make its API backend-agnostic would require individually considering/changing every nwfilter use of virFirewall. * instead virFirewall objects are just a way to build a collection of commands to execute to build a firewall, then execute them while collecting info for and building a collection of commands that will tear down that firewall in the future. Do I want to "fix" this in the future by making virFirewall a higher level interface that accepts tokens describing the type of rule to add (rather than backend-specific arguments to a backend-specific command)? No. I think I like the way virFirewall works (as described in that previous bullet-point), instead I'm thinking that it is just slightly mis-named - I've lately been thinking of it as a "virNetFilterCmdList". Similarly, the virFirewallRules that it has a list of aren't really "rules", they are better described as commands or actions, so maybe they should be renamed to virNetfilterCmd or virNetfilterAction. But that is just cosmetic, so I didn't want to get into it in these patches (especially in case someone disagrees, or has a better idea for naming). * Speaking of renaming - I should probably rename all the "iptablesXXX" functions to "virIptablesXXX" to be consistent with so much of our other code. I lost the ambition to deal with it right now though, so I'm leaving that for later cleanup (or I could do it now if it really makes someone's day :-). * I could have chosen a higher place in the callchain to make the virNetfilter abstraction, e.g. at the level of "networkAddXXXFirewallRules()" rather than at the lower level of iptablesXXX(). That is actually probably what will happen in the future (since it will be necessary in order for an nftables-based firewall to be significantly different in structure from an iptables-based firewall). But that's the beauty of an API being private - we can freely add/remove things as needed. the important thing is that we now have the basic structure there. For now, the split is just above the existing iptablesXXX API (util/viriptables.[ch], which seems like a "narrow" enough place. Most iptablesXXX functions are written in terms of just 10 *other* iptablesXXX functions that add iptables-specific commands - I've just moved those functions into virnetfilter.[ch] (appropriately renamed), and changed them to call the 10 virNetfilterXXX functions that will in-turn call those 10 iptablesXXX (or equivalent virNftablesXXX) functions. * Some people may dislike that the 10 virNetfilterXXX functions are each written with a switch statement that has cases to directly call each backend, rather than each backend driver having a table of pointers to API functions, with the virNetfilter API function calling backends[fwBackend]->XXX() (ie the pattern for so many drivers in libvirt). But for just 2 backends, that really seemed like overkill and unnecessary obfuscation. * As implemented here, I am storing a "<fwRemoval>" element in the network status XML - it contains a serialized virFirewall object that directly contains the commands necessary to remove the firewall. I could instead just store "<firewall>", which would include all the commands that were used to *create* the firewall in addition to the commands needed to remove the firewall. The way it's done currently takes up less space; switching to storing the full firewall *might* be more informative to somebody, but on the other hand would make the network status XML *very* long. If anybody has an opinion about this, now is the time to bring it up - do you think it's worth having a separate list of all the commands that were used to create a network's firewall (keeping in mind that there is no public API to access it)? Or is it enough to just store what's needed to remove the firewall? * Several months ago Eric Garver posted patches for a pure firewalld backend, and I requested that they not be pushed because I wanted that to be integrated with my nftables backend support. Due to the fact that the firewalld backend is almost entirely implemented by putting the bridge into a new firewalld "zone", with no individual rules added, that won't happen as just another backend driver file in parallel to iptables and nftables; it will instead work by checking firewall_backend at a higher level in the network driver, thus avoiding the calls to virNetfilterXXX() entirely. I have locally merged Eric's patches over the top of these patches, and there are surprisingly few conflicts, but since his patches didn't account for a user-settable config (but instead just always used the firewalld backend if firewalld was active), some of the patches are going to require a bit of rework, which I'll take care of after getting these patches in. Laine Stump (28): util: add -w/--concurrent when applying the rule rather than when building it util: new virFirewallRuleGet*() APIs util: determine ignoreErrors value when creating rule, not when applying util: rename iptables helpers that will become the frontend for ip&nftables util: move backend-agnostic virNetfilter*() functions to their own file util: make netfilter action a proper typedefed (virFirewall) enum util: #define the names used for private packet filter chains util: move/rename virFirewallApplyRuleDirect to virIptablesApplyFirewallRule util/network: reintroduce virFirewallBackend, but different network: add (empty) network.conf file to distribution files network: allow setting firewallBackend from network.conf network: do not add DHCP checksum mangle rule unless using iptables network: call backend agnostic function to init private filter chains util: setup functions in virnetfilter which will call appropriate backend build: add nft to the list of binaries we attempt to locate util: add nftables backend to virnetfilter API used by network driver tests: test cases for nftables backend util: new functions to support adding individual rollback rules util: check for 0 args when applying iptables rule util: implement rollback rule autosave for iptables backend util: implement rollback rule autosave for nftables backend network: turn on auto-rollback for the rules added for virtual networks util: new function virFirewallNewFromRollback() util: new functions virFirewallParseXML() and virFirewallFormat() conf: add a virFirewall object to virNetworkObj network: use previously saved list of firewall rules when removing network: save network status when firewall rules are reloaded network: improve log message when reloading virtual network firewall rules libvirt.spec.in | 5 + meson.build | 1 + po/POTFILES | 2 + src/conf/virnetworkobj.c | 40 + src/conf/virnetworkobj.h | 11 + src/libvirt_private.syms | 68 +- src/network/bridge_driver.c | 40 +- src/network/bridge_driver_conf.c | 44 + src/network/bridge_driver_conf.h | 3 + src/network/bridge_driver_linux.c | 241 +++-- src/network/bridge_driver_nop.c | 6 +- src/network/bridge_driver_platform.h | 6 +- src/network/libvirtd_network.aug | 39 + src/network/meson.build | 11 + src/network/network.conf | 24 + src/network/test_libvirtd_network.aug.in | 5 + src/nwfilter/nwfilter_ebiptables_driver.c | 16 +- src/util/meson.build | 2 + src/util/virebtables.c | 4 +- src/util/virfirewall.c | 490 ++++++++-- src/util/virfirewall.h | 51 +- src/util/viriptables.c | 762 ++++----------- src/util/viriptables.h | 222 ++--- src/util/virnetfilter.c | 892 ++++++++++++++++++ src/util/virnetfilter.h | 159 ++++ src/util/virnftables.c | 698 ++++++++++++++ src/util/virnftables.h | 118 +++ .../{base.args => base.iptables} | 0 tests/networkxml2firewalldata/base.nftables | 256 +++++ ...-linux.args => nat-default-linux.iptables} | 0 .../nat-default-linux.nftables | 248 +++++ ...pv6-linux.args => nat-ipv6-linux.iptables} | 0 .../nat-ipv6-linux.nftables | 384 ++++++++ ...rgs => nat-ipv6-masquerade-linux.iptables} | 0 .../nat-ipv6-masquerade-linux.nftables | 456 +++++++++ ...linux.args => nat-many-ips-linux.iptables} | 0 .../nat-many-ips-linux.nftables | 472 +++++++++ ...-linux.args => nat-no-dhcp-linux.iptables} | 0 .../nat-no-dhcp-linux.nftables | 384 ++++++++ ...ftp-linux.args => nat-tftp-linux.iptables} | 0 .../nat-tftp-linux.nftables | 274 ++++++ ...inux.args => route-default-linux.iptables} | 0 .../route-default-linux.nftables | 162 ++++ tests/networkxml2firewalltest.c | 56 +- tests/virfirewalltest.c | 20 +- 45 files changed, 5718 insertions(+), 954 deletions(-) create mode 100644 src/network/libvirtd_network.aug create mode 100644 src/network/network.conf create mode 100644 src/network/test_libvirtd_network.aug.in create mode 100644 src/util/virnetfilter.c create mode 100644 src/util/virnetfilter.h create mode 100644 src/util/virnftables.c create mode 100644 src/util/virnftables.h rename tests/networkxml2firewalldata/{base.args => base.iptables} (100%) create mode 100644 tests/networkxml2firewalldata/base.nftables rename tests/networkxml2firewalldata/{nat-default-linux.args => nat-default-linux.iptables} (100%) create mode 100644 tests/networkxml2firewalldata/nat-default-linux.nftables rename tests/networkxml2firewalldata/{nat-ipv6-linux.args => nat-ipv6-linux.iptables} (100%) create mode 100644 tests/networkxml2firewalldata/nat-ipv6-linux.nftables rename tests/networkxml2firewalldata/{nat-ipv6-masquerade-linux.args => nat-ipv6-masquerade-linux.iptables} (100%) create mode 100644 tests/networkxml2firewalldata/nat-ipv6-masquerade-linux.nftables rename tests/networkxml2firewalldata/{nat-many-ips-linux.args => nat-many-ips-linux.iptables} (100%) create mode 100644 tests/networkxml2firewalldata/nat-many-ips-linux.nftables rename tests/networkxml2firewalldata/{nat-no-dhcp-linux.args => nat-no-dhcp-linux.iptables} (100%) create mode 100644 tests/networkxml2firewalldata/nat-no-dhcp-linux.nftables rename tests/networkxml2firewalldata/{nat-tftp-linux.args => nat-tftp-linux.iptables} (100%) create mode 100644 tests/networkxml2firewalldata/nat-tftp-linux.nftables rename tests/networkxml2firewalldata/{route-default-linux.args => route-default-linux.iptables} (100%) create mode 100644 tests/networkxml2firewalldata/route-default-linux.nftables -- 2.39.2