On Sun, Apr 30, 2023 at 11:19:15PM -0400, Laine Stump wrote: > When I first started on this (long, protracted, repeatedly interrupted > for extended periods - many of these patches are > a year old) task, I > considered doing an all-at-once complete replacement of iptables with > nftables, since all the Linux distros we support have had nftables for > several years, and I'm pretty sure nobody has it disabled (not even > sure if it's possible to disable nftables while still enabling > iptables, since they both use xtables in the kernel). But due to > libvirt's use of "-t mangle -j CHECKSUM --checksum-fill" (see commit > fd5b15ff all the way back in July 2010 for details) which has no > equivalent in nftables rules (and we don't *want* it to!!), and the > desire to be able to easily switch back to iptables in case of an > unforeseen regression, we decided that both iptables and nftables need > to be supported (for now), with the default (for now) remaining as > iptables. What happens if you use iptables -j CHECKSUM --checksum-fill, and this is just the iptables compat shim that is secretly talking to NFT in the kernel ? Is the checksum stuff just ignored entirely then ? If so, then the key decision factor for us, is whether any of the supported distros still officially support use of the iptables KMOD instead of nft KMOD. > So, what I've ended up with is: > > 1) a network.conf file (didn't exist before) with a single setting > "firewall_backend". If unset, the network driver tries to use iptables > on the backend, and if that's missing, then tries to use nftables. For most distros I feel like nftables should be the default and not require a user opt in. That raises the question of how do we deal with the upgrade path. Historically when starting libvirt we'll blow away our existing iptables rules and create them fresh. For an upgrade path we'll need a variant which blows away iptables rules and instead creates nftables rules. If that works, then we could default to nftables without user config. > We had spent some time in IRC discussing different ways of using new > functionality available in nftables to make a more > efficient/performant implemention of the desired filtering, and there > are some really great possibilities that need to be explored, but in > the end there were too many details up in the air, and I decided that > it would be more "accomplishable" (coined a new word there!) to first > replicate existing behavior with nftables, but do it inside a > framework that makes it easy to modify the details in the future (in > particular making it painless to switch back and forth between builds > with differing filter models at runtime) - this way we'll be able to > separate the infrastructure work from the details of the rules (which > we can then more easily work on and experiment with). (This implies > that the main objective right now is "get rid of iptables > dependencies", not "make the filtering faster and more efficient"). I feel like filtering wrt virtual networks isn't our main performance bottleneck. Most machines only have a couple of virtual networks, so most host traffic only has to match a few rules. With nwfilter though, if we have 1000 VMs we'll have 1000 top level rules that need evaluating for every packet. IOW, the performance optimizations allowed by nftables are much more relevant to nwfilter. > * allows switching between iptables/nftables backends without > rebooting or restarting networks/guests. > > Because the commands required to remove a network's filter rules are > now saved in the network status XML, each time libvirtd (or > virtnetworkd) is restarted, it will execute exactly the commands > needed to remove the filter rules that had been added by the > previous libvirtd/virtnetworkd (rather than just making a guess, as > we've always done up until now), and then add new rules using the > current backend+binary's set of rules (while also saving the info > needed for future removal of these new rules back into the network's > status XML). Ok that's answering my question about upgrades then. It looks like we should be able to default to nftables out of the box, if we assume that absence of info in the network status XML implies iptables. > * virFirewall does *not* provide a backend-agnostic interface [this is fine] > > * We need to maintain a backward-compatible API for virFirewall so > that we don't have to touch nwfilter code. Trying to make its API > backend-agnostic would require individually considering/changing > every nwfilter use of virFirewall. > > * instead virFirewall objects are just a way to build a collection > of commands to execute to build a firewall, then execute them > while collecting info for and building a collection of commands > that will tear down that firewall in the future. > > Do I want to "fix" this in the future by making virFirewall a higher > level interface that accepts tokens describing the type of rule to > add (rather than backend-specific arguments to a backend-specific > command)? No. I think I like the way virFirewall works (as > described in that previous bullet-point), instead I'm thinking that > it is just slightly mis-named - I've lately been thinking of it as a > "virNetFilterCmdList". Similarly, the virFirewallRules that it has a > list of aren't really "rules", they are better described as commands > or actions, so maybe they should be renamed to virNetfilterCmd or > virNetfilterAction. But that is just cosmetic, so I didn't want to > get into it in these patches (especially in case someone disagrees, > or has a better idea for naming). The main thing I'd like to see is to maintain the clean split of code between what is general purpose, and what is specific to the virtual network driver and what is specific to the nwfilter driver. AFAICT, this series mixes that up with circular dependancies between viriptables.c, virfirewall.c and virnetfilter.c. I'd like virfirwall.c to remain indepent of anything related to the virtual network driver. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|