Re: [libvirt PATCH 00/28] native support for nftables in virtual network driver

Laine Stump <laine@xxxxxxxxxx> · Fri, 5 May 2023 13:53:35 -0400

On 5/3/23 11:40 AM, Daniel P. Berrangé wrote:
On Sun, Apr 30, 2023 at 11:19:15PM -0400, Laine Stump wrote:
When I first started on this (long, protracted, repeatedly interrupted
for extended periods - many of these patches are > a year old) task, I
considered doing an all-at-once complete replacement of iptables with
nftables, since all the Linux distros we support have had nftables for
several years, and I'm pretty sure nobody has it disabled (not even
sure if it's possible to disable nftables while still enabling
iptables, since they both use xtables in the kernel). But due to
libvirt's use of "-t mangle -j CHECKSUM --checksum-fill" (see commit
fd5b15ff all the way back in July 2010 for details) which has no
equivalent in nftables rules (and we don't *want* it to!!), and the
desire to be able to easily switch back to iptables in case of an
unforeseen regression, we decided that both iptables and nftables need
to be supported (for now), with the default (for now) remaining as
iptables.

What happens if you use iptables -j CHECKSUM --checksum-fill, and
this is just the iptables compat shim that is secretly talking to
NFT in the kernel ?  Is the checksum stuff just ignored entirely
then ?

If so, then the key decision factor for us, is whether any of the
supported distros still officially support use of the iptables KMOD
instead of nft KMOD.

So, what I've ended up with is:

1) a network.conf file (didn't exist before) with a single setting
"firewall_backend". If unset, the network driver tries to use iptables
on the backend, and if that's missing, then tries to use nftables.

For most distros I feel like nftables should be the default
and not require a user opt in.  That raises the question of
how do we deal with the upgrade path.

Historically when starting libvirt we'll blow away our existing
iptables rules and create them fresh. For an upgrade path we'll
need a variant which blows away iptables rules and instead creates
nftables rules. If that works, then we could default to nftables
without user config.

Since my patches do that, I would be totally fine with preferring 
nftables over iptables when no config is present. It would definitely 
give a better feeling of "modernizing" if we could do that.

(Actually, I was considering that when I tweak Eric Garver's patches for 
native firewalld networks, I would prefer firewalld over nftables if 
firewalld.service is running. In that case, the order of preference 
(still overrideable, of course) would be firewalld->nftables->iptables, 
which from a functionality standpoint makes the most sense.)

We had spent some time in IRC discussing different ways of using new
functionality available in nftables to make a more
efficient/performant implemention of the desired filtering, and there
are some really great possibilities that need to be explored, but in
the end there were too many details up in the air, and I decided that
it would be more "accomplishable" (coined a new word there!) to first
replicate existing behavior with nftables, but do it inside a
framework that makes it easy to modify the details in the future (in
particular making it painless to switch back and forth between builds
with differing filter models at runtime) - this way we'll be able to
separate the infrastructure work from the details of the rules (which
we can then more easily work on and experiment with). (This implies
that the main objective right now is "get rid of iptables
dependencies", not "make the filtering faster and more efficient").

I feel like filtering wrt virtual networks isn't our main
performance bottleneck. Most machines only have a couple
of virtual networks, so most host traffic only has to
match a few rules.

With nwfilter though, if we have 1000 VMs we'll have
1000 top level rules that need evaluating for every
packet. IOW, the performance optimizations allowed
by nftables are much more relevant to nwfilter.

That's a good point, and makes me feel better about not trying to 
immediately optimize the rules created here. (although I do recall 
someone once a long time ago who filed a bug about poor performance when 
they had > 500 virtual networks :-P)

(OTOH, I've been reading through your response to 18/28, and what you're 
suggesting there, in the name of simpler removal rather than runtime 
performance, is changing what rules are generated in the initial 
version. I'm still wrapping my head around that - I had confused "chain" 
with "hook" in a strange way, and thought that the behavior was 
different than it apparently is).

* allows switching between iptables/nftables backends without
   rebooting or restarting networks/guests.

   Because the commands required to remove a network's filter rules are
   now saved in the network status XML, each time libvirtd (or
   virtnetworkd) is restarted, it will execute exactly the commands
   needed to remove the filter rules that had been added by the
   previous libvirtd/virtnetworkd (rather than just making a guess, as
   we've always done up until now), and then add new rules using the
   current backend+binary's set of rules (while also saving the info
   needed for future removal of these new rules back into the network's
   status XML).

Ok that's answering my question about upgrades then. It looks like we
should be able to default to nftables out of the box, if we assume
that absence of info in the network status XML implies iptables.

Yep, that's what the code does - if there is no <fwRemoval> element, 
then we assume that the current network instance was setup with iptables 
rules.

* virFirewall does *not* provide a backend-agnostic interface [this is fine]

   * We need to maintain a backward-compatible API for virFirewall so
     that we don't have to touch nwfilter code. Trying to make its API
     backend-agnostic would require individually considering/changing
     every nwfilter use of virFirewall.

   * instead virFirewall objects are just a way to build a collection
     of commands to execute to build a firewall, then execute them
     while collecting info for and building a collection of commands
     that will tear down that firewall in the future.

   Do I want to "fix" this in the future by making virFirewall a higher
   level interface that accepts tokens describing the type of rule to
   add (rather than backend-specific arguments to a backend-specific
   command)?  No. I think I like the way virFirewall works (as
   described in that previous bullet-point), instead I'm thinking that
   it is just slightly mis-named - I've lately been thinking of it as a
   "virNetFilterCmdList". Similarly, the virFirewallRules that it has a
   list of aren't really "rules", they are better described as commands
   or actions, so maybe they should be renamed to virNetfilterCmd or
   virNetfilterAction. But that is just cosmetic, so I didn't want to
   get into it in these patches (especially in case someone disagrees,
   or has a better idea for naming).

The main thing I'd like to see is to maintain the clean split of
code between what is general purpose, and what is specific to the
virtual network driver and what is specific to the nwfilter
driver. AFAICT, this series mixes that up with circular dependancies
between viriptables.c, virfirewall.c and virnetfilter.c.

Yeah, after reading through your comments about that part, I agree. It 
all started because viriptables.[ch] were originally put in util instead 
of network, and I was too focused on maintaining the status quo where 
possible. As for the circular dependency, that was one of the bits where 
I felt a bit dirty writing the code, as mentioned earlier. I'm going to 
try and eliminate that.

I'd like virfirwall.c to remain indepent of anything related to
the virtual network driver.

Definitely. And if viriptables is moved from util to network, then we 
*must* eliminate this bit.