Re: .conf file setting(s) for packet filtering backend(s)

Laine Stump <laine@xxxxxxxxxx> · Thu, 6 Jan 2022 14:24:18 -0500

On 1/4/22 7:57 AM, Daniel P. Berrangé wrote:
On Sun, Jan 02, 2022 at 09:41:37PM -0500, Laine Stump wrote:
I'm currently working on switching the backend of the network driver from
using iptables to using nftables. Due to some functionality that is not
available with nftables (the rule that fixes up the checksum of DHCP packets
which, btw, is only relevant for *very* old guests, e.g. RHEL5), this needs
to be opt-in via a config file setting. In the meantime, in order to make
this doable in a reasonable amount of time, I am *not* converting the
nwfilter driver right away, and when I do it will need its own config file
setting for opt-in.

I've never before looked at the code for the .conf file settings at all. I
had assumed there would be some sort of "pull" API, where code in the
drivers could call, e.g. virConfGetString("filter_backend") and it would
return the config setting to the caller. But when I look at it, I see that
all daemons use the same daemonConfigLoadFile() called from
remote_daemon.c:main() (which is shared by all the daemons) and the
daemonConfig object that is created to hold the config settings that are
read is only visible within main() - the only way that a config setting is
used is by main() "pushing" it out to a static variable somewhere else where
it is later retrieved by the interested party, e.g. the way that main()
calls daemonSetupNetDevOpenvswitch(config), which then sets the static
virNetDevOpenvswitchTimeout in util/virnetdevopenvswitch.c.

I'd consider the OVS approach to be a bad example

Global state needing configurable parameters for stuff in util/ should
generally be considered a design flaw.

Okay, so then the setting of the host uuid is also a bad example (set 
into a static in util/viruuid.c), as is all the config for logging (set 
in statics in util/virlog.c by calling a function in util/virdaemon.c) :-)

(Of course, those are things that *are* conceivably needed by all 
daemons, not just a subset of them, so I guess it's different, but still 
if you want to be pedantic, they do fit your description)

 Global state should be exclusively
in the drivers,

It just occurred to me that two different things are being discussed in 
parallel here without really thinking about it (at least by me) - daemon 
config, and driver config. In the past there was a single daemon, with 
its config in libvirtd.conf, and many drivers, each potentially having 
its own separate .conf file (but in practice there is only qemu.conf, 
lxc.conf, and libxl.conf - no .conf files for other drivers that I can 
see). Now with split daemons, each driver has its own daemon, and each 
daemon has its own .conf file (virtqemud.conf, virtlxc.conf, 
virtnetworkd.conf, ...) while drivers continue to have (or not have) 
their own separate conf file (qemu.conf, lxc.conf, libxl.conf). When the 
daemon split happened, I made the (incorrect) leap that the new 
virt*d.conf files were a convenient place for driver-specific config.

Instead, I guess the virt*d.conf files should only be used for config 
related to operation of the daemon process, how it's connected to, 
logging of its error messages, etc., but shouldn't have any config 
specific to the driver that happens to be running in that daemon; for 
driver-specific things there should be a *.conf file (qemu.conf, and now 
it seems I will need network.conf) which is read by the driver itself.

(not sure what should be done with ovs_timeout, which is, as you point 
out, in the wrong place)

That seems like a lot of files, but I guess as long as it's got a (well 
documented) logic to it, it shouldn't be any worse than having fewer 
files each of a greater length :-)

Anyway, rather than looking at what happens when virtnetworkd.conf is 
read and adding a new knob there, I really should be looking at 
qemu_conf.c and using that as an example to add parsing of a new 
network.conf (which doesn't currently exist) to the network driver (and 
later nwfilter.conf to the nwfilter driver)

and then the desired values passed into the util APIs
explicitly.

Well, that *is* being done with ovs_timeout. It's just that the API 
being called is setting a static in the util/virnetdevopenvswitch.c 
(just as is done with logging config and host uuid), and then later used 
implicitly by other functions in the same file, rather than sending it 
as an arg to each API call that needs it. My guess is that since the 
setting is used for both qemu and lxc, the author figured putting a 
single instance of the config in libvirtd.conf would "make life simpler".

ie  ovs_timeout should have been in qemu.conf (any other drivers' config
files if appropriate).

Somehow I had always considered qemu.conf to be specifically for things 
related to starting the qemu process *only* (and not necessarily 
pertaining to the entire qemu driver), although even with that 
interpretation I guess ovs_timeout could be considered to be in that 
group as well (since it's used when running ovs-vsctl as part of 
preparing the network connection for a qemu process that will soon be 
started). I see now I've just been too narrow minded all this time.

(NB: util/virnetdevopenvswitch.c is linked into every deamon, so even for
the daemons that don't use it, calls to virnetdevopenvswitch.c functions
still compile properly (and calling them is harmless), so
virNetDevOpenvswitchTimeout is set even for daemons that never call
openvswitch APIs).

This is another bit of technical debt. We've been lazy with putting
stuff into util that really ought not to be there.

I don't think it's a problem putting virnetdevopenvswitch.c into util, 
since it is used already by two different drivers (qemu and lxc) and 
could be used by others in the future. The problem is that a function 
from util/virnetdevopenvswitch.c gets called by virdeamon.c, even when 
nothing in the driver being run by the daemon ever *uses* the rest of 
the openvswitch API. util is a good place to put functions that are used 
by more than one driver (or conf); any file (function? I've truthfully 
never paid attention to that detail) that isn't referenced from a 
particular daemon binary just shouldn't be linked into it.

This stuff even gets into the libvirt.so that's used client side,
so the argument that we had a single monolithic libvirtd didn't
apply either.

Really? I have always just assumed that if nothing in a particular .o 
was referenced, then that .o wouldn't show up in the binary. And even if 
that isn't the case, then we could tailor the build to only include 
those sources that are actually used (although that would be cumbersome 
to maintain).

If I could count on all builds using split daemons (i.e. separate
virtnetworkd and virtnwfilterd) then I could add a similar API in
virfirewall.c that remote_daemon.c:main() could use to set "filter_backend"
into a static in virfirewall.c (which is used by both drivers) and
everything would just happily work:

    virtnetworkd.conf:
       filter_backend = nftables

    virtnwfilterd.conf
       filter_backend = iptables

Putting these settings into virtnetworkd.conf and virtnwfilterd.conf
certainly makes conceptual sense.

Or maybe, based on what I say about "virtqemud.conf vs. qemu.conf" (and 
thus "virtnetworkd.conf vs. network.conf") above, they should be put in 
networkd.conf and nwfilter.conf. (again, I'm loathe to create "yet 
another" config file, but that seems the most logical thing to do).

The problem you mention is avoided by not having global state in
virtfirewall.c. Just pass the setting into every API call whuere it
is relevant.

Yes. That has been my plan - virFirewallNew() will have a single arg 
"backend" which is set to either VIR_NETFILTER_BACKEND_NFTABLES or 
VIR_NETFILTER_BACKEND_IPTABLES. That would be set in each virFirewall 
object when its created and used as the rules are added, and later when 
they are executed.

However, I need to also deal with the possibility that the nwfilter and
network drivers are in the same unified libvirtd binary, and in that case
both drivers would have the same virfirewall.c:filter_backend setting, thus
making it impossible to use the iptables backend for the nwfilter driver and
nftables backend for the network driver. For that case I would need separate
settings in the config for each driver, e.g.

    libvirtd.conf:
       network_filter_backend = nftables
       nwfilter_backend = iptables

Definitely don't want this, as its just follwing thue mistake we did
with ovs.

Yes, I really want to avoid it. And after my Aha moment about 
virtnetworkd.conf vs. network.conf I think I can.

So should I perhaps declare the nftables backend for nwfilter to be a lost
cause until everyone moves to split daemons, add a "filter_backend" setting
that is directly set in virfirewall.c (by remote_daemon.c:main()), and then
provide some sort of override in virFirewallNew so calls from the nwfilter
driver can say "ignore the filter_backend setting and use iptables"?

I'm wondering how you're integrating nftables into virfirewall in
general ?

Currently we just have

     VIR_FIREWALL_LAYER_ETHERNET,
     VIR_FIREWALL_LAYER_IPV4,
     VIR_FIREWALL_LAYER_IPV6,

which get mapped to ebtables, iptables and ip6tables internally.
Previously they could also get mapped to firewalld but we removed
that. This worked because both firewalld passthrough and the
native commands took the same syntax, so the choice of backends
was transparent to the caller.

Now with use of nftables, we have completely different syntax
for adding rules. IOW, the caller needs to decide which backend
to use, in order to decide what syntax to use with
virFirewallAddRule.

IIUC, with nftables there is no split between ethernet, ipv4
and ipv6 filtering. This makes the VIR_FIREWALL_LAYER_*
enum somewhat redundant/inappropriate as a high level
conceptual thing.

Since the arguments to virFirewallAddRule are inherantly
tied to the specific firewall command, we shoudl probably
just admit this in the API. IOW, rename

typedef enum {
     VIR_FIREWALL_LAYER_ETHERNET,
     VIR_FIREWALL_LAYER_IPV4,
     VIR_FIREWALL_LAYER_IPV6,

     VIR_FIREWALL_LAYER_LAST,
} virFirewallLayer;

to

typedef enum {
     VIR_FIREWALL_TOOL_EBTABLES,
     VIR_FIREWALL_TOOL_IPTABLES,
     VIR_FIREWALL_TOOL_IP6TABLES,

     VIR_FIREWALL_TOOL_LAST,
} virFirewallTool;

at which point we can just add

   VIR_FIREWALL_TOOL_NFTABLES

Now we don't need any global config in firewall.c to select between
nftables and traditional xtables commands - it is always explicitly
given by the caller

But the "layer" still needs to be conveyed in some way because it 
conveys important information about the rule - the same rule could be 
used for IPv6 or IPv4, and the way we give that information now is via 
"layer" in the args when adding a rule. If we changed the meaning of 
layer, then we would just need to add a different arg in its place for 
the old meaning, so there's really no gain.

The way I'm dealing with this in the first pass is to replace all the 
"iptablesBlahRules" calls with "virNetfilterBlahRules", which are 
backend-agnostic functions that use the backend setting in the 
virFirewall object they're operating on (set in virFirewallNew) to call 
either the old iptablesBlahRules() or a new virNftablesBlahRules(). Then 
when virFirewallApply() is called, the backend is checked to determine 
which command to add to the list of args (in the case of iptables it 
will be either iptables/ip6tables/ebtables depending on the layer, and 
in the case of nftables it will always be nft, but a family will need to 
be added (or maybe substituted) into the middle of the args depending on 
whether its ipv4 or ipv6 - this is slightly ugly but minimizes changes 
required to the nwfilter code (and will be eliminated as part of 
supporting nftables in nwfilter).

Eventually, 1) virFirewallApply() should create a file containing all 
the rules and call nft once with that file rather than calling it over 
and over with each individual rule, 2) it should use the dbus API for 
nftables rather than the nft command, and 3) the virFirewallRule object 
should become more intelligent - rather than being just a list of 
backend-specific args, it can be an abstract representation of a rule, 
so that all the "add rules" functions can be unified (no separate 
functions for iptables vs nftables), and then the final 
virFirewallApply() would create the backend-specific set of args for 
each rule.

(1) (and/or possibly (2) should happen right away, while (3) can only be 
done as a part of converting nwfilter to support the nftables backend.