Re: [PATCH 4/5] network: regain guest network connectivity after firewalld switch to nftables

Laine Stump <laine@xxxxxxxxx> · Wed, 16 Jan 2019 21:43:58 -0500

On 1/15/19 12:54 PM, Daniel P. Berrangé wrote:
On Tue, Jan 15, 2019 at 12:43:05PM -0500, Laine Stump wrote:
On 1/15/19 11:39 AM, Daniel P. Berrangé wrote:
On Wed, Jan 09, 2019 at 09:57:36PM -0500, Laine Stump wrote:
From: Laine Stump<laine@xxxxxxxxxx>

In the past (when both libvirt and firewalld used iptables), if either
libvirt's rules *OR* firewalld's rules accepted a packet, it would be
accepted. This was because libvirt and firewalld rules were processed
by the same kernel hook.

But now firewalld can use nftables for its backend, while libvirt's
firewall rules are still using iptables; iptables rules are still
processed, but at a different time during packet processing
(i.e. during a different hook) than the firewalld nftables rules. The
result is that a packet must be accepted by *BOTH* the libvirt
iptables rules *AND* the firewalld nftable rules in order to be
accepted.

This causes pain because

1) libvirt always adds rules to permit DNS and DHCP (and sometimes
TFTP) from guests to the local host. But libvirt's bridges are in
firewalld's "default" zone (which is usually the zone called
"public"). The public zone allows ssh, but doesn't allow DNS, DHCP, or
TFTP. So even though libvirt's rules allow the DHCP and DNS traffic,
the firewalld rules dont, thus guests connected to libvirt's bridges
can't acquire an IP address from DHCP, nor can they make DNS queries
to the DNS server libvirt has setup on the host.

2) firewalld's higher level "rich rules" don't yet have the ability to
configure the acceptance of forwarded traffic (traffic that is going
somewhere beyond the host), so any traffic that needs to be forwarded
is rejected by the public zone's default "reject" policy (which
rejects all traffic in the zone not specifically allowed by the rules
in the zone, whether that traffic is destined to be forwarded or
locally received by the host).

libvirt can't send "direct" nftables rules (firewalld only supports
that for iptables), so we can't solve this problem by just sending
explicit nftables rules instead of explicit iptables rules (which, if
it could be done, would place libvirt's rules in the same hook as
firewalld's native rules, and thus eliminate the need for packets to
be accepted by both libvirt's and firewalld's own rules).

However, we can take advantage of a quirk in firewalld zones that have
a default policy of "accept" (meaning any packet that doesn't match a
specific rule in the zone will be *accepted*) - this default accept will
also accept forwarded traffic (not just traffic destined for the host).

Putting each network's bridge in a new zone called "libvirt" which has
a default policy of accept will allow the forwarded traffic to pass,
but the same default accept policy that fixes forwarded traffic also
causes *all* traffic from guest to host to be accepted. To solve this
new problem, we can take advantage of a new feature in firewalld
(currently slated for firewalld-0.7.0) - priorities for rich rules -
to add a low priority rule that rejects all local traffic (but leaves
alone all forwarded traffic).
Ok, so we depend on newer firewalld. Any older version using nftables
will still be broken.

I think it would be quite desirable if we queried the firewalld version
and backend and reported a fatal error to the user if they try to
start a virtual network when we know it will be broken.

You can get version from /org/fedoraproject/FirewallD1

      org.freedesktop.Properties.Get("org.fedoraproject.FirewallD1", "version")

And backend from /org/fedoraproject/FirewallD1/config

      org.freedesktop.Properties.Get("org.fedoraproject.FirewallD1.config", "FirewallBackend")
Depending on the firewalld version won't give us correct results, since the
patches to support rule priorities could be / will be backported to older
firewalld versions on some distros.

Of course we could make failure to set the zone to "libvirt" a fatal error -
that is really the most reliable proxy we have for the question "Does this
firewalld support rule priorities?" That's a bit draconian though, since it
means that "new" libvirt will fail to start networks on all distros that
haven't yet updated their firewalld version (or backported the necessary
patches). In particular, if they haven't updated firewalld, but they're
still using the iptables backend anyway, we shouldn't be whining (although
for consistency we should still set the zone to "libvirt" when possible,
even if the backend is iptables,)
I feel we should enforce failure to set libvirt zone as a fatal
error *if* we query that FirewallBackend property == nftables.
That is a situation we know will never work correctly, without
opening a security hole in the libvirt rules.

We should also enforce failure if setting zone failed and
version >= 0.7.0, regardless of what backend is set, as we
expect that to always work.

Essentially only case we don't want to report an error is
in version < 0.7.0 and backend is iptables, as that is
safe to use.

This should avoid issues with distros that choose to backport.

Okay, those all make sense.

+<rule priority='127'>
+  <reject/>
+</rule>
+<service name='dhcp'/>
+<service name='dhcpv6'/>
+<service name='dns'/>
+<service name='ssh'/>
+<service name='tftp'/>
+</zone>
I wonder if we should have an configure time check so that we do
*not* install this file if built agains an old firewalld version.
By unconditionally installing it we're probably causing firewalld
to emit an error message in syslog which some sysadmin is almost
certainly to report a bug about
1) see the comment above about the unreliability of version checks.

2) The version / build of firewalld present at build time is really
irrelevant. As a matter of fact, we currently don't require firewalld to be
present *at all* at build time (or at runtime - we just use it if it's there
and active, otherwise we go directly to iptables/ebtables).
If configure installs it by default, but provides a switch, we can
use  --with-firewall-zone / --without-firewalld-zone explicitly in
the RPM specs. We know exactly which Fedora / RHEL versions need
this in so can avoid installing it where we know if isn't wanted.

To make sure I'm understanding properly (because I think I misunderstood 
the 1st time I read it) you're suggesting that we not check the version 
of firewalld during configure/build, but have a configure option that 
always defaults to "with", and can be manually set to "without"?

Since the extraneous error message that you want to get rid of is one 
that's logged by firewalld when it fails to parse the libvirt zone file 
(I think we can avoid all other extraneous logs at runtime by checking 
the firewalld backend, and checking for existence of the libvirt zone), 
I guess this configure change would only be useful for indicating 
whether or not the libvirt zone file should be installed (and not really 
essential for any runtime code in  libvirt), but that's something that 
can't really be affected by configure (aside from having it modify the 
libvirt.spec file, but that's only useful for RHEL/Fedora, and I don't 
think RHEL or Fedora will be updating libvirt without updating 
firewalld). So I'm not sure there's a point to supporting 
--without_firewalld_zone in the configure.

3) I actually *want* the bug reported to distro maintainers, to push them to
update their firewalld sooner rather than later.
If we install it based on version where nftables supprot was first
introduced in firewalld, rather than when it was fixed, we would
still get useful bug reports without spamming people with errors on
distros where nftables isn't even supported by firewalld.

Our aim is to avoid the error log caused by firewalld failing to parse 
the libvirt zone (if that failure is due to a lack of support for rule 
priorities). To do that, we need to make sure the libvirt zone file 
isn't installed on systems where firewalld lacks support for rule 
priorities. We can either do that by modifying the package contents to 
include or not include libvirt.zone, or we can do it by modifying the 
install-time scripts to copy/not copy libvirt.zone into its place.

If we modify the package contents at build time, then updating firewalld 
to a version supporting nftables+priorities would necessitate a new 
build *and* install of the libvirt packages.

If, on the other hand, we use logic in the install scripts, we wouldn't 
have to rebuild the libvirt packages, but it would still mean that an 
update of firewalld on the target host would require a reinstall of libvirt.

In both cases, if you didn't rebuild+reinstall libvirt (or reinstall in 
the latter case) after updating firewalld to a new version with 
nftables+priorities, you would end up with the situation we have now, 
where the backend is nftables and libvirt networks are broken, but there 
is no error message pointing that out. I think that is worse than 
(temporarily, until firewalld gets updated) logging one extra error log 
each time firewalld is restarted.

Hmm, but I guess if the *only* change when building/installing with 
--without_firewalld_zones was that libvirt.zone wasn't installed (i.e. 
we would still compile the code that logs a libvirt error if the backend 
is nftables but there is no libvirt zone), then I guess the failure 
wouldn't be silent. So I retract my statement about a silent failure, 
but it still would mean the libvirt packages would need to be 
rebuilt/reinstalled to get things working after a firewalld upgrade.

So here's an idea - how about if we do this:

Once when libvirtd is started:

   retrieve a list of zones and check for presence of the libvirt zone. Save
this somewhere.

Each time a network is started:

if (libvirt zone doesn't exist) {

     if (firewalldbackend == nftables) {

        Error "Hey you!! Change the firewalld backend to nftables or update
firewalld!!"

     }

} else {

     put the bridge in the libvirt zone (regardless of nftables/iptables)

}
That doesn't solve the issue with installing an invalid zone
file for older firewalld which could generate syslog errors
on firewalld startup.

s/could/would/ :-)

But on the positive side, it's only a single error message, logged just 
once per restart of firewalld. So it's got that going for it. Which is 
nice. </BillMurrayCaddyShack>

Since it's only a transient thing (it will only be a problem for 
somebody who upgrades to new libvirt, but keeps their firewalld at 
something older than 0.7.0), I don't think we should do anything at all 
complicated to prevent it - we'd just end up with a bunch of obscure 
code that won't be used any time after 6 months from now, and will 
continue confusing people long into the future. I think it's better in 
the end to just live with a few extra (and harmless) error messages 
during the short period when some distros have libvirt >= 5.1.0 and 
firewalld < 0.7.0, rather than having silent (i.e. no error message) 
failures if a distro updates firewalld without rebuilding libvirt.

(in the meantime, we haven't relied on version number checks, and a system
with old firewalld can still build a libvirt that supports new firewalld)
As above I don't think we need to rely on version numbers for te compile
time check, as long as we provide the --with/--without flags to let the
distro vendors override it.

On one hand, it could be useful to add the 
--with/--without_firewalld_zones configure flags. On the other hand, 
that permits a distro to build a package that is guaranteed to cause a 
silent failure when firewalld is upgraded and has backend set to 
nftables, which I think is a much worse problem because it is silent. 
That makes me lean towards not even providing that option, just for the 
sake of safety.

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list