Re: [libvirt RFC PATCH] network: NetworkManager script to monitor for conflicts with new interfaces

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 5/5/20 5:09 AM, Daniel P. Berrangé wrote:
On Mon, May 04, 2020 at 12:04:45PM -0400, Laine Stump wrote:
There has been a problem for several years with libvirt's default
network conflicting with the host network connection on new installs,
particularly when the "host" is actually virtual machine that is,
itself, connected to the libvirt default network on its respective
host. If the two default networks use the same subnet, and if the
nested host's libvirtd happens to start its network before the system
networking connects to the toplevel host, then network connectivity to
the guest is just silently non-working.

We've tried several things over the years to eliminate this problem,
including:

1) Checking for conflicting routes/interfaces during installation of
the libvirt-daemon-config-network package (the one containing the
default network config file) which tries different subnets until it
finds one that doesn't conflict. This helps in some cases, but fails
on 2 points: a) if the installation environment is different from the
environment where the system is actually run (e.g. a "live CD" image
of a distro, which is built in a container by the distro maintainers,
then could be run in any number of places, and b) it modifies the
installed files during the rpm installation post_install script, which
is now discouraged because people building container images don't like
dealing with that.

2) When starting a libvirt network, we now check for any route or
interface that conflicts with the new network's IP's and routes. This
doesn't fix the problem, but does notify the user of the problem *as
long as libvirt starts its networks after the host has started its
system networks*.

Neither of these help in the situation where libvirt's networks are
started first. The script in this patch uses a NetworkManager facility
(dispatcher.d scripts, which are run whenever any interface is
brought up or down) to check for any libvirt networks that conflict
with the new NetworkManager interface, and if it finds a conflict it
logs a message and destroys the libvirt network. So as with case (2)
above, it doesn't solve the problem, but it does make it more obvious,
and also makes sure the host networking isn't borked, so you can still
ssh into the machine and fix it.

There are a few caveats/misgivings with the script I've come up with,
which cause me to post it as just an RFC:

1) It's written in python, and uses libxml2

When I first sat down to make this script (following a suggestion by
danpb during an IRC discussion), I immediately put #!/bin/bash at the
top of the file, because it's supposed to be a script. But then I
remembered that we're trying to narrow down the usage of languages in
libvirt, and anyway it would be nice to be able to properly parse the
XML to get the IP addresses/netmasks/prefixes. So I instead wrote it
in python. This makes for a less ugly program, but it does mean that
the daemon-config-network package is now dependent on python3-libxml
(and python3 of course), which pulls in a bunch of other packages.
Everybody's dev systems already have these packages and their
dependencies installed (since both are required to build) and many
other users systems already have them (they are required by
virt-install and virt-manager, among others), including the Fedora
live CD, for example. But not *all* systems have them. There has been
a lot of work to reduce package bloat caused by pulling in more and
more packages, so I'm reluctant to contribute to that. On the other
hand, someone who is looking to minimize their system footprint will
also probably not be installing the libvirt default network, so maybe
it's acceptable (I'm leaning towards "not", but want to know if anyone
else has a different opinion)
I think this is a  non-issue, since you've bundled the script into
the libvirt-daemon-config-network RPM.  It would have been more of
an issue if in the libvirt-daemon-driver-network RPM as that would
have made it mandatory. Even then though, I really wouldn't be very
bothered by it.


Okay. Since you are the person who spent the most time analyzing the footprint size of libvirt in various configurations and situations, (and since I dislike the idea of writing yet another ugly hack of a bash script) I won't dispute your opinion :-)



2) As delivered, it checks for conflicts with *all* libvirt
networks.

While that was fun to write, and in theory is the right
thing to do, in practice it may be / probably is overkill. As we
discussed in IRC, the main time when this is a problem is just after
the first boot of a new OS / libvirt installation, and the only
libvirt network that's going to exist at that time is the default
network. So while having a loop that scans all libvirt networks is
more complete, in almost all cases it is just wasting time. (I did
provide a bool at the top of the script that can be modified to tell
it to only check the default network, but of course the script is
installed in /usr/lib, and it's not proper to modify files in
/usr/lib, so that isn't a real-world solution, but more a way to allow
people (i.e. "me") right now to test out the different behaviors.
I figure checking all networks is fine.


I guess if nothing else, we'll learn all about how efficient our API calls are when someone inevitably has a system with hundreds of libvirt networks and hundreds of host interfaces (NM is very thorough about this - it even calls the scripts when *libvirt* creates a bridge device (even though we never register it with NM in any way), and manages to get at least the bridge device's IP address correct in the environment vars it sends to the scripts; I haven't tried to find their code, but I guess they must be calling the dispatch.d scripts just after they detect that we've brought the bridge device online - fortunately we set all IP addresses for the bridge, and associated static routes, before we bring it online.


(One thing I just thought of - it doesn't seem quite as "tidy" if we have a script that checks all networks for conflict, but that script is only installed if a user chooses to install the package containing the default network. That almost points to putting it in libvirt-daemon-driver-network, but again that would mean *everybody* would get python3 and python3-libxml.)



3) This only works if NetworkManager is enabled.

I brought this up when danpb first mentioned the idea, and he rightly
pointed out that every report of this problem we've had has been from
the first boot of a new install on a system that used Network Manager
- anybody using any other method of networking config on their host
has had to manually intervene to get it setup, but NM tries to do
everything auto-magically. So while there may still be a very
occasional rare occurence of a silent networking failure even with
this script installed, this should eliminate 99% of the cases.
I guess there could be another tool that is not NetworkManager, but
does the same kind of job as NetworkManager which would want this
functionality.


Yeah, but in order for us to make something generally applicable, we would need to monitor for interface creation ourselves, and that would require libvirtd to always be running. Maybe when a 2nd example comes along, we can extract the host IP/prefix info part from this script and put it into a mutually-used script somewhere. (or we might decide it's simple enough that it's easier just to copy it).



So, the questions I have are:

1) Do we want to allow adding the dependency on python3-libxml to the
libvirt-daemon-config-network package in order to avoid adding a bash
script. Or should I just redo it in bash
I think this is a non-issue

2) Do we want something that checks all active networks as this does,
or just the default network (and if the latter, should we make it
easily modifiable to check all networks, or just strip it down as much
as possible)
We could have a config param in /etc/libvirt/something.conf. I'm
fine with checking all networks by default though.


Does "could" == "should"? Or should I just leave it as is? And if I do add it, should it be its own file, or glommed into some existing file? (The former increases clutter in the filesystem, the latter would increase clutter in whichever .conf file was picked, and might create unnecessary interconnections between packages. But I'm probably way over-thinking it...)



3) I could eliminate the libxml2 dependency by just grepping the xml
for "<address .*ip=". Is reduced correctness (which is probably 0%
different in practice) worth the reduced package dependencies?
We're making functional changes to the host, so its important that
we are robust in our code. As such I strongly prefer that we're
correctly parsing XML instead of regex'ing data out of it.


Okay. As long as the extra dependency isn't a problem, I 100% agree. (especially if we're checking all networks, and not just the one that comes with the stock system)



As a single question: on a sliding scale of "this script" ----->
"simple bash script" where do we want to land?
The right answer to any problem is never a shell script :-)


Now you're just making me feel bad (I've written a lot of ugly hacky bash scripts in my time)



diff --git a/libvirt.spec.in b/libvirt.spec.in
index 6abf97df85..e40068b2be 100644
--- a/libvirt.spec.in
+++ b/libvirt.spec.in
@@ -496,6 +496,7 @@ Requires: libvirt-libs = %{version}-%{release}
  Requires: dnsmasq >= 2.41
  Requires: radvd
  Requires: iptables
+Requires: python3-libxml2
%description daemon-driver-network
  The network driver plugin for the libvirtd daemon, providing
@@ -1630,6 +1631,7 @@ exit 0
  %dir %attr(0755, root, root) %{_localstatedir}/lib/libvirt/dnsmasq/
  %attr(0755, root, root) %{_libexecdir}/libvirt_leaseshelper
  %{_libdir}/%{name}/connection-driver/libvirt_driver_network.so
+%{_prefix}/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets
I was sceptical about /usr/lib being right, but it seems that NM
really does want this dir.


Yeah. If a user is going to add their own script, it should go in /etc/NetworkManager/dispatcher.d, but anything installed as part of a package should go in /usr/lib/NetworkManager/dispatcher.d (and if a user wants to override it, they place a file of the same name in /etc/....)



%if %{with_firewalld_zone}
  %{_prefix}/lib/firewalld/zones/libvirt.xml
diff --git a/src/network/Makefile.inc.am b/src/network/Makefile.inc.am
index 196a30e16c..d58e09f88b 100644
--- a/src/network/Makefile.inc.am
+++ b/src/network/Makefile.inc.am
@@ -170,6 +170,9 @@ install-data-network:
  	( cd $(DESTDIR)$(confdir)/qemu/networks/autostart && \
  	  rm -f default.xml && \
  	  $(LN_S) ../default.xml default.xml )
+	$(MKDIR_P) "$(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d"
+	$(INSTALL_DATA) $(srcdir)/network/nm-dispatcher-check-nets.py \
+	  $(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets
  if WITH_FIREWALLD_ZONE
  	$(MKDIR_P) "$(DESTDIR)$(prefix)/lib/firewalld/zones"
  	$(INSTALL_DATA) $(srcdir)/network/libvirt.zone \
@@ -189,7 +192,10 @@ endif WITH_FIREWALLD_ZONE
endif WITH_NETWORK -EXTRA_DIST += network/default.xml network/libvirt.zone
+EXTRA_DIST += \
+  network/default.xml \
+  network/nm-dispatcher-check-nets.py \
+  network/libvirt.zone
.PHONY: \
  	install-data-network \
diff --git a/src/network/nm-dispatcher-check-nets.py b/src/network/nm-dispatcher-check-nets.py
new file mode 100755
index 0000000000..9225a08ea1
--- /dev/null
+++ b/src/network/nm-dispatcher-check-nets.py
@@ -0,0 +1,182 @@
+#!/usr/bin/env python3
+#
+# Copyright (C) 2012-2019 Red Hat, Inc.
2020


Dang! Cut-Paste got me again!



+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2.1 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library.  If not, see
+# <http://www.gnu.org/licenses/>.
+
+import libvirt
+import sys
+import os
+import libxml2
+from ipaddress import ip_network
+
+# This script should be installed in
+# /usr/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets. It will be
+# called by NetworkManager every time a network interface is taken up
+# or down. When a new network comes up, it checks the libvirt virtual
+# networks to see if their IP address(es) (including any static
+# routes) are in conflict with the IP address(es) (or static routes)
+# of the newly added interface.  If so, the libvirt network is
+# disabled. It is assumed that the user will notice that their guests
+# no longer have network connectvity (and/or the message logged by
+# this script), see that the network has been disabled, and then
+# realize the conflict when they try to restart it.
+#
+# set checkDefaultOnly=False to check *all* active libvirt networks
+# for conflicts with the new interface. Set to True to check only the
+# libvirt default network (since most networks other than the default
+# network are added post-install at a time when all of the hosts other
+# networks are already active, it may be overkill to check all of the
+# libvirt networks for conflict here (and instead just add more
+# needless overheard to bringing up a new host interface).
+#
+checkDefaultOnly = False
+
+# NB: since this file is installed in /usr/lib, it really shouldn't be
+# modified by the user, but instead should be copied to
+# /etc/NetworkManager/dispatcher.d, where it will override the copy in
+# /usr/lib. Even that isn't a proper solution though - if we're going
+# to actually have this config knob, perhaps we should check for it in
+# the environment, and if someone wants to modify it they can put a
+# short script in /etc that exports and environment variable and then
+# calls this script? Just thinking out loud here...
+
+def checkconflict(conn, netname, hostnets, hostif):
+
+    # ignore if the network has been brought down or removed since we
+    # got the list
+    try:
+        net = conn.networkLookupByName(netname)
+    except libvirt.libvirtError:
+        return
+
+    if not net.isActive():
+        return
+
+    xml = net.XMLDesc()
+    doc = libxml2.parseDoc(xml)
+    ctx = doc.xpathNewContext()
+
+    # see if NetworkManager is informing us that this libvirt network
+    # itself is coming online
+    bridge = ctx.xpathEval("/network/bridge/@name");
+    if bridge and bridge[0].content == hostif:
+        return
Is it worth filtering based on the @type attribute too or are
we ok relying on ?


Do you mean the <forward> element mode attribute? Or were you thinking of a domain <interface> element?


In this case, if there is a <bridge name='blah'/> in the network definition, then the network definitely uses a bridge device named "blah", so if that's the name NM is sending to the script, then it's a case of them notifying us that we're starting our own network.



+
+    # check *all* the addresses of this network
+    addrs = ctx.xpathEval("/network/*[@address]")
Shouldn't we match  /network/ip/  here ?   This wildcard also
matches /network/route/  - is that desirable ?


Yes. If there is a static <route> in a network definition that exactly matches a host network, then it will obscure the host's network too. At first I was matching them separately, but that was extra code, and I didn't know the right xpath syntax to search for "/network/(route|ip)" (that's not it), and I was tired, so I just switched to looking for any element with an address attribute. This *does* erroneously match "<mac address='blah'/>, but that ends up being weeded out because I later drop anything that doesn't also have either @prefix or @netmask. (I still think that's ugly though, and if someone is able to decipher the xpath documentation and tell me how to search for <ip> and <route> elements with the same xpath expression, I'd be really happy to hear about it!)



+    for ip in addrs:
+        ctx.setContextNode(ip)
+        address = ctx.xpathEval("@address")
+        prefix = ctx.xpathEval("@prefix")
+        netmask = ctx.xpathEval("@netmask")
+
+        if not (address and len(address[0].content)):
+            continue
+
+        addrstr = address[0].content
+        if not (prefix and len(prefix[0].content)):
+            # check for a netmask
+            if not (netmask and len(netmask[0].content)):
+                # this element has address, but no prefix or netmask
+                # probably it is <mac address ...> so we can ignore it
+                continue
+            # convert netmask to prefix
+            prefixstr = str(ip_network("0.0.0.0/%s" % netmask[0].content).prefixlen)
+        else:
+            prefixstr = prefix[0].content
+
+        virtnetaddress = ip_network("%s/%s" % (addrstr, prefixstr), strict = False)
+        # print("network %s address %s" % (netname, str(virtnetaddress)))
+        for hostnet in hostnets:
+            if virtnetaddress == hostnet:
+                # There is a conflict with this libvirt network and the specified
+                # net, so we need to disable the libvirt network
+                print("Conflict with host net %s when starting interface %s - bringing down libvirt network '%s'"
+                      % (str(hostnet), hostif, netname))
+                try:
+                    net.destroy()
+                except libvirt.libvirtError:
+                    print("Failed to destroy network %s" % netname)
+                return
+    return
+
+
+def addHostNets(hostnets, countenv, addrenv):
+
+    count = os.getenv(countenv)
+    if not count or count == 0:
+        return
+
+    for num in range(int(count)):
+        addrstr = os.getenv("%s_%d" % (addrenv, num))
+        if not addrstr or addrstr == "":
+            continue
+
+        net = ip_network(addrstr.split()[0], strict=False)
+        if net:
+            hostnets.append(net)
+    return
+
+
+############################################################
+
+if sys.argv[2] != "up":
+    sys.exit(0)
+
+hostif = sys.argv[1]
+
+try:
+    conn = libvirt.open(None)
+except libvirt.libvirtError:
+    print('Failed to open connection to the hypervisor')
+    sys.exit(0)
+
+if checkDefaultOnly:
+    nets = []
+    net = conn.networkLookupByName("default")
+    if not (net and net.isActive()):
+        sys.exit(0)
+    nets.append(net)
+else:
+    nets = conn.listAllNetworks(libvirt.VIR_CONNECT_LIST_NETWORKS_ACTIVE)
+    if not nets:
+        sys.exit(0)
+
+# We have at least one active network. Build a list of all network
+# routes added by the new interface, and compare that list to the list
+# of all networks used by each active libvirt network. If any are an
+# exact match, then we have a conflict and need to shut down the
+# libvirt network to avoid killing host networking.
+
+# When NetworkManager calls scripts in /etc/NetworkManager/dispatcher.d
+# it will have all IP addresses and routes associated with the interface
+# that is going up or down in the following environment variables:
+#
+# IP4_NUM_ADDRESSES - number of IPv4 addresses
+# IP4_ADDRESS_N - one variable for each address, starting at _0
+# IP4_NUM_ROUTES - number of IPv5 routes
+# IP4_ROUTE_N - one for each route, starting at _0
+# (replace "IP4" with "IP6" and repeat)
Oh that's very nice - much easier than probing for this info ourselves.


And they even seem to gather it when it's an interface created by someone other than NM (although I don't think they try to do anything about routes that are added outside of NM)



+#
+hostnets = []
+addHostNets(hostnets, "IP4_NUM_ADDRESSES", "IP4_ADDRESS");
+addHostNets(hostnets, "IP4_NUM_ROUTES", "IP4_ROUTE");
+addHostNets(hostnets, "IP6_NUM_ADDRESSES", "IP6_ADDRESS");
+addHostNets(hostnets, "IP6_NUM_ROUTES", "IP6_ROUTE");
+
+for net in nets:
+
+    checkconflict(conn, net.name(), hostnets, hostif)
--
2.25.4

Regards,
Daniel






[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux