On Mon, May 04, 2020 at 12:04:45PM -0400, Laine Stump wrote: > There has been a problem for several years with libvirt's default > network conflicting with the host network connection on new installs, > particularly when the "host" is actually virtual machine that is, > itself, connected to the libvirt default network on its respective > host. If the two default networks use the same subnet, and if the > nested host's libvirtd happens to start its network before the system > networking connects to the toplevel host, then network connectivity to > the guest is just silently non-working. > > We've tried several things over the years to eliminate this problem, > including: > > 1) Checking for conflicting routes/interfaces during installation of > the libvirt-daemon-config-network package (the one containing the > default network config file) which tries different subnets until it > finds one that doesn't conflict. This helps in some cases, but fails > on 2 points: a) if the installation environment is different from the > environment where the system is actually run (e.g. a "live CD" image > of a distro, which is built in a container by the distro maintainers, > then could be run in any number of places, and b) it modifies the > installed files during the rpm installation post_install script, which > is now discouraged because people building container images don't like > dealing with that. > > 2) When starting a libvirt network, we now check for any route or > interface that conflicts with the new network's IP's and routes. This > doesn't fix the problem, but does notify the user of the problem *as > long as libvirt starts its networks after the host has started its > system networks*. > > Neither of these help in the situation where libvirt's networks are > started first. The script in this patch uses a NetworkManager facility > (dispatcher.d scripts, which are run whenever any interface is > brought up or down) to check for any libvirt networks that conflict > with the new NetworkManager interface, and if it finds a conflict it > logs a message and destroys the libvirt network. So as with case (2) > above, it doesn't solve the problem, but it does make it more obvious, > and also makes sure the host networking isn't borked, so you can still > ssh into the machine and fix it. > > There are a few caveats/misgivings with the script I've come up with, > which cause me to post it as just an RFC: > > 1) It's written in python, and uses libxml2 > > When I first sat down to make this script (following a suggestion by > danpb during an IRC discussion), I immediately put #!/bin/bash at the > top of the file, because it's supposed to be a script. But then I > remembered that we're trying to narrow down the usage of languages in > libvirt, and anyway it would be nice to be able to properly parse the > XML to get the IP addresses/netmasks/prefixes. So I instead wrote it > in python. This makes for a less ugly program, but it does mean that > the daemon-config-network package is now dependent on python3-libxml > (and python3 of course), which pulls in a bunch of other packages. > Everybody's dev systems already have these packages and their > dependencies installed (since both are required to build) and many > other users systems already have them (they are required by > virt-install and virt-manager, among others), including the Fedora > live CD, for example. But not *all* systems have them. There has been > a lot of work to reduce package bloat caused by pulling in more and > more packages, so I'm reluctant to contribute to that. On the other > hand, someone who is looking to minimize their system footprint will > also probably not be installing the libvirt default network, so maybe > it's acceptable (I'm leaning towards "not", but want to know if anyone > else has a different opinion) I think this is a non-issue, since you've bundled the script into the libvirt-daemon-config-network RPM. It would have been more of an issue if in the libvirt-daemon-driver-network RPM as that would have made it mandatory. Even then though, I really wouldn't be very bothered by it. > 2) As delivered, it checks for conflicts with *all* libvirt > networks. > > While that was fun to write, and in theory is the right > thing to do, in practice it may be / probably is overkill. As we > discussed in IRC, the main time when this is a problem is just after > the first boot of a new OS / libvirt installation, and the only > libvirt network that's going to exist at that time is the default > network. So while having a loop that scans all libvirt networks is > more complete, in almost all cases it is just wasting time. (I did > provide a bool at the top of the script that can be modified to tell > it to only check the default network, but of course the script is > installed in /usr/lib, and it's not proper to modify files in > /usr/lib, so that isn't a real-world solution, but more a way to allow > people (i.e. "me") right now to test out the different behaviors. I figure checking all networks is fine. > 3) This only works if NetworkManager is enabled. > > I brought this up when danpb first mentioned the idea, and he rightly > pointed out that every report of this problem we've had has been from > the first boot of a new install on a system that used Network Manager > - anybody using any other method of networking config on their host > has had to manually intervene to get it setup, but NM tries to do > everything auto-magically. So while there may still be a very > occasional rare occurence of a silent networking failure even with > this script installed, this should eliminate 99% of the cases. I guess there could be another tool that is not NetworkManager, but does the same kind of job as NetworkManager which would want this functionality. > So, the questions I have are: > > 1) Do we want to allow adding the dependency on python3-libxml to the > libvirt-daemon-config-network package in order to avoid adding a bash > script. Or should I just redo it in bash I think this is a non-issue > 2) Do we want something that checks all active networks as this does, > or just the default network (and if the latter, should we make it > easily modifiable to check all networks, or just strip it down as much > as possible) We could have a config param in /etc/libvirt/something.conf. I'm fine with checking all networks by default though. > 3) I could eliminate the libxml2 dependency by just grepping the xml > for "<address .*ip=". Is reduced correctness (which is probably 0% > different in practice) worth the reduced package dependencies? We're making functional changes to the host, so its important that we are robust in our code. As such I strongly prefer that we're correctly parsing XML instead of regex'ing data out of it. > As a single question: on a sliding scale of "this script" -----> > "simple bash script" where do we want to land? The right answer to any problem is never a shell script :-) > diff --git a/libvirt.spec.in b/libvirt.spec.in > index 6abf97df85..e40068b2be 100644 > --- a/libvirt.spec.in > +++ b/libvirt.spec.in > @@ -496,6 +496,7 @@ Requires: libvirt-libs = %{version}-%{release} > Requires: dnsmasq >= 2.41 > Requires: radvd > Requires: iptables > +Requires: python3-libxml2 > > %description daemon-driver-network > The network driver plugin for the libvirtd daemon, providing > @@ -1630,6 +1631,7 @@ exit 0 > %dir %attr(0755, root, root) %{_localstatedir}/lib/libvirt/dnsmasq/ > %attr(0755, root, root) %{_libexecdir}/libvirt_leaseshelper > %{_libdir}/%{name}/connection-driver/libvirt_driver_network.so > +%{_prefix}/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets I was sceptical about /usr/lib being right, but it seems that NM really does want this dir. > > %if %{with_firewalld_zone} > %{_prefix}/lib/firewalld/zones/libvirt.xml > diff --git a/src/network/Makefile.inc.am b/src/network/Makefile.inc.am > index 196a30e16c..d58e09f88b 100644 > --- a/src/network/Makefile.inc.am > +++ b/src/network/Makefile.inc.am > @@ -170,6 +170,9 @@ install-data-network: > ( cd $(DESTDIR)$(confdir)/qemu/networks/autostart && \ > rm -f default.xml && \ > $(LN_S) ../default.xml default.xml ) > + $(MKDIR_P) "$(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d" > + $(INSTALL_DATA) $(srcdir)/network/nm-dispatcher-check-nets.py \ > + $(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets > if WITH_FIREWALLD_ZONE > $(MKDIR_P) "$(DESTDIR)$(prefix)/lib/firewalld/zones" > $(INSTALL_DATA) $(srcdir)/network/libvirt.zone \ > @@ -189,7 +192,10 @@ endif WITH_FIREWALLD_ZONE > > endif WITH_NETWORK > > -EXTRA_DIST += network/default.xml network/libvirt.zone > +EXTRA_DIST += \ > + network/default.xml \ > + network/nm-dispatcher-check-nets.py \ > + network/libvirt.zone > > .PHONY: \ > install-data-network \ > diff --git a/src/network/nm-dispatcher-check-nets.py b/src/network/nm-dispatcher-check-nets.py > new file mode 100755 > index 0000000000..9225a08ea1 > --- /dev/null > +++ b/src/network/nm-dispatcher-check-nets.py > @@ -0,0 +1,182 @@ > +#!/usr/bin/env python3 > +# > +# Copyright (C) 2012-2019 Red Hat, Inc. 2020 > +# > +# This library is free software; you can redistribute it and/or > +# modify it under the terms of the GNU Lesser General Public > +# License as published by the Free Software Foundation; either > +# version 2.1 of the License, or (at your option) any later version. > +# > +# This library is distributed in the hope that it will be useful, > +# but WITHOUT ANY WARRANTY; without even the implied warranty of > +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +# Lesser General Public License for more details. > +# > +# You should have received a copy of the GNU Lesser General Public > +# License along with this library. If not, see > +# <http://www.gnu.org/licenses/>. > + > +import libvirt > +import sys > +import os > +import libxml2 > +from ipaddress import ip_network > + > +# This script should be installed in > +# /usr/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets. It will be > +# called by NetworkManager every time a network interface is taken up > +# or down. When a new network comes up, it checks the libvirt virtual > +# networks to see if their IP address(es) (including any static > +# routes) are in conflict with the IP address(es) (or static routes) > +# of the newly added interface. If so, the libvirt network is > +# disabled. It is assumed that the user will notice that their guests > +# no longer have network connectvity (and/or the message logged by > +# this script), see that the network has been disabled, and then > +# realize the conflict when they try to restart it. > +# > +# set checkDefaultOnly=False to check *all* active libvirt networks > +# for conflicts with the new interface. Set to True to check only the > +# libvirt default network (since most networks other than the default > +# network are added post-install at a time when all of the hosts other > +# networks are already active, it may be overkill to check all of the > +# libvirt networks for conflict here (and instead just add more > +# needless overheard to bringing up a new host interface). > +# > +checkDefaultOnly = False > + > +# NB: since this file is installed in /usr/lib, it really shouldn't be > +# modified by the user, but instead should be copied to > +# /etc/NetworkManager/dispatcher.d, where it will override the copy in > +# /usr/lib. Even that isn't a proper solution though - if we're going > +# to actually have this config knob, perhaps we should check for it in > +# the environment, and if someone wants to modify it they can put a > +# short script in /etc that exports and environment variable and then > +# calls this script? Just thinking out loud here... > + > +def checkconflict(conn, netname, hostnets, hostif): > + > + # ignore if the network has been brought down or removed since we > + # got the list > + try: > + net = conn.networkLookupByName(netname) > + except libvirt.libvirtError: > + return > + > + if not net.isActive(): > + return > + > + xml = net.XMLDesc() > + doc = libxml2.parseDoc(xml) > + ctx = doc.xpathNewContext() > + > + # see if NetworkManager is informing us that this libvirt network > + # itself is coming online > + bridge = ctx.xpathEval("/network/bridge/@name"); > + if bridge and bridge[0].content == hostif: > + return Is it worth filtering based on the @type attribute too or are we ok relying on ? > + > + # check *all* the addresses of this network > + addrs = ctx.xpathEval("/network/*[@address]") Shouldn't we match /network/ip/ here ? This wildcard also matches /network/route/ - is that desirable ? > + for ip in addrs: > + ctx.setContextNode(ip) > + address = ctx.xpathEval("@address") > + prefix = ctx.xpathEval("@prefix") > + netmask = ctx.xpathEval("@netmask") > + > + if not (address and len(address[0].content)): > + continue > + > + addrstr = address[0].content > + if not (prefix and len(prefix[0].content)): > + # check for a netmask > + if not (netmask and len(netmask[0].content)): > + # this element has address, but no prefix or netmask > + # probably it is <mac address ...> so we can ignore it > + continue > + # convert netmask to prefix > + prefixstr = str(ip_network("0.0.0.0/%s" % netmask[0].content).prefixlen) > + else: > + prefixstr = prefix[0].content > + > + virtnetaddress = ip_network("%s/%s" % (addrstr, prefixstr), strict = False) > + # print("network %s address %s" % (netname, str(virtnetaddress))) > + for hostnet in hostnets: > + if virtnetaddress == hostnet: > + # There is a conflict with this libvirt network and the specified > + # net, so we need to disable the libvirt network > + print("Conflict with host net %s when starting interface %s - bringing down libvirt network '%s'" > + % (str(hostnet), hostif, netname)) > + try: > + net.destroy() > + except libvirt.libvirtError: > + print("Failed to destroy network %s" % netname) > + return > + return > + > + > +def addHostNets(hostnets, countenv, addrenv): > + > + count = os.getenv(countenv) > + if not count or count == 0: > + return > + > + for num in range(int(count)): > + addrstr = os.getenv("%s_%d" % (addrenv, num)) > + if not addrstr or addrstr == "": > + continue > + > + net = ip_network(addrstr.split()[0], strict=False) > + if net: > + hostnets.append(net) > + return > + > + > +############################################################ > + > +if sys.argv[2] != "up": > + sys.exit(0) > + > +hostif = sys.argv[1] > + > +try: > + conn = libvirt.open(None) > +except libvirt.libvirtError: > + print('Failed to open connection to the hypervisor') > + sys.exit(0) > + > +if checkDefaultOnly: > + nets = [] > + net = conn.networkLookupByName("default") > + if not (net and net.isActive()): > + sys.exit(0) > + nets.append(net) > +else: > + nets = conn.listAllNetworks(libvirt.VIR_CONNECT_LIST_NETWORKS_ACTIVE) > + if not nets: > + sys.exit(0) > + > +# We have at least one active network. Build a list of all network > +# routes added by the new interface, and compare that list to the list > +# of all networks used by each active libvirt network. If any are an > +# exact match, then we have a conflict and need to shut down the > +# libvirt network to avoid killing host networking. > + > +# When NetworkManager calls scripts in /etc/NetworkManager/dispatcher.d > +# it will have all IP addresses and routes associated with the interface > +# that is going up or down in the following environment variables: > +# > +# IP4_NUM_ADDRESSES - number of IPv4 addresses > +# IP4_ADDRESS_N - one variable for each address, starting at _0 > +# IP4_NUM_ROUTES - number of IPv5 routes > +# IP4_ROUTE_N - one for each route, starting at _0 > +# (replace "IP4" with "IP6" and repeat) Oh that's very nice - much easier than probing for this info ourselves. > +# > +hostnets = [] > +addHostNets(hostnets, "IP4_NUM_ADDRESSES", "IP4_ADDRESS"); > +addHostNets(hostnets, "IP4_NUM_ROUTES", "IP4_ROUTE"); > +addHostNets(hostnets, "IP6_NUM_ADDRESSES", "IP6_ADDRESS"); > +addHostNets(hostnets, "IP6_NUM_ROUTES", "IP6_ROUTE"); > + > +for net in nets: > + > + checkconflict(conn, net.name(), hostnets, hostif) > -- > 2.25.4 > Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|