Re: [libvirt RFC PATCH] network: NetworkManager script to monitor for conflicts with new interfaces

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, May 04, 2020 at 12:04:45PM -0400, Laine Stump wrote:
> There has been a problem for several years with libvirt's default
> network conflicting with the host network connection on new installs,
> particularly when the "host" is actually virtual machine that is,
> itself, connected to the libvirt default network on its respective
> host. If the two default networks use the same subnet, and if the
> nested host's libvirtd happens to start its network before the system
> networking connects to the toplevel host, then network connectivity to
> the guest is just silently non-working.
> 
> We've tried several things over the years to eliminate this problem,
> including:
> 
> 1) Checking for conflicting routes/interfaces during installation of
> the libvirt-daemon-config-network package (the one containing the
> default network config file) which tries different subnets until it
> finds one that doesn't conflict. This helps in some cases, but fails
> on 2 points: a) if the installation environment is different from the
> environment where the system is actually run (e.g. a "live CD" image
> of a distro, which is built in a container by the distro maintainers,
> then could be run in any number of places, and b) it modifies the
> installed files during the rpm installation post_install script, which
> is now discouraged because people building container images don't like
> dealing with that.
> 
> 2) When starting a libvirt network, we now check for any route or
> interface that conflicts with the new network's IP's and routes. This
> doesn't fix the problem, but does notify the user of the problem *as
> long as libvirt starts its networks after the host has started its
> system networks*.
> 
> Neither of these help in the situation where libvirt's networks are
> started first. The script in this patch uses a NetworkManager facility
> (dispatcher.d scripts, which are run whenever any interface is
> brought up or down) to check for any libvirt networks that conflict
> with the new NetworkManager interface, and if it finds a conflict it
> logs a message and destroys the libvirt network. So as with case (2)
> above, it doesn't solve the problem, but it does make it more obvious,
> and also makes sure the host networking isn't borked, so you can still
> ssh into the machine and fix it.
> 
> There are a few caveats/misgivings with the script I've come up with,
> which cause me to post it as just an RFC:
> 
> 1) It's written in python, and uses libxml2
> 
> When I first sat down to make this script (following a suggestion by
> danpb during an IRC discussion), I immediately put #!/bin/bash at the
> top of the file, because it's supposed to be a script. But then I
> remembered that we're trying to narrow down the usage of languages in
> libvirt, and anyway it would be nice to be able to properly parse the
> XML to get the IP addresses/netmasks/prefixes. So I instead wrote it
> in python. This makes for a less ugly program, but it does mean that
> the daemon-config-network package is now dependent on python3-libxml
> (and python3 of course), which pulls in a bunch of other packages.

> Everybody's dev systems already have these packages and their
> dependencies installed (since both are required to build) and many
> other users systems already have them (they are required by
> virt-install and virt-manager, among others), including the Fedora
> live CD, for example. But not *all* systems have them. There has been
> a lot of work to reduce package bloat caused by pulling in more and
> more packages, so I'm reluctant to contribute to that. On the other
> hand, someone who is looking to minimize their system footprint will
> also probably not be installing the libvirt default network, so maybe
> it's acceptable (I'm leaning towards "not", but want to know if anyone
> else has a different opinion)

I think this is a  non-issue, since you've bundled the script into
the libvirt-daemon-config-network RPM.  It would have been more of
an issue if in the libvirt-daemon-driver-network RPM as that would
have made it mandatory. Even then though, I really wouldn't be very
bothered by it.

> 2) As delivered, it checks for conflicts with *all* libvirt
> networks.
> 
> While that was fun to write, and in theory is the right
> thing to do, in practice it may be / probably is overkill. As we
> discussed in IRC, the main time when this is a problem is just after
> the first boot of a new OS / libvirt installation, and the only
> libvirt network that's going to exist at that time is the default
> network. So while having a loop that scans all libvirt networks is
> more complete, in almost all cases it is just wasting time. (I did
> provide a bool at the top of the script that can be modified to tell
> it to only check the default network, but of course the script is
> installed in /usr/lib, and it's not proper to modify files in
> /usr/lib, so that isn't a real-world solution, but more a way to allow
> people (i.e. "me") right now to test out the different behaviors.

I figure checking all networks is fine.

> 3) This only works if NetworkManager is enabled.
> 
> I brought this up when danpb first mentioned the idea, and he rightly
> pointed out that every report of this problem we've had has been from
> the first boot of a new install on a system that used Network Manager
> - anybody using any other method of networking config on their host
> has had to manually intervene to get it setup, but NM tries to do
> everything auto-magically. So while there may still be a very
> occasional rare occurence of a silent networking failure even with
> this script installed, this should eliminate 99% of the cases.

I guess there could be another tool that is not NetworkManager, but
does the same kind of job as NetworkManager which would want this
functionality.

> So, the questions I have are:
> 
> 1) Do we want to allow adding the dependency on python3-libxml to the
> libvirt-daemon-config-network package in order to avoid adding a bash
> script. Or should I just redo it in bash

I think this is a non-issue

> 2) Do we want something that checks all active networks as this does,
> or just the default network (and if the latter, should we make it
> easily modifiable to check all networks, or just strip it down as much
> as possible)

We could have a config param in /etc/libvirt/something.conf. I'm
fine with checking all networks by default though.

> 3) I could eliminate the libxml2 dependency by just grepping the xml
> for "<address .*ip=". Is reduced correctness (which is probably 0%
> different in practice) worth the reduced package dependencies?

We're making functional changes to the host, so its important that
we are robust in our code. As such I strongly prefer that we're
correctly parsing XML instead of regex'ing data out of it.

> As a single question: on a sliding scale of "this script" ----->
> "simple bash script" where do we want to land?

The right answer to any problem is never a shell script :-)

> diff --git a/libvirt.spec.in b/libvirt.spec.in
> index 6abf97df85..e40068b2be 100644
> --- a/libvirt.spec.in
> +++ b/libvirt.spec.in
> @@ -496,6 +496,7 @@ Requires: libvirt-libs = %{version}-%{release}
>  Requires: dnsmasq >= 2.41
>  Requires: radvd
>  Requires: iptables
> +Requires: python3-libxml2
>  
>  %description daemon-driver-network
>  The network driver plugin for the libvirtd daemon, providing
> @@ -1630,6 +1631,7 @@ exit 0
>  %dir %attr(0755, root, root) %{_localstatedir}/lib/libvirt/dnsmasq/
>  %attr(0755, root, root) %{_libexecdir}/libvirt_leaseshelper
>  %{_libdir}/%{name}/connection-driver/libvirt_driver_network.so
> +%{_prefix}/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets

I was sceptical about /usr/lib being right, but it seems that NM
really does want this dir.

>  
>  %if %{with_firewalld_zone}
>  %{_prefix}/lib/firewalld/zones/libvirt.xml
> diff --git a/src/network/Makefile.inc.am b/src/network/Makefile.inc.am
> index 196a30e16c..d58e09f88b 100644
> --- a/src/network/Makefile.inc.am
> +++ b/src/network/Makefile.inc.am
> @@ -170,6 +170,9 @@ install-data-network:
>  	( cd $(DESTDIR)$(confdir)/qemu/networks/autostart && \
>  	  rm -f default.xml && \
>  	  $(LN_S) ../default.xml default.xml )
> +	$(MKDIR_P) "$(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d"
> +	$(INSTALL_DATA) $(srcdir)/network/nm-dispatcher-check-nets.py \
> +	  $(DESTDIR)$(prefix)/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets
>  if WITH_FIREWALLD_ZONE
>  	$(MKDIR_P) "$(DESTDIR)$(prefix)/lib/firewalld/zones"
>  	$(INSTALL_DATA) $(srcdir)/network/libvirt.zone \
> @@ -189,7 +192,10 @@ endif WITH_FIREWALLD_ZONE
>  
>  endif WITH_NETWORK
>  
> -EXTRA_DIST += network/default.xml network/libvirt.zone
> +EXTRA_DIST += \
> +  network/default.xml \
> +  network/nm-dispatcher-check-nets.py \
> +  network/libvirt.zone
>  
>  .PHONY: \
>  	install-data-network \
> diff --git a/src/network/nm-dispatcher-check-nets.py b/src/network/nm-dispatcher-check-nets.py
> new file mode 100755
> index 0000000000..9225a08ea1
> --- /dev/null
> +++ b/src/network/nm-dispatcher-check-nets.py
> @@ -0,0 +1,182 @@
> +#!/usr/bin/env python3
> +#
> +# Copyright (C) 2012-2019 Red Hat, Inc.

2020

> +#
> +# This library is free software; you can redistribute it and/or
> +# modify it under the terms of the GNU Lesser General Public
> +# License as published by the Free Software Foundation; either
> +# version 2.1 of the License, or (at your option) any later version.
> +#
> +# This library is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +# Lesser General Public License for more details.
> +#
> +# You should have received a copy of the GNU Lesser General Public
> +# License along with this library.  If not, see
> +# <http://www.gnu.org/licenses/>.
> +
> +import libvirt
> +import sys
> +import os
> +import libxml2
> +from ipaddress import ip_network
> +
> +# This script should be installed in
> +# /usr/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets. It will be
> +# called by NetworkManager every time a network interface is taken up
> +# or down. When a new network comes up, it checks the libvirt virtual
> +# networks to see if their IP address(es) (including any static
> +# routes) are in conflict with the IP address(es) (or static routes)
> +# of the newly added interface.  If so, the libvirt network is
> +# disabled. It is assumed that the user will notice that their guests
> +# no longer have network connectvity (and/or the message logged by
> +# this script), see that the network has been disabled, and then
> +# realize the conflict when they try to restart it.
> +#
> +# set checkDefaultOnly=False to check *all* active libvirt networks
> +# for conflicts with the new interface. Set to True to check only the
> +# libvirt default network (since most networks other than the default
> +# network are added post-install at a time when all of the hosts other
> +# networks are already active, it may be overkill to check all of the
> +# libvirt networks for conflict here (and instead just add more
> +# needless overheard to bringing up a new host interface).
> +#
> +checkDefaultOnly = False
> +
> +# NB: since this file is installed in /usr/lib, it really shouldn't be
> +# modified by the user, but instead should be copied to
> +# /etc/NetworkManager/dispatcher.d, where it will override the copy in
> +# /usr/lib. Even that isn't a proper solution though - if we're going
> +# to actually have this config knob, perhaps we should check for it in
> +# the environment, and if someone wants to modify it they can put a
> +# short script in /etc that exports and environment variable and then
> +# calls this script? Just thinking out loud here...
> +
> +def checkconflict(conn, netname, hostnets, hostif):
> +
> +    # ignore if the network has been brought down or removed since we
> +    # got the list
> +    try:
> +        net = conn.networkLookupByName(netname)
> +    except libvirt.libvirtError:
> +        return
> +
> +    if not net.isActive():
> +        return
> +
> +    xml = net.XMLDesc()
> +    doc = libxml2.parseDoc(xml)
> +    ctx = doc.xpathNewContext()
> +
> +    # see if NetworkManager is informing us that this libvirt network
> +    # itself is coming online
> +    bridge = ctx.xpathEval("/network/bridge/@name");
> +    if bridge and bridge[0].content == hostif:
> +        return

Is it worth filtering based on the @type attribute too or are
we ok relying on ?

> +
> +    # check *all* the addresses of this network
> +    addrs = ctx.xpathEval("/network/*[@address]")

Shouldn't we match  /network/ip/  here ?   This wildcard also
matches /network/route/  - is that desirable ?

> +    for ip in addrs:
> +        ctx.setContextNode(ip)
> +        address = ctx.xpathEval("@address")
> +        prefix = ctx.xpathEval("@prefix")
> +        netmask = ctx.xpathEval("@netmask")
> +
> +        if not (address and len(address[0].content)):
> +            continue
> +
> +        addrstr = address[0].content
> +        if not (prefix and len(prefix[0].content)):
> +            # check for a netmask
> +            if not (netmask and len(netmask[0].content)):
> +                # this element has address, but no prefix or netmask
> +                # probably it is <mac address ...> so we can ignore it
> +                continue
> +            # convert netmask to prefix
> +            prefixstr = str(ip_network("0.0.0.0/%s" % netmask[0].content).prefixlen)
> +        else:
> +            prefixstr = prefix[0].content
> +
> +        virtnetaddress = ip_network("%s/%s" % (addrstr, prefixstr), strict = False)
> +        # print("network %s address %s" % (netname, str(virtnetaddress)))
> +        for hostnet in hostnets:
> +            if virtnetaddress == hostnet:
> +                # There is a conflict with this libvirt network and the specified
> +                # net, so we need to disable the libvirt network
> +                print("Conflict with host net %s when starting interface %s - bringing down libvirt network '%s'"
> +                      % (str(hostnet), hostif, netname))
> +                try:
> +                    net.destroy()
> +                except libvirt.libvirtError:
> +                    print("Failed to destroy network %s" % netname)
> +                return
> +    return
> +
> +
> +def addHostNets(hostnets, countenv, addrenv):
> +
> +    count = os.getenv(countenv)
> +    if not count or count == 0:
> +        return
> +
> +    for num in range(int(count)):
> +        addrstr = os.getenv("%s_%d" % (addrenv, num))
> +        if not addrstr or addrstr == "":
> +            continue
> +
> +        net = ip_network(addrstr.split()[0], strict=False)
> +        if net:
> +            hostnets.append(net)
> +    return
> +
> +
> +############################################################
> +
> +if sys.argv[2] != "up":
> +    sys.exit(0)
> +
> +hostif = sys.argv[1]
> +
> +try:
> +    conn = libvirt.open(None)
> +except libvirt.libvirtError:
> +    print('Failed to open connection to the hypervisor')
> +    sys.exit(0)
> +
> +if checkDefaultOnly:
> +    nets = []
> +    net = conn.networkLookupByName("default")
> +    if not (net and net.isActive()):
> +        sys.exit(0)
> +    nets.append(net)
> +else:
> +    nets = conn.listAllNetworks(libvirt.VIR_CONNECT_LIST_NETWORKS_ACTIVE)
> +    if not nets:
> +        sys.exit(0)
> +
> +# We have at least one active network. Build a list of all network
> +# routes added by the new interface, and compare that list to the list
> +# of all networks used by each active libvirt network. If any are an
> +# exact match, then we have a conflict and need to shut down the
> +# libvirt network to avoid killing host networking.
> +
> +# When NetworkManager calls scripts in /etc/NetworkManager/dispatcher.d
> +# it will have all IP addresses and routes associated with the interface
> +# that is going up or down in the following environment variables:
> +#
> +# IP4_NUM_ADDRESSES - number of IPv4 addresses
> +# IP4_ADDRESS_N - one variable for each address, starting at _0
> +# IP4_NUM_ROUTES - number of IPv5 routes
> +# IP4_ROUTE_N - one for each route, starting at _0
> +# (replace "IP4" with "IP6" and repeat)

Oh that's very nice - much easier than probing for this info ourselves.

> +#
> +hostnets = []
> +addHostNets(hostnets, "IP4_NUM_ADDRESSES", "IP4_ADDRESS");
> +addHostNets(hostnets, "IP4_NUM_ROUTES", "IP4_ROUTE");
> +addHostNets(hostnets, "IP6_NUM_ADDRESSES", "IP6_ADDRESS");
> +addHostNets(hostnets, "IP6_NUM_ROUTES", "IP6_ROUTE");
> +
> +for net in nets:
> +
> +    checkconflict(conn, net.name(), hostnets, hostif)
> -- 
> 2.25.4
> 

Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|





[Index of Archives]     [Virt Tools]     [Libvirt Users]     [Lib OS Info]     [Fedora Users]     [Fedora Desktop]     [Fedora SELinux]     [Big List of Linux Books]     [Yosemite News]     [KDE Users]     [Fedora Tools]

  Powered by Linux