Re: [PATCH 5/7] network: NetworkManager script to monitor/resolve conflicts with new interfaces

Laine Stump <laine@xxxxxxxxxx> · Fri, 16 Aug 2024 23:43:11 -0400

On 8/16/24 11:23 AM, Daniel P. Berrangé wrote:
On Wed, Aug 07, 2024 at 04:00:09PM -0400, Laine Stump wrote:
On 8/7/24 1:32 PM, Daniel P. Berrangé wrote:
On Wed, Aug 07, 2024 at 01:16:01PM -0400, Laine Stump wrote:
+
+import libvirt
+import sys
+import os
+import libxml2
+from ipaddress import ip_network
+
+# This script should be installed in
+# /usr/lib/NetworkManager/dispatcher.d/50-libvirt-check-nets. It will be
+# called by NetworkManager every time a network interface is taken up
+# or down. When a new network comes up, it checks the libvirt virtual
+# networks to see if their IP address(es) (including any static
+# routes) are in conflict with the IP address(es) (or static routes)
+# of the newly added interface.  If so, the libvirt network is
+# disabled. It is assumed that the user will notice that their guests
+# no longer have network connectvity (and/or the message logged by
+# this script), see that the network has been disabled, and then
+# realize the conflict when they try to restart it.
+#
+# set checkDefaultOnly=False to check *all* active libvirt networks
+# for conflicts with the new interface. Set to True to check only the
+# libvirt default network (since most networks other than the default
+# network are added post-install at a time when all of the hosts other
+# networks are already active, it may be overkill to check all of the
+# libvirt networks for conflict here (and instead just add more
+# needless overheard to bringing up a new host interface).
+#
+checkDefaultOnly = False
+
+# NB: since this file is installed in /usr/lib, it really shouldn't be
+# modified by the user, but instead should be copied to
+# /etc/NetworkManager/dispatcher.d, where it will override the copy in
+# /usr/lib. Even that isn't a proper solution though - if we're going
+# to actually have this config knob, perhaps we should check for it in
+# the environment, and if someone wants to modify it they can put a
+# short script in /etc that exports and environment variable and then
+# calls this script? Just thinking out loud here.
+
+
+def checkconflict(conn, netname, hostnets, hostif):

...snip complex code...

It occurrs to me that this python code is entirely duplicating
logic that already exists in virtnetworkd that checks for
conflicts.

Possibly not *exactly* what's in virtnetworkd, since I'm not certain if NM
includes all static routes in the list it puts in the environment variables
(while they *would* show up in the system routing table that our network
driver uses). Of course that sounds like another reason to prefer re-using
the existing code!

Oh definitely a strong reason to reuse the code. We don't want a situation
where users file bugs because we don't detect clashes in one scenario, but
do in the other.

This makes me want to figure out a way to just re-use the existing
code rather than having to write it again.

Could we just send a SIGUSR2 to virtnetworkd as a way to trigger it
to re-validate all running networks ? That would be easy to trigger
from non-network manager setups too.

That sounds simpler (and probably quicker), and if we could do it that I
would prefer it ("only works with NetworkManager" is #1 on my list of
'issues' above after all :-)), but what would we do in the case that
virtnetworkd (or libvirtd) wasn't currently running? Run some virsh command
to trigger virtnetworkd to start, and then send the SIGUSR2?

As long as there are networks running, libvirtd/virnetworkd should not
shut down due to the auto shutdown timer.

But it *does* auto-shutdown (even if there is an active guest that's 
using an active virtual network) :-). So I guess that bit was never 
wired up for virtnetworkd.

I'd never before bothered to learn how the idle shutdown timer works, 
but just spent 5 minutes cscoping back through the code, and didn't get 
to the magic call made by the qemu driver that's missing from the 
network driver. If someone can point me at the right bit so I can avoid 
more digging, that would be appreciated.

Anyway, if virtnetworkd doesn't do an idle shutdown, that simplifies 
things quite a bit (I had just assumed that we didn't want that because 
it took up extra resources and was (until now) unnecessary).

They might temporarily shutdown due to a restart on package upgrades.

If our startup code runs the "check for clashes" logic, then we'll
do the right thing.

The only problem will be if a user manually stops the daemon and
leaves it stopped, while still having a network present. I'm happy
to call that user error and not worry about solving clashes in
that case, as no normal system should get in that state.

Yeah, if you want things to work, play by the rules!

(this does
But then how about ignoring network manager entirely and going right
to the source, ie the kernel. Does it send out either udev or netlink
events (probably the latter), when NICs have their link status set
online/offline ? If so, virtnetworkd could just listen to the kernel
events and "do the right thing",  regardless of what tool is managing
the NICs.

I imagine *something* is sent out, and it's definitely worth me
investigating. But wouldn't virtnetworkd have to be running all of the time
in order to use something like that?

Right before I left town for a week away from the computer, I found the 
blog of someone who was wanting to do exactly this. I'm going to go read 
up on it now, and hopefully it won't lead me down the wrong path! :-)

(I have to say though, that needing to look at netlink to try and figure 
out a new bug report (also just before leaving town) reminded me of how 
unnecessarily complicated netlink is :-/)