Re: frequent network collapse possibly due to bridging

Laine Stump <laine@xxxxxxxxxx> · Mon, 24 Jan 2022 17:30:16 -0500

On 1/24/22 4:35 AM, Martin Kletzander wrote:
On Fri, Jan 21, 2022 at 08:42:58AM -0600, Hakan E. Duran wrote:
Hi,

I would like some help to troubleshoot the problem I have been having
lately with my VM host, which contains 5 VMs, one of which is for
pi-hole, unbound services. It has been a relatively common occurrence in
the last few weeks for me to find that the host machine has lost its
network when I get back home from work. Restoring the VM/VMs does not fix
the problem; the host needs to be restarted for a fix, otherwise there
is a loss of both name resolution and internet connectivity; I
cannot ping even IPs such as 8.8.8.8. Since I use the pi-hole VM as the DNS
server for my LAN, this means that my whole LAN gets disconnected from
the internet until the host machine is rebooted. The host machine has a
somewhat complicated network setup: the two gigabit connections are bonded
and bridged to the VMs; however, this setup has been serving me well
for several years now. The problem, on the other hand, appeared a few
weeks ago. This doesn't happen every day but often enough to be annoying
and disruptive for my family.

It is always good to check what changed a few weeks ago, but I understand
it is difficult to find out what you were updating and where.

My question is, how can I troubleshoot this problem and figure out
whether it is truly due to network bridging somehow collapsing or not? I
tried to find some log files but all I could find were the
/var/log/libvirt/qemu/$VM files, and the particular log file for the pi-hole
VM reported the following lines; however, I am not sure if they are
associated with a real crash or just due to shutting down and restarting
the host (please excuse the word-wrapping):

char device redirected to /dev/pts/2 (label charserial0)
qxl_send_events: spice-server bug: guest stopped, ignoring
2022-01-20T23:41:17.012445Z qemu-system-x86_64: terminating on signal 15 from pid 1 (/sbin/init)

That is probably from restarting the host, as QEMU got SIGTERM'd by init
(pid 1).  Maybe it was restarted at a bad time and there is some
inconsistency on the disk?
Using something like libvirt-guests which can manage your machines when
rebooting would be a good idea.
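
For what it's worth, a minimal sketch of a libvirt-guests configuration
follows. The variable names are ones documented in the stock libvirt-guests
config file, but the values are only illustrative assumptions, and the
file's location varies by distribution (for example
/etc/sysconfig/libvirt-guests or /etc/conf.d/libvirt-guests). The service
itself is typically enabled with "systemctl enable libvirt-guests.service".

```shell
# Illustrative values only (assumptions); adjust for your own setup.
ON_BOOT=start            # on boot, start the guests that were running at shutdown
ON_SHUTDOWN=shutdown     # ask guests to shut down cleanly rather than suspend them
SHUTDOWN_TIMEOUT=300     # seconds to wait for a guest to shut down
PARALLEL_SHUTDOWN=2      # shut down up to two guests at a time
```

With something like this in place, a host reboot should give each guest a
chance to shut down cleanly before init starts sending signals.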

2022-01-20 23:41:17.716+0000: shutting down, reason=crashed
2022-01-20 23:42:46.059+0000: starting up libvirt version: 7.10.0, qemu version: 6.2.0, kernel: 5.10.89-1-MANJARO, hostname: -redacted-

Please excuse my ignorance but is there a way to restart the
networking without rebooting the host machine? This will not solve my

You can do:

virsh net-destroy <network_name>
virsh net-start <network_name>

but depending on what the network looks like, how it is set up etc. you
might need to restart some of the VMs or manually plug them in.

The connection between any guest tap device and a host bridge device 
will be broken by virsh net-destroy, and not restored by virsh net-start 
(because the network driver has no good way of notifying the QEMU driver 
that it has restarted a network). This is something that's been on my 
"list of annoying things I should fix some day" for a very long time, 
but I've never been motivated enough to figure out a clean solution.

In the meantime, if you destroy/start a network, you can get all the 
guest tap devices reconnected by restarting libvirtd:

   systemctl restart libvirtd.service

or if you're using split daemons:

   systemctl restart virtqemud.service
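
Putting the pieces together, the whole sequence could be wrapped in a small
shell function. This is only a rough sketch: the function names, the
DRY_RUN switch, and the example network name "default" are my own
assumptions for illustration, not anything libvirt provides.

```shell
# Run a command, or just print it when DRY_RUN=1 (hypothetical helper).
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Bounce a libvirt network, then restart the daemon so that the QEMU
# driver reconnects the guest tap devices to the bridge.
#   $1 = libvirt network name
#   $2 = daemon unit (defaults to libvirtd.service; pass
#        virtqemud.service when using split daemons)
bounce_network() {
    net="$1"
    daemon="${2:-libvirtd.service}"
    run virsh net-destroy "$net" || return 1
    run virsh net-start "$net" || return 1
    run systemctl restart "$daemon"
}
```

For example, "DRY_RUN=1; bounce_network default" only prints the three
commands, which is a cheap way to check them before running for real.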

One of the things the QEMU driver does when it's initializing is to 
check where each guest tap device *should* be connected, compare that to 
where it *is* connected, and if those don't match then fix it.
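
If you want to check by hand whether a tap device is attached where you
expect, something along these lines may help. It is only a sketch that
reads the "master" symlink under /sys/class/net; the function names and the
SYSFS_ROOT override are hypothetical, and the device and bridge names below
(vnet0, br0) are just examples. The expected bridge for each interface can
be read from the Source column of "virsh domiflist <domain>".

```shell
# Print the bridge a network device is currently attached to, or "none".
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.
tap_bridge() {
    link="${SYSFS_ROOT:-/sys}/class/net/$1/master"
    if [ -L "$link" ]; then
        basename "$(readlink "$link")"
    else
        echo "none"
        return 1
    fi
}

# Compare the actual bridge of a tap device with the expected one.
#   $1 = tap device, $2 = bridge it should be attached to
check_tap() {
    tap="$1"
    want="$2"
    have=$(tap_bridge "$tap") || true
    if [ "$have" = "$want" ]; then
        echo "$tap: ok ($have)"
    else
        echo "$tap: expected $want, found $have"
        return 1
    fi
}
```

Running "check_tap vnet0 br0" after a net-destroy/net-start cycle should
report a mismatch until the daemon has been restarted.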