On Thu, 2016-09-15 at 18:10 +0200, Benoit wrote: > Hi Andrea, Hi, sorry for taking so long to reply. I'm still unable to reproduce your failure, which of course makes it very difficult to debug :( But I might have a couple leads, so let's try and move forward with that. > > Since you're running an Arch variant, I'm going to assume > > all your software is basically at the latest version all > > around, isn't it? > > Yep exactly, all software / kernel are in latest version >From the log below it looks like you're running 2.1.0, which might be the latest version your distribution provides, but the same is not true when it comes to upstream development. > > Can you try adding > > > > [Unit] > > Requires=libvirtd.service > > > > to your libvirt-guests.service via 'systemctl edit' and > > see if it helps? I don't think it should be needed for your > > use case, but it's probably a good idea to have it there > > anyways. > > I have tried and as you expected it didn't changed anything, same issue. Can you please provide the output of # systemctl cat libvirt-guests.service and # systemctl show libvirt-guests.service as well? Those could contain helpful information. > I made more test and it is very reproductible, let me give you my steps > to reproduce it : > > -I shutdown -r now my host. > > -when it restarted, first thing to check is : systemctl status > libvirt-guests.service, I got : > > libvirt-guests.service - Suspend Active Libvirt Guests > Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; > enabled; vendor preset: disabled) >From the output of systemctl it looks like you didn't add the Requires=libvirtd.service relationship. Maybe you just removed it after testing? > Active: failed (Result: exit-code) since Thu 2016-09-15 17:40:07 > CEST; 1min 47s ago > Docs: man:libvirtd(8) > http://libvirt.org > Main PID: 329 (code=exited, status=1/FAILURE) > > Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed > to connect to the hypervisor > Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection > Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket > to '/var/run/libvirt/libvirt-sock': No such file or directory > Sep 15 17:40:07 libvirt-guests.sh[329]: Resuming guest : error: failed > to connect to the hypervisor > Sep 15 17:40:07 libvirt-guests.sh[329]: error: no valid connection > Sep 15 17:40:07 libvirt-guests.sh[329]: error: Failed to connect socket > to '/var/run/libvirt/libvirt-sock': No such file or directory > Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Main process exited, > code=exited, status=1/FAILURE > Sep 15 17:40:07 systemd[1]: Failed to start Suspend Active Libvirt Guests. > Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Unit entered failed > state. > Sep 15 17:40:07 systemd[1]: libvirt-guests.service: Failed with result > 'exit-code'. > > Even if the service is in FAILED mode, my guests are started correctly. Is it possible that you have configured some (all?) of your guests to be autostarted? That would explain why the guests are running after reboot even though libvirt-guests failed to connect to libvirtd. Even then, libvirt-guests is written in such a way that the fact a guest has already been started should not result in failure. I've checked, and it looks to me like the socket is created very early in the libvirtd startup process, way before attempting to start autostarted guests, so I don't see how that could cause issues. Two things stand out from the log: 1) Resuming guest : error: failed to connect to the hypervisor That progress message is supposed to contain the guest name right after "guest", but you have an empty string instead. That's consistent with the script being unable to connect to libvirtd 2) the script can't connect to libvirtd which is extremely weird because, right before looping throught the guests and starting them, a connectivity check is performed to make sure libvirtd can be reached Can you please provide a full, verbose libvirtd log[1]? Can you also add[2] set -x to /usr/libexec/libvirt-guests.sh and then provide the output of # systemctl -b 0 -u libvirt-guests.service after rebooting the host? > If I do a systemctl restart libvirt-guests.service, everything is good : > libvirt-guests.service - Suspend Active Libvirt Guests > Loaded: loaded (/usr/lib/systemd/system/libvirt-guests.service; > enabled; vendor preset: disabled) > Active: active (exited) since Thu 2016-09-15 17:42:01 CEST; 9min ago > Docs: man:libvirtd(8) > http://libvirt.org > Main PID: 605 (code=exited, status=0/SUCCESS) > > Sep 15 17:42:01 systemd[1]: Starting Suspend Active Libvirt Guests... > Sep 15 17:42:01 systemd[1]: Started Suspend Active Libvirt Guests. > > Now if I do a shutdown -r now again, I can see both in my libvirt logs + > in my guests log that an ACPI commands has been sent from libvirt and my > guests shutdown gently, so everything is ok. > > When I start again, and try to do a systemctl status > libvirt-guests.service, I got the same FAILED status as before.... > > I made a research on my logs : > > Sep 15 17:39:29 libvirt-guests.sh[1164]: Shutting down guests on default > URI... > Sep 15 17:39:29 libvirt-guests.sh[1164]: Starting shutdown on guest: ape > Sep 15 17:39:30 libvirt-guests.sh[1164]: Waiting for guest ape to shut > down, 30 seconds left Looks like you changed at least a couple of settings here, eg. the timeout. Can you please provide the output of $ grep -Ev '^(#|$)' /etc/sysconfig/libvirt-guests so we can be sure we're on the same page? > Sep 15 17:39:35 libvirt-guests.sh[1164]: Waiting for guest ape to shut > down, 25 seconds left > Sep 15 17:39:36 libvirtd[326]: internal error: End of file from monitor > Sep 15 17:39:36 libvirtd[326]: Cannot write data: Broken pipe [...] > We see the same error as systemctl status plus some error like " > internal error: End of file from monitor" and others type. > Don't know if I have anything to take a look there ? That might be a red herring, I think it's just the QEMU process being killed during shutdown and thus ending communication with libvirtd abruptly. > Finally, I also notice some very strange thing: > If I stop the service with systemctl stop libvirt-guests.service it > tooks a while to execute the stop command and fails. > > Sep 15 18:00:44 libvirt-guests.sh[978]: Starting shutdown on guest: ape > Sep 15 18:00:45 libvirt-guests.sh[978]: Waiting for guest ape to shut > down, 30 seconds left > Sep 15 18:00:50 libvirt-guests.sh[978]: Waiting for guest ape to shut > down, 25 seconds left > Sep 15 18:00:55 libvirt-guests.sh[978]: Waiting for guest ape to shut > down, 20 seconds left > Sep 15 18:01:00 libvirt-guests.sh[978]: Waiting for guest ape to shut > down, 15 seconds left > Sep 15 18:01:06 libvirt-guests.sh[978]: Waiting for guest ape to shut > down, 10 seconds left > Sep 15 18:01:11 libvirt-guests.sh[978]: Waiting for guest ape to shut > down, 5 seconds left > Sep 15 18:01:15 libvirt-guests.sh[978]: Shutdown of guest ape failed to > complete in time. > > So my guests are not able to shutdown properly but when I shutdown -r > now my host I can see that my guests have been shutdown gently, I guess > there is something related on my issue.... It looks like at least one of the guests is taking awfully long to shut down. What if you bump up the timeout? Does it manage to shut down before reaching it? [1] Follow instructions at http://wiki.libvirt.org/page/DebugLogs [2] When editing the file, add the new instruction *right after* the first line. Of course you should make a backup copy of the file before tweaking it -- Andrea Bolognani / Red Hat / Virtualization _______________________________________________ libvirt-users mailing list libvirt-users@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvirt-users