On Thu, Apr 30, 2020 at 7:59 PM Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote: > > On Thu, Apr 30, 2020 at 07:36:54PM +0200, Christian Ehrhardt wrote: > > On Thu, Apr 30, 2020 at 7:23 PM Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote: > > > > > > On Thu, Apr 30, 2020 at 06:28:08PM +0200, Christian Ehrhardt wrote: > > > > On Thu, Apr 30, 2020 at 5:10 PM Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote: > > > > > > > > > > On Thu, Apr 30, 2020 at 04:58:25PM +0200, Christian Ehrhardt wrote: > > > > > > On Thu, Apr 30, 2020 at 2:51 PM Daniel P. Berrangé <berrange@xxxxxxxxxx> wrote: > > > > > > Well it seems I have a reproducible symptom and a fix, but not the > > > > > > explanation why the latter fixes the former. > > > > > > I'll need to dive into some debug & analysis myself to explain it better. > > > > > > I'll be back here once I got time to do that in depth check. > > > > > > > > > > > > Until then whoever is affected (should be everyone) can give it a > > > > > > thought as well. > > > > > > Repro is as easy as > > > > > > One console: > > > > > > $ journalctl -f -u libvirt-guests > > > > > > Other console: > > > > > > $ systemctl stop libvirt-guests > > > > > > $ systemctl start libvirt-guests > > > > > > > > > > > > I see it with 245.4-4ubuntu3 (18.04) I'm not seeing it on > > > > > > 237-3ubuntu10.39 (20.04). > > > > > > Maybe it is a systemd bug after all? > > > > > > I'd be interested to hear if that is/isn't clobbering output for > > > > > > anyone else and what your systemd versions are? > > > > > > > > > > FWIW, it works correctly on Fedora 31 with systemd 243. > > > > > > > > Eoan with 242-7ubuntu3.7 is good as well. > > > > I might need to try to get some interim versions from somewhere. > > > > > > I've reproduced on Fedora 33 rawhide with systemd 245 - the first > > > place where it lists running guests is screwed up slightly: > > > > > > Apr 30 17:16:13 libvirt-fedora-rawhide systemd[1]: Stopping Suspend/Resume Running libvirt Guests... > > > Apr 30 17:16:13 libvirt-fedora-rawhide libvirt-guests.sh[69903]: Running guests on default URI: > > > Apr 30 17:16:13 libvirt-fedora-rawhide libvirt-guests.sh[69892]: Runningcore2 > > > Apr 30 17:16:13 libvirt-fedora-rawhide libvirt-guests.sh[69934]: Suspending guests on default URI... > > > Apr 30 17:16:13 libvirt-fedora-rawhide libvirt-guests.sh[69892]: SSuspending core1: ... > > > Apr 30 17:16:14 libvirt-fedora-rawhide libvirt-guests.sh[69892]: Suspending core1: done > > > Apr 30 17:16:14 libvirt-fedora-rawhide libvirt-guests.sh[69892]: Suspending core2: ... > > > Apr 30 17:16:15 libvirt-fedora-rawhide libvirt-guests.sh[69892]: Suspending core2: done > > > Apr 30 17:16:15 libvirt-fedora-rawhide systemd[1]: libvirt-guests.service: Succeeded. > > > Apr 30 17:16:15 libvirt-fedora-rawhide systemd[1]: Stopped Suspend/Resume Running libvirt Guests. > > > > > > On resume it is even worse > > > > > > Apr 30 17:19:40 libvirt-fedora-rawhide systemd[1]: Starting Suspend/Resume Running libvirt Guests... > > > Apr 30 17:19:40 libvirt-fedora-rawhide libvirt-guests.sh[70041]: Resuming guests on default URI... > > > Apr 30 17:19:40 libvirt-fedora-rawhide libvirt-guests.sh[70030]: R > > > Apr 30 17:19:40 libvirt-fedora-rawhide libvirt-guests.sh[70048]: R > > > Apr 30 17:19:41 libvirt-fedora-rawhide libvirt-guests.sh[70048]: esuming guest core1: > > > Apr 30 17:19:41 libvirt-fedora-rawhide libvirt-guests.sh[70079]: esum > > > Apr 30 17:19:41 libvirt-fedora-rawhide libvirt-guests.sh[70030]: e > > > Apr 30 17:19:41 libvirt-fedora-rawhide libvirt-guests.sh[70086]: e > > > Apr 30 17:19:42 libvirt-fedora-rawhide libvirt-guests.sh[70086]: esuming guest core2: > > > Apr 30 17:19:42 libvirt-fedora-rawhide libvirt-guests.sh[70119]: esum > > > Apr 30 17:19:42 libvirt-fedora-rawhide libvirt-guests.sh[70030]: e > > > Apr 30 17:19:42 libvirt-fedora-rawhide systemd[1]: Finished Suspend/Resume Running libvirt Guests. > > > > > > > > > It is possible it isn't systemd related - could be other packages that > > > are co-incidentally affecting it. > > > > Since launchpad holds all former builds still ready for download I > > could easily test a few versions. > > > > I up/downgraded just the following packages on an otherwise unmodified system: > > - libnss-systemd > > - libpam-systemd > > - libsystemd0 > > - systemd > > - systemd-container > > - systemd-sysv > > > > Those versions I did check: > > > > 245.4-4ubuntu3 bad > > 245.2-1ubuntu1 bad > > 244.3-1ubuntu1 bad > > 244.1-0ubuntu3 bad (bad on retry) > > 243-3ubuntu1 good (good (on retry) > > 242-7ubuntu3 good > > > > Retry means that I went and installed from good -> bad -> good and the > > behavior was the same. > > So none of the installs fixed it to then be good forever. It was > > consistently good <244 and bad >=244. > > Based on that, a possibly interesting change between v243..v244 was > this: > > commit 09d0b46ab61bebafe5bdc1be95ee153dfb13d6bc > Author: Lorenz Bauer <lmb@xxxxxxxxxxxxxx> > Date: Mon Nov 4 16:35:46 2019 +0000 > > journal: refresh cached credentials of stdout streams FYI - reported to systemd https://github.com/systemd/systemd/issues/15654 > journald assumes that getsockopt(SO_PEERCRED) correctly identifies the > process on the remote end of the socket. However, this is incorrect > according to man 7 socket: > > The returned credentials are those that were in effect at the > time of the call to connect(2) or socketpair(2). > > This becomes a problem when a new process inherits the stdout stream > from a parent. First, log messages from the child process will > be attributed to the parent. Second, the struct ucred used by journald > becomes invalid as soon as the parent exits. Further sendmsg calls then > fail with ENOENT. Logs for the child process then vanish from the journal. > > Fix this by using recvmsg on the stdout stream, and refreshing the cached > struct ucred if SCM_CREDENTIALS indicate a new process. > > > but I can't see an obvious mistake the code which would cause the > behaviour we see. > > Regards, > Daniel > -- > |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| > |: https://libvirt.org -o- https://fstop138.berrange.com :| > |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| > -- Christian Ehrhardt Staff Engineer, Ubuntu Server Canonical Ltd