On 2014/10/9 19:08, Martin Kletzander wrote: > On Thu, Oct 09, 2014 at 11:52:49AM +0100, Daniel P. Berrange wrote: >> On Thu, Oct 09, 2014 at 12:37:42PM +0200, Jiri Denemark wrote: >>> On Thu, Oct 09, 2014 at 11:49:34 +0200, Martin Kletzander wrote: >>> > On Thu, Oct 09, 2014 at 10:14:48AM +0100, Daniel P. Berrange wrote: >>> > >On Thu, Oct 09, 2014 at 09:58:30AM +0200, Martin Kletzander wrote: >>> > >> When daemon is killed right in the middle of probing a qemu binary for >>> > >> its capabilities, the VM is left running. Next time the daemon is >>> > >> starting, it cannot start qemu process because the one that's already >>> > >> running does have the pidfile flock()'d. >>> > > >>> > >I was wondering if there's anything we can easily change in the way >>> > >we launch the QEMU binary so that it automatically dies when libvirtd >>> > >exits, rather than us needing to manually kill it. >>> > > >>> > >The comments say we have to use daemonize to synchronize with the >>> > >monitor socket creation and I recall we've tried other approaches >>> > >to that before which failed. >>> > > >>> > >Another idea would be to play with adding '-serial stdio' and then >>> > >when libvirt died stdio would get a broken pipe but I don't think >>> > >it is safe to use -serial when we have -M none so that's out. >>> > > >>> > >So I guss we don't have much choice but to manually kill. >>> > > >>> > >> Reported-by: Wang Yufei <james.wangyufei@xxxxxxxxxx> >>> > >> >>> > >> Signed-off-by: Martin Kletzander <mkletzan@xxxxxxxxxx> >>> > >> --- >>> > >> src/qemu/qemu_capabilities.c | 41 +++++++++++++++++++++++++++++++++++++++++ >>> > >> 1 file changed, 41 insertions(+) >>> > >> >>> > >> diff --git a/src/qemu/qemu_capabilities.c b/src/qemu/qemu_capabilities.c >>> > >> index 6fcb5c7..919780e 100644 >>> > >> --- a/src/qemu/qemu_capabilities.c >>> > >> +++ b/src/qemu/qemu_capabilities.c >>> > >> @@ -3243,6 +3243,47 @@ virQEMUCapsInitQMP(virQEMUCapsPtr qemuCaps, >>> > >> config.data.nix.path = monpath; >>> > >> config.data.nix.listen = false; >>> > >> >>> > >> + /* >>> > >> + * Check if there wasn't some older qemu left running, e.g. if >>> > >> + * someone killed libvirtd during the probing phase. >>> > >> + */ >>> > >> + if (virFileExists(pidfile) && >>> > >> + virPidFileReadPath(pidfile, &pid) == 0) { >>> > >> + char *cmdpath = NULL; >>> > >> + char *cmdline = NULL; >>> > >> + int len, pos = 0; >>> > >> + >>> > >> + VIR_DEBUG("Checking if there's an older qemu process left running that " >>> > >> + "was used for capability probing"); >>> > >> + >>> > >> + if (virAsprintf(&cmdpath, "/proc/%u/cmdline", pid) < 0) >>> > >> + goto cleanup; >>> > >> + >>> > >> + len = virFileReadAll(cmdpath, 1024, &cmdline); >>> > >> + VIR_FREE(cmdpath); >>> > >> + >>> > >> + /* >>> > >> + * This cycle gets trivially skipped if there was an error in >>> > >> + * virFileReadAll(). >>> > >> + */ >>> > >> + while (len > pos) { >>> > >> + /* >>> > >> + * Find the '-pidfile /path/to/capabilities.pidfile' to be >>> > >> + * sure we won't kill anyone else. >>> > >> + */ >>> > >> + if (STREQ_NULLABLE(cmdline + pos, "-pidfile") && >>> > >> + (pos += strlen(cmdline + pos) + 1) < len && >>> > >> + STREQ_NULLABLE(cmdline + pos, pidfile)) { >>> > > >>> > >Heh fun hack. Did you consider simply trying to connect to the monitor >>> > >socket to see if the QEMU was still there ? That would be slightly more >>> > >portable as it wouldn't rely on Linux /proc >>> > > >>> > >>> > I wanted to cooperate with qemu somehow, for example killing it if we >>> > are handling a signal (that wouldn't help if the daemon is SIGKILL'd), >>> > I thought we can add some timeout option to qemu (or it's parent >>> > process that would kill it if there was no communication for a while), >>> > but that doesn't make much sense and so on. Michal then told me that >>> > we can just kill it if it still exists. And that's something that >>> > would still need to be there if the monitor is not responding, for >>> > example. >>> > >>> > Since all this machinery happens only after libvirtd has its pidfile >>> > flock()'d, we can be sure we're not killing anyone else's qemu. >>> >>> And do we actually have to check anything. We already know there is a >>> process running with its PID stored in /path/to/capabilities.pidfile and >>> the pidfile is flocked. I think it should be good enough to assume the >>> stored PID belongs to the process that flocked the pidfile. Why should >>> we care about other processes flocking the file? We would be killing a >>> random process rather than the one which locked the pidfile but that's >>> life :-) >> >> Sure we could try to acquire the flock on the pidfile and if that fails >> we should be ok to assume the PID stored in the pidfile is still valid. >> > > Either we have to check the cmdline (which is in *BSD's procfs > implementation, but we can't rely on that) *or* we can try flock()'ing > the file and killing the pid that's written inside (which is probably > better). But we can't just kill the pid without checking anything, we > could kill something else and that's certainly not what we want :) > > I'll send a v2, > Martin Thanks. I just wanted to reply your former email and then saw your patch. I think you have understood the problem reported. My idea is the same as yours. Check that the pid is qemu and kill it. -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list