On Fri, May 06, 2011 at 08:34:43AM -0600, Eric Blake wrote: > On 05/06/2011 05:57 AM, Daniel P. Berrange wrote: > > Introduce a virProcessKill function that can be safely called > > even when the job mutex is held. This allows virDomainDestroy > > to kill any VM even if it is asleep in a monitor job. The PID > > will die and the thread asleep on the monitor will then wake > > up releasing the job mutex. > > Very nice. This will help issues where if you kill a migration at the > source, then the destination is left with a stale VM with the job lock > held waiting for the incoming migration to complete. With this, you now > have the means to forcibly kill the leftover VM on the destination, > rather than waiting for a network timeout. Actually this is the one use case it won't help with. THe migration thing is a bit of a nasty hack / abuse. 'Prepare' obtains the job mutex, and 'Finish' releases it. Neither of them are actually monitoring the QEMU process, so it won't get released. I'll need some other special case code for that. This patch will help for any case where a API command is stuck waiting for a monitor command reply. > > > +++ b/src/qemu/qemu_driver.c > > @@ -1482,6 +1482,13 @@ static int qemudDomainDestroy(virDomainPtr dom) { > > goto cleanup; > > } > > > > + /* Although qemuProcessStop does this already, there may > > + * be an outstanding job active. We want to make sure we > > + * can kill the process even if a job is active. Killing > > + * it now, means the job will be released > > s/now,/now/ > > > + */ > > + qemuProcessKill(vm); > > + > > if (qemuDomainObjBeginJobWithDriver(driver, vm) < 0) > > goto cleanup; > > > > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c > > index a6c0dc8..c60c51f 100644 > > --- a/src/qemu/qemu_process.c > > +++ b/src/qemu/qemu_process.c > > @@ -2369,6 +2369,36 @@ cleanup: > > } > > > > > > +void qemuProcessKill(virDomainObjPtr vm) > > Do you want this to return -1 on failure to kill, such as if waitpid() > fails? On the other hand, I guess that best effort is okay; you already > did VIR_DEBUG logging of failures, and the caller can't do much even if > they were to check for a failure. I didn't bother to return any value, because all the callers are going to ignore it anyway. > > > +{ > > + int i; > > + int rc; > > + VIR_DEBUG("vm=%s pid=%d", vm->def->name, vm->pid); > > + > > + if (!virDomainObjIsActive(vm)) { > > + VIR_DEBUG("VM '%s' not active", vm->def->name); > > + return; > > + } > > + > > + for (i = 0 ; i < 15 ; i++) { > > + int signum; > > + if (i == 0) > > + signum = SIGTERM; > > + else if (i == 8) > > + signum = SIGKILL; > > + else > > + signum = 0; /* Just check for existance */ > > s/existance/existence/ > > It took me three reads to figure out what this loop is doing; a comment > would be helpful: > > /* Try SIGTERM, then wait 1.6 seconds, then try SIGKILL, all while > probing every 200ms to see if the process is gone */ > > > + > > + rc = virKillProcess(vm->pid, signum); > > + VIR_DEBUG("Iteration %d rc=%d", i, rc); > > + if (rc < 0) > > + break; > > When errno==ESRCH, this breaks the loop because the child is dead. But > for any other errno, should you log (at least at VIR_DEBUG) the errno > that causes failure, since that means we didn't exit the loop normally? Yeah i guess so. > > > + > > + usleep(200 * 1000); > > + } > > +} > > + > > ACK with those nits addressed. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list