The current default method of terminating the qemu process is to send
a SIGTERM, wait for up to 1.6 seconds for it to shut down cleanly, then
send a SIGKILL and wait for up to 1.4 seconds more for the process to
terminate. This is problematic because occasionally 1.6 seconds is not
long enough for the qemu process to flush its disk buffers, so the
guest's disk ends up in an inconsistent state.

Although a previous patch has provided a new flag to allow applications
to alter this behavior, it will take time for applications to be
updated to use this new flag. Since the fault for this inconsistent
state lies solidly with libvirt, libvirt should be proactive about
preventing the situation even before the applications can be updated.

Since this only occasionally happens when the timeout prior to SIGKILL
is 1.6 seconds, this patch increases that timeout to 10 seconds. At the
very least, this should reduce the occurrence from "occasionally" to
"extremely rarely". (Once SIGKILL is sent, it waits another 5 seconds
for the process to die before returning.)

Note that in the cases where it takes less than this for qemu to shut
down cleanly, libvirt will *not* wait any longer than it would without
this patch: qemuProcessKill polls the process and returns as soon as it
is gone.
---

(No change from previous version, just rebased on top of [PATCH 1.5/2])

 src/qemu/qemu_process.c |   18 ++++++++++--------
 1 files changed, 10 insertions(+), 8 deletions(-)

diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index f91e7a5..26e0b78 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -3553,14 +3553,16 @@ qemuProcessKill(struct qemud_driver *driver,
 
     /* This loop sends SIGTERM (or SIGKILL if flags has
      * VIR_QEMU_PROCESS_KILL_FORCE and VIR_QEMU_PROCESS_KILL_NOWAIT),
-     * then waits a few iterations (3 seconds) to see if it
-     * dies. Halfway through this wait, if the qemu process still
-     * hasn't exited, and VIR_QEMU_PROCESS_KILL_FORCE is requested, a
-     * SIGKILL will be sent. Note that the FORCE mode could result
-     * in lost data in the guest, so it should only be used if the
-     * guest is hung and can't be destroyed in any other manner.
+     * then waits a few iterations (10 seconds) to see if it dies. If
+     * the qemu process still hasn't exited, and
+     * VIR_QEMU_PROCESS_KILL_FORCE is requested, a SIGKILL will then
+     * be sent, and qemuProcessKill will wait up to 5 seconds more for
+     * the process to exit before returning. Note that the FORCE mode
+     * could result in lost data in the guest, so it should only be
+     * used if the guest is hung and can't be destroyed in any other
+     * manner.
      */
-    for (i = 0 ; i < 15; i++) {
+    for (i = 0 ; i < 75; i++) {
         int signum;
         if (i == 0) {
             if ((flags & VIR_QEMU_PROCESS_KILL_FORCE) &&
@@ -3570,7 +3572,7 @@ qemuProcessKill(struct qemud_driver *driver,
             } else {
                 signum = SIGTERM; /* kindly suggest it should exit */
             }
-        } else if ((i == 8) & (flags & VIR_QEMU_PROCESS_KILL_FORCE)) {
+        } else if ((i == 50) & (flags & VIR_QEMU_PROCESS_KILL_FORCE)) {
             VIR_WARN("Timed out waiting after SIG%s to process %d, "
                      "sending SIGKILL", signame, vm->pid);
             signum = SIGKILL; /* kill it after a grace period */
-- 
1.7.7.6

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
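The timing figures in the commit message follow from the loop bounds. The hunks don't show the sleep between polls, but the numbers imply a 200 ms interval: before the patch, 8 × 200 ms = 1.6 s until SIGKILL and 7 × 200 ms = 1.4 s after it; after the patch, 50 × 200 ms = 10 s until SIGKILL and 25 × 200 ms = 5 s after it (75 polls, 15 s in total). The following is a minimal standalone C sketch of that SIGTERM, poll, then SIGKILL schedule, assuming POSIX kill(2) with signal 0 as the existence check. `kill_with_grace`, `sleep_ms`, and the `KILL_*` constants are hypothetical names for illustration, not libvirt code, and the sketch omits the NOWAIT and immediate-SIGKILL paths of the real function.

```c
#define _POSIX_C_SOURCE 200809L

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical constants mirroring the patched loop: 75 polls of
 * 200 ms (15 s in total), escalating to SIGKILL on poll 50 (after 10 s). */
#define KILL_MAX_POLLS        75
#define KILL_SIGKILL_POLL     50
#define KILL_POLL_INTERVAL_MS 200L

/* Sleep for roughly ms milliseconds (may return early on a signal). */
static void sleep_ms(long ms)
{
    struct timespec ts = { .tv_sec = ms / 1000,
                           .tv_nsec = (ms % 1000) * 1000000L };
    nanosleep(&ts, NULL);
}

/* Hypothetical helper, not libvirt's qemuProcessKill: send SIGTERM,
 * then poll with kill(pid, 0) every poll_ms milliseconds. If force is
 * set, send SIGKILL on poll KILL_SIGKILL_POLL. Returns 0 as soon as the
 * process is gone, -1 on timeout or on any error other than ESRCH.
 * An unreaped zombie child still counts as "alive" here. */
static int kill_with_grace(pid_t pid, bool force, long poll_ms)
{
    if (pid <= 0)
        return -1;  /* never signal a process group or every process */

    for (int i = 0; i < KILL_MAX_POLLS; i++) {
        int signum = 0;  /* signal 0: only check that the process exists */

        if (i == 0) {
            signum = SIGTERM;  /* ask it to shut down cleanly */
        } else if (i == KILL_SIGKILL_POLL && force) {
            fprintf(stderr, "timed out after SIGTERM to process %ld, "
                    "sending SIGKILL\n", (long)pid);
            signum = SIGKILL;  /* grace period is over */
        }

        if (kill(pid, signum) < 0)
            return errno == ESRCH ? 0 : -1;  /* ESRCH: process is gone */

        sleep_ms(poll_ms);
    }
    return -1;  /* still running after every poll */
}
```

In real use the call would look like `kill_with_grace(pid, true, KILL_POLL_INTERVAL_MS)`; the interval is a parameter only so the schedule can be exercised quickly. Because each poll returns as soon as `kill(pid, 0)` reports ESRCH, a process that exits promptly is noticed on the next poll, which is why lengthening the grace period costs nothing in the common case.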