Hi,
I was recently looking into a case which essentially looked like this:
1. virsh shutdown guest
2. after <1 second the qemu process was gone from /proc/
3. but libvirt spun in virProcessKillPainfully because the process was
   still reachable via signals
4. virProcessKillPainfully eventually fails after 15 seconds and the
   guest stays in "in shutdown" state forever

This is not one of the common cases I've found for
virProcessKillPainfully to break:
- bad I/O, e.g. on NFS, gets qemu stuck
- CPU overload stalls things to death
- qemu not being reaped (by init)
All of the above would still have the process available in /proc/<pid>,
as a zombie or in uninterruptible sleep, but that is not true in my
case.

It turned out that the case depended on the amount of hostdev resources
passed to the guest. Debugging showed that with 8 GPUs passed through,
and more reliably with 16, it took ~18 seconds from SIGTERM until the
process was no longer reachable with signal 0. I haven't conducted many
more tests and stayed with the 16 GPU case, but I'm fairly sure that
more devices would make it take even longer.

Discussion with a few kernel folks revealed that the kill(2) man page
on signal 0 has to be taken very literally: "check for the existence of
a process ID" - you can read this as "the PID exists, but the process
is no more". I'm unsure why the kernel would take that much time to
clean up, as I thought removing /proc/<PID> would be almost the last
step in the teardown of a task.

patch 2:
I found no better way than signal 0 to check for a process, but it
turns out not to be reliable: the kernel can still accept signals for
quite some time even with the PID gone from every other interface I
could find. So I want to suggest a fallback in virProcessKillPainfully
that, on top of the ESRCH from signal 0, also considers the absence of
/proc/<pid> as a valid "the process is gone" (a sketch follows at the
end of this mail). We could also use the open FDs we have, e.g. to the
qemu monitor, to check whether the remote end is dead, but that didn't
seem more readable or reliable to me, and quite some code would have to
be crossed to make it know about those FDs. But maybe someone else here
has the insight into what exactly takes that long in the kernel, which
might lead us to totally different solutions (therefore RFC).

patch 1:
Finally, after working through this code for a while, I got the feeling
that if we are still in a bad/non-responsive case 10 seconds in and
upgrade to SIGKILL, we should give it some more time to take effect. We
reach this point in stressful cases only anyway, and only if force is
set, so waiting a bit longer helps to resolve some of the other cases
of virProcessKillPainfully being stuck that I found on the mailing
list. If anyone has a personal interest in the former 15 seconds, let's
add a VIR_WARN at 15 seconds instead, but overall wait a bit longer (a
sketch of the changed timing also follows below).

P.S.: After a short discussion with Daniel on IRC I'm also adding Alex
explicitly for his passthrough experience.

P.P.S.: For now this really is only meant as an RFC to kick off the
discussion. The system on which I could easily trigger this case was
taken away from me before I could complete a final verification, but
the case is interesting enough to start the discussion now.

Christian Ehrhardt (2):
  process: wait longer 5->30s on hard shutdown
  process: accept the lack of /proc/<pid> as valid process removal

 src/util/virprocess.c | 29 +++++++++++++++++++++++++++--
 1 file changed, 27 insertions(+), 2 deletions(-)

-- 
2.17.1
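
For illustration, a minimal standalone sketch of the fallback idea from
patch 2. This is not the actual patch; the helper name procIsGone() is
made up here, and the real change would wire the check into
virProcessKillPainfully's existing loop:

#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns true if the process can be considered gone. */
static bool
procIsGone(pid_t pid)
{
    char path[64];

    /* The classic check: signal 0 probes for existence without
     * delivering anything; ESRCH means the PID is fully gone. */
    if (kill(pid, 0) < 0 && errno == ESRCH)
        return true;

    /* Fallback: the PID can keep accepting signals for many seconds
     * (~18s observed with 16 GPUs passed through) after /proc/<pid>
     * has already disappeared; treat that as gone as well.  The known
     * stuck cases (zombie, uninterruptible sleep) still have a /proc
     * entry, so they are not misdetected by this. */
    snprintf(path, sizeof(path), "/proc/%lld", (long long) pid);
    if (access(path, F_OK) < 0 && errno == ENOENT)
        return true;

    return false;
}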
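
And a rough sketch of the timing change from patch 1, reusing the
procIsGone() helper above. The constants mirror the numbers in this
mail rather than the actual code, so take the structure as illustrative
only:

/* SIGTERM at t=0; SIGKILL at t=10s if force is set; instead of giving
 * up 5s after SIGKILL (15s total, the old behaviour), allow a 30s
 * grace period (patch 1's 5->30s). */
static int
killPainfullySketch(pid_t pid, bool force)
{
    size_t i;

    for (i = 0; i < 200; i++) {      /* 200 * 200ms = 40s = 10s + 30s */
        int signum;

        if (i == 0)
            signum = SIGTERM;        /* ask nicely first */
        else if (i == 50 && force)
            signum = SIGKILL;        /* escalate after 10 seconds */
        else
            signum = 0;              /* probe for existence only */

        if (kill(pid, signum) < 0 && errno == ESRCH)
            return 0;                /* gone the classic way */

        if (procIsGone(pid))         /* gone per the /proc fallback */
            return 0;

        usleep(200 * 1000);
    }

    return -1;                       /* caller leaves guest "in shutdown" */
}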