On Tue, Aug 21, 2018 at 02:20:21PM +0200, Christian Ehrhardt wrote: > On Tue, Aug 21, 2018 at 1:15 PM Daniel P. Berrangé <berrange@xxxxxxxxxx> > wrote: > > > On Tue, Aug 14, 2018 at 11:27:33AM +0200, Christian Ehrhardt wrote: > > > It was found that in cases with host devices virProcessKillPainfully > > > might be able to send signal zero to the target PID for quite a while > > > with the process already being gone from /proc/<PID>. > > > > > > That is due to cleanup and reset of devices which might include a > > > secondary bus reset that on top of the actions taken has a 1s delay > > > to let the bus settle. Due to that guests with plenty of Host devices > > > could easily exceed the default timeouts. > > > > > > To solve that, this adds an extra delay of 2s per hostdev that is > > associated > > > to a VM. > > > > > > Signed-off-by: Christian Ehrhardt <christian.ehrhardt@xxxxxxxxxxxxx> > > > --- > > > src/libvirt_private.syms | 1 + > > > src/qemu/qemu_process.c | 5 +++-- > > > src/util/virprocess.c | 18 +++++++++++++++--- > > > src/util/virprocess.h | 1 + > > > 4 files changed, 20 insertions(+), 5 deletions(-) > > > > > > diff --git a/src/libvirt_private.syms b/src/libvirt_private.syms > > > index ca4a192a4a..47ea35f864 100644 > > > --- a/src/libvirt_private.syms > > > +++ b/src/libvirt_private.syms > > > @@ -2605,6 +2605,7 @@ virProcessGetPids; > > > virProcessGetStartTime; > > > virProcessKill; > > > virProcessKillPainfully; > > > +virProcessKillPainfullyDelay; > > > virProcessNamespaceAvailable; > > > virProcessRunInMountNamespace; > > > virProcessSchedPolicyTypeFromString; > > > diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c > > > index 02fdc55156..b7bf8813da 100644 > > > --- a/src/qemu/qemu_process.c > > > +++ b/src/qemu/qemu_process.c > > > @@ -6814,8 +6814,9 @@ qemuProcessKill(virDomainObjPtr vm, unsigned int > > flags) > > > return 0; > > > } > > > > > > - ret = virProcessKillPainfully(vm->pid, > > > - !!(flags & > > VIR_QEMU_PROCESS_KILL_FORCE)); > > > + ret = virProcessKillPainfullyDelay(vm->pid, > > > + !!(flags & > > VIR_QEMU_PROCESS_KILL_FORCE), > > > + vm->def->nhostdevs * 2); > > > > This API contract is a bit wierd. You've got an arbitray x2 multiplier > > here... > > > > Out of the discussion with Alex I derived that due to the 1 sec settling of > the bus plus some extra work 2 seconds per device would be a good rule of > thumb. > The delay is meant to be in seconds and here it requests 2*nhostdevs to get > the amount needed. > > And I'd not want to make the "unit" of the call "double-seconds" :-) > > Let me add a comment at this call to explain the reasoning on the > multiplier here. > > > -virProcessKillPainfully(pid_t pid, bool force) > > > +virProcessKillPainfullyDelay(pid_t pid, bool force, unsigned int > > extradelay) > > > { > > > size_t i; > > > int ret = -1; > > > + unsigned int delay = 75 + (extradelay*5); > > > > ...and it gets another x5 multiplier here. > > > > That *5 is due to the internal unit it polls being 0.2 seconds. > I think externally anything other than seconds (double-seconds, > half-deca-seconds, ...) makes no sense. Opps, yes, I mis-understood the behaviour of the code. A comment is fine Regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list