Jim Fehlig writes ("Re: [Xen-devel] [PATCH 00/12] libxl: fork: SIGCHLD flexibility"):
> Ok, thanks.  I'm currently testing on your git branch referenced earlier
> in this thread
>
>   git://xenbits.xen.org/people/iwj/xen.git#wip.enumerate-pids-v2.1

Great.  That's the one.  My current version is pretty much identical -
some unused variables deleted and comments edited.

> >  * You need to fix the timer deregistration arrangements in the
> >    libvirt/libxl driver to avoid the crash you identified the other day.
>
> Yes, I'm testing a fix now.

Great.

> >  * Something needs to be done about the 20ms slop in the libvirt event
> >    loop (as it could cause libxl to lock up).  If you can't get rid of
> >    it in the libvirt core, then adding 20ms to every requested
> >    callback time in the libvirt/libxl driver would work for now.
> >
>
> The commit msg adding the fuzz says
>
>     Fix event test timer checks on kernels with HZ=100
>
>     On kernels with HZ=100, the resolution of sleeps in poll() is
>     quite bad. Doing a precise check on the expiry time vs the
>     current time will thus often think the timer has not expired
>     even though we're within 10ms of the expected expiry time. This
>     then causes another pointless sleep in poll() for <10ms. Timers
>     do not need to have such precise expiration, so we treat a timer
>     as expired if it is within 20ms of the expected expiry time. This
>     also fixes the eventtest.c test suite on kernels with HZ=100

I think this is a bug in the kernel.  poll() may sleep longer, but not
shorter, than expected.

>     * daemon/event.c: Add 20ms fuzz when checking for timer expiry
>
> I could handle this in the libxl driver as you say, but doing so makes
> me a bit nervous.  Potentially locking up libxl makes me nervous too :).

I was going to say that the code in libxl_osevent_occurred_timeout
checked the time against the requested time and would ignore the event
(thinking it was stale) if it was too early.  But in fact now that I
read the code this is not true.
In fact I think it will work OK (modulo some things happening too
soon).

So the upshot is that I still think this is a bug in libvirt but I
don't think it's critical to fix it.  Sorry to cause undue alarm.

> Yes.  I've been running my tests for about 24 hours now with no problems
> noted.  The tests include starting/stopping a persistent VM,
> creating/stopping a transient VM, rebooting a persistent VM,
> saving/restoring a transient VM, and getting info on all of these VMs.
>
> I should probably add saving/restoring a persistent VM to the mix since
> the associated libxl_ctx is never freed.  Only when a persistent VM is
> undefined is the libxl_ctx freed.

Right.  Great.

Thanks,
Ian.

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list
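
For readers following the timer discussion in this message, the
arithmetic can be sketched roughly as below.  This is only an
illustration of the 20ms fuzz described in the quoted commit message
and of the "add 20ms to every requested callback time" workaround; it
is not taken from libvirt's daemon/event.c or from libxl, and every
name in it (TIMER_FUZZ_MS, timer_expired_fuzzy, timer_expired_precise,
pad_requested_expiry) is hypothetical.  Times are assumed to be
milliseconds on a single monotonic clock.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constant: the 20ms slop from the quoted commit message. */
#define TIMER_FUZZ_MS 20

/* Precise check: the timer counts as expired only once the clock has
 * reached the requested expiry time. */
bool timer_expired_precise(int64_t now_ms, int64_t expires_ms)
{
    return expires_ms <= now_ms;
}

/* Fuzzy check: the timer counts as expired as soon as it is within
 * TIMER_FUZZ_MS of its expiry time, so the callback can run up to
 * 20ms earlier than the caller asked for. */
bool timer_expired_fuzzy(int64_t now_ms, int64_t expires_ms)
{
    return expires_ms <= now_ms + TIMER_FUZZ_MS;
}

/* Workaround sketch: a driver sitting on top of a fuzzy event loop
 * pads each requested expiry by the fuzz, so that the fuzzy check
 * cannot pass before the originally requested time. */
int64_t pad_requested_expiry(int64_t requested_ms)
{
    return requested_ms + TIMER_FUZZ_MS;
}
```

With a requested expiry of 1000ms, the fuzzy check already passes at
985ms while the precise one does not; that is the "things happening
too soon" case.  After padding the request to 1020ms, the fuzzy check
first passes at 1000ms, which is the time originally asked for.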