On Tue, Feb 14, 2017 at 06:13:20PM +1100, Blair Bethwaite wrote:
> Hi all,
>
> In IRC last night Dan helpfully confirmed my analysis of an issue we
> are seeing attempting to launch high-memory KVM guests backed by
> hugepages...
>
> In this case the guests have 240GB of memory allocated from two host
> NUMA nodes to two guest NUMA nodes. The trouble is that allocating the
> hugepage-backed qemu process seems to take longer than the 30s
> QEMU_JOB_WAIT_TIME, and so libvirt then most unhelpfully kills the
> barely spawned guest. Dan said there was currently no workaround
> available, so I'm now looking at building a custom libvirt which sets
> QEMU_JOB_WAIT_TIME=60s.
>
> I have two related questions:
> 1) will this change have any untoward side-effects?
> 2) if not, then is there any reason not to change it in master until a
> better solution comes along (or possibly better, alter
> qemuDomainObjBeginJobInternal to give a domain start job a little
> longer compared to other jobs)?

What is the actual error you're getting during startup? I'm not entirely
sure QEMU_JOB_WAIT_TIME is the thing that's the problem.

IIRC, the job wait time only comes into play when two threads are
contending on the same QEMU process, i.e. one has an existing job running
and a second comes along and tries to run a second job. The second will
time out after QEMU_JOB_WAIT_TIME is reached. The first job, which holds
the lock, will never time out.

During guest startup I don't believe we have contending jobs in this way -
all the jobs needed to start up QEMU should be serialized, so I'm not sure
why QEMU_JOB_WAIT_TIME would even get hit.

Regards,
Daniel

--
|: http://berrange.com       -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org        -o-               http://virt-manager.org   :|
|: http://entangle-photo.org -o-     http://search.cpan.org/~danberr/    :|
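
For reference, the 30s value Blair mentions is a compile-time constant in
libvirt's QEMU driver, so the custom build he describes amounts to a
one-line local patch along these lines (hypothetical sketch; this assumes
the #define lives in src/qemu/qemu_domain.h and is expressed in
milliseconds - verify both in your own tree before rebuilding):

    /* value is in milliseconds */
    -#define QEMU_JOB_WAIT_TIME (1000ull * 30)
    +#define QEMU_JOB_WAIT_TIME (1000ull * 60)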
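
To illustrate the contention behaviour Daniel describes, here is a small
stand-alone sketch in plain pthreads - not libvirt's actual code, and all
names are invented for illustration. One thread owns the per-domain job
and is never subject to the timeout; a second thread that contends for the
job is the only one that can time out:

    #include <errno.h>
    #include <pthread.h>
    #include <stdbool.h>
    #include <stdio.h>
    #include <time.h>

    /* Analogous to QEMU_JOB_WAIT_TIME; shortened to 3s so the demo
     * finishes quickly (libvirt uses 30s). */
    #define WAIT_TIME_MS 3000

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
    static bool job_active;

    /* Become the job owner, waiting at most WAIT_TIME_MS for the current
     * owner to finish.  Only a *contending* caller can hit this timeout;
     * whoever already holds the job is unaffected by it. */
    static int begin_job(void)
    {
        struct timespec deadline;
        int rc = 0;

        clock_gettime(CLOCK_REALTIME, &deadline);
        deadline.tv_sec += WAIT_TIME_MS / 1000;

        pthread_mutex_lock(&lock);
        while (job_active && rc == 0)
            rc = pthread_cond_timedwait(&cond, &lock, &deadline);
        if (rc == 0)
            job_active = true;
        pthread_mutex_unlock(&lock);
        return rc == 0 ? 0 : -1;   /* -1 ~ "cannot acquire state change lock" */
    }

    static void end_job(void)
    {
        pthread_mutex_lock(&lock);
        job_active = false;
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&lock);
    }

    static void *second_caller(void *arg)
    {
        if (begin_job() < 0)
            fprintf(stderr, "second job timed out after %d ms\n", WAIT_TIME_MS);
        else
            end_job();
        return arg;
    }

    int main(void)
    {
        pthread_t t;

        begin_job();               /* first caller: e.g. the guest startup job */
        pthread_create(&t, NULL, second_caller, NULL);
        pthread_join(&t, NULL);    /* second caller contends and times out */
        end_job();
        return 0;
    }

Build with "cc demo.c -o demo -lpthread". The point of the sketch is
simply that the timeout applies to the waiter, not the holder, which is
why a slow hugepage allocation inside a single startup job would not be
expected to trip QEMU_JOB_WAIT_TIME on its own.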