On Mon, Jan 27, 2014 at 11:28:31AM +0000, Daniel P. Berrange wrote:
> On Fri, Jan 24, 2014 at 05:17:02PM +0100, Martin Kletzander wrote:
> > On Fri, Jan 24, 2014 at 12:56:43PM +0000, Daniel P. Berrange wrote:
> > > On Thu, Jan 23, 2014 at 07:47:54PM +0200, Pavel Fux wrote:
> > > > there are 8 servers with 8 vms on each server. all the qcow images are on
> > > > the nfs share on the same external server.
> > > > we are starting all 64 vms at the same time.
> > > > each vm is 2.5GB X 64vms = 160GB = 1280Gb
> > > > to read all of the data on a 1Gbe interface will take 1280sec = 21.3min
> > > > not all of the image is being read on boot so it takes only 5min
> > >
> > > That's interesting, but it still doesn't explain the failures. QEMU will
> > > start listening on its monitor socket before it even opens any of the
> > > disk images. So the time it takes to read disk images on boot should have
> > > no relevance to timeouts waiting for the monitor socket. All it does between
> > > exec of the QEMU binary and listening for the monitor socket is to load the
> > > libraries QEMU is linked against and load a few misc pieces like BIOS
> > > firmware blobs. I just can't see a reason why this would take anywhere
> > > near 5 minutes - it should be a matter of a few seconds at worst.
> > >
> >
> > I think it does a little bit more than that, but I have no proof for
> > it. When you look for most occurrences of this error wrt virt-manager
> > (I'm not sure why, maybe because people using virsh deal with it
> > themselves), you'll find that most of them are caused by a managed
> > save. When qemu is loading, it takes more than those 3 seconds we had
> > before, and it fails to start the machine. The thing is that there is
> > nothing else weird on those machines, removing the managed save solves
> > everything. And that's why I think it at least loads some
> > initialization values (in some special cases), although I haven't been
> > able to reproduce that.
> >
> Hmm, I was thinking it might be something related to socket connect/accept
> synchronization. QEMU will listen() very early, but won't accept() until
> very late in startup. I've just confirmed in a test though that connect()
> will succeed even if the app doesn't call accept(), since the kernel will
> complete the connection at the protocol level and just queue the client.
> So that doesn't explain it yet.

I did a test with QEMU by adding a 'sleep(20)' to the QEMU main()
function in vl.c. The delay only causes QEMU startup failures if the
sleep is placed right after parsing the command line args. Once QEMU
has done a listen() on the socket, libvirt handles arbitrary delays
without issue.

Daniel
-- 
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
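
For anyone who wants to reproduce the connect()/listen()/accept()
behaviour described above without patching QEMU, here is a rough
standalone sketch. It is not libvirt or QEMU code. It assumes a
UNIX-domain stream socket on Linux, and the names try_connect,
probe_connect_stages and monitor.sock are made up for the illustration.
It attempts a client connect() at three points: before the socket file
exists, after bind() but before listen(), and after listen() with no
accept() ever being called.

```python
import os
import socket
import tempfile


def try_connect(path, timeout=2.0):
    """Return True if a client connect() to the UNIX socket at `path` succeeds."""
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.settimeout(timeout)
    try:
        client.connect(path)
        return True
    except OSError:
        # Covers a missing socket file, a refused connection and a timeout.
        return False
    finally:
        client.close()


def probe_connect_stages():
    """Try connect() before bind(), after bind(), and after listen()."""
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        # Hypothetical socket name, chosen only for this illustration.
        path = os.path.join(tmpdir, "monitor.sock")

        # Stage 1: the server has not created the socket yet.
        results["before_bind"] = try_connect(path)

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            # Stage 2: the socket file exists but nobody is listening.
            server.bind(path)
            results["after_bind"] = try_connect(path)

            # Stage 3: listening, but accept() is never called.
            server.listen(1)
            results["after_listen"] = try_connect(path)
        finally:
            server.close()
    return results


if __name__ == "__main__":
    for stage, ok in probe_connect_stages().items():
        print(stage, "connect ok" if ok else "connect failed")
```

If the sketch matches what I saw, only the after_listen attempt should
connect, which is consistent with a delay being harmless once listen()
has happened and fatal before it.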