On Thu, Mar 26, 2009 at 05:13:00PM +0900, Matt McCowan wrote:
> On Mon, 23 Mar 2009 13:44:58 +0000
> "Daniel P. Berrange" <berrange@xxxxxxxxxx> wrote:
>
> > On Sun, Mar 22, 2009 at 07:28:36PM +0900, Matt McCowan wrote:
> > > Running into an issue where, if I/O is hampered by load for example,
> > > reading a largish state file (created by 'virsh save') is not allowed to
> > > complete.
> > > qemudStartVMDaemon in src/qemu_driver.c has a loop that waits 10 seconds
> > > for the VM to be brought up. An strace against libvirt when doing a
> > > 'virsh restore' against a largish state file shows the VM being sent a
> > > kill when it's still happily reading from the file.
>
> My bad. It's not the timeout loop in qemudStartVMDaemon that's killing
> it. It's as you suggested and the code is crapping out in
> qemudReadMonitorOutput, seemingly when poll()ing the consoles fd - it
> doesn't get any POLLIN in the 10 secs it waits. (Against latest CVS
> pull)

Hmm, this is the exact scenario I thought we had gotten fixed in
upstream QEMU/KVM.

> > This is a little odd to me - we had previously fixed KVM migration
> > code so that during startup with -incoming, it would correctly
> > respond to monitor commands, explicitly to avoid libvirt timing
> > out in this way. I'm wondering what has broken since then, whether
> > its libvirt's usage changing, or KVM impl changing.
>
> I'm running kvm-83 (QEMU 0.9.1) if that's of any help.
> The state files I have dragged in during testing were generally 4G+ and
> worked without problem. The ones I'm playing with in the production
> environment are <3G, but on a more heavily loaded system with lots of
> snap shotted LVs.
>
> 'virsh restore' on the other VMs with <2G state files works just fine.

Clearly the monitor console is not responding while it is reading in
the state file, and how long that takes is dependent on host OS load.
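
To make the failure mode concrete, here is a minimal sketch of a bounded
poll() wait for POLLIN on a file descriptor. It is illustrative only and
is not the code in qemudReadMonitorOutput: the helper name
wait_for_monitor and its timeout_ms parameter are hypothetical. It only
shows that a peer which stays silent for the whole window looks exactly
like a timeout to the caller.

```c
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of libvirt): wait up to timeout_ms
 * milliseconds for fd to become readable.
 *
 * Returns 1 if data is ready to read, 0 if the wait timed out,
 * and -1 on a poll() error or if the fd reported only an
 * error/hangup condition.
 */
static int wait_for_monitor(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    int ret;

    /* Retry if a signal interrupts the wait. For simplicity this
     * restarts the full timeout rather than tracking elapsed time. */
    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0)
        return -1;              /* poll() itself failed */
    if (ret == 0)
        return 0;               /* nothing arrived within the window */

    return (pfd.revents & POLLIN) ? 1 : -1;
}
```

With a fixed 10000 ms window, a guest that is still busy reading its
state file and has not yet written anything to the monitor would return
0 here, which the caller cannot distinguish from a failed startup.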

As a temporary workaround, the only option is really to increase that
10 second timeout significantly when doing a restore/migrate operation.
In parallel with that, we'll have to look at the KVM code again and
figure out why it's behaving this way.

Daniel
-- 
|: Red Hat, Engineering, London   -o-   http://people.redhat.com/berrange/ :|
|: http://libvirt.org  -o-  http://virt-manager.org  -o-  http://ovirt.org :|
|: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
|: GnuPG: 7D3B9505  -o-  F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 :|