Re: Migration hangs on Gentoo with KVM

On 08/17/2011 02:30 PM, Jonathan Stoppani wrote:
Thanks for the prompt answer Eric! Yes, nc has a q option:

-q, --hold-timeout=SEC1[:SEC2]   Set hold timeout(s) for local [and remote]


We still haven't incorporated patches to autodetect nc usage on the remote side (some have been proposed by Guido, but there were some additional issues to address first). Hopefully by 0.9.5...

Until that is fixed, it could very well be that nc is holding the connection open too long and deadlocking libvirtd's handling of the remote connection, which would explain why all further attempts to do something with the domain get stuck waiting for the nc connection to resolve.
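As a rough illustration of the kind of check involved, the sketch below decides from nc's help text whether the installed variant advertises a -q option. The function name nc_supports_q is hypothetical, and this is not the detection logic from the proposed libvirt patches; it only assumes that the help output lists the option the way the line quoted above does.

```shell
#!/bin/sh
# Hypothetical helper: reads nc help text on stdin and succeeds if a
# "-q" option is listed at the start of a line.
nc_supports_q() {
    grep -Eq '^[[:space:]]*-q([,[:space:]]|$)'
}

# Example: feed it the help line quoted above.
if printf '%s\n' '-q, --hold-timeout=SEC1[:SEC2]   Set hold timeout(s)' | nc_supports_q; then
    echo "nc advertises -q"
else
    echo "nc does not advertise -q"
fi
```

On a real host the input would come from the nc binary on the remote side (many variants print their help with `nc -h 2>&1`, though that is an assumption about your particular build).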

Tested using qemu+tcp and it hangs the same. If I interrupt the migration (^C), the domain is correctly destroyed on the destination but left in the paused state on the source. If I try to start it manually, I obtain this error:

# virsh resume 1
error: Failed to resume domain 1
error: Timed out during operation: cannot acquire state change lock

This is the internal mutex lock used for serializing access to libvirt internal structures, such as when coordinating with a remote server (coordination that involves the use of nc). When you get this message, about the only thing you can do is restart libvirtd.

Which version of libvirt were you testing? 0.9.4 adds quite a few improvements for recovering gracefully from failed migrations.
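A minimal sketch of how a script might recognize this situation instead of retrying blindly. Both function names are hypothetical; the only thing taken from the thread is the error string itself.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the virsh error text passed as $1
# contains the state change lock timeout quoted above.
is_state_lock_timeout() {
    case "$1" in
        *"cannot acquire state change lock"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Hypothetical wrapper: try to resume a domain; on the lock timeout,
# print a hint and return 2 rather than retrying.
resume_or_hint() {
    dom="$1"
    if out=$(virsh resume "$dom" 2>&1); then
        printf '%s\n' "$out"
    elif is_state_lock_timeout "$out"; then
        echo "state change lock timed out; restart libvirtd, then retry" >&2
        return 2
    else
        printf '%s\n' "$out" >&2
        return 1
    fi
}
```

How to restart the daemon depends on the distribution; on a Gentoo system using OpenRC it would typically be `/etc/init.d/libvirtd restart`. Running `virsh version` shows which libvirt library version is in use.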


Any insights?

Can someone shed some light on the libvirt locking possibilities? It seems to me that sanlock is not supported on Gentoo (and libvirt is compiled using --without-sanlock); could this be the cause of the problem?

Completely unrelated. sanlock is a program for controlling access to shared file storage, and has nothing to do with the internal mutex lock failure message you quoted above.

Is there some way to explicitly set the locking mechanism to a noop in the libvirt configuration?

You are confusing two terms; using the sanlock or no-op disk manager has nothing to do with libvirtd getting confused and deadlocking on internal data structures. If you built --without-sanlock, then you are already using the no-op disk manager; but if sanlock is compiled in, you control whether to use it by modifying /etc/libvirt/qemu.conf. But making a configuration change there won't affect the problem you actually saw above.
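For reference, a small sketch that reports which disk lock manager a qemu.conf selects. It assumes the setting is a line of the form lock_manager = "sanlock" (the key name is from memory of libvirt's shipped qemu.conf, not from this thread) and that the absence of such a line means the no-op manager. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the last uncommented
# lock_manager line in the given qemu.conf, or "nop" if there is none
# (assumed here to mean the no-op manager).
configured_lock_manager() {
    conf="${1:-/etc/libvirt/qemu.conf}"
    value=$(sed -n 's/^[[:space:]]*lock_manager[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' "$conf" 2>/dev/null | tail -n 1)
    printf '%s\n' "${value:-nop}"
}
```

Calling `configured_lock_manager` with no argument inspects /etc/libvirt/qemu.conf. Whatever it prints, the setting is unrelated to the state change lock timeout discussed above.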

--
Eric Blake   eblake@xxxxxxxxxx    +1-801-349-2682
Libvirt virtualization library http://libvirt.org

