On 08/17/2011 02:30 PM, Jonathan Stoppani wrote:
> Thanks for the prompt answer Eric! Yes, nc has a q option:
> -q, --hold-timeout=SEC1[:SEC2] Set hold timeout(s) for local [and remote]

We still haven't incorporated patches to autodetect nc usage on the
remote side (some have been proposed by Guido, but there were some
additional issues to address first). Hopefully by 0.9.5...
Until that is fixed, it very well could be that you are deadlocking
the libvirtd handling of the remote connection because nc holds the
connection open too long, which would explain why all further attempts
to do something with the domain get stuck waiting for the nc
connection to resolve.
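
To illustrate what probing for the option could look like, here is a
minimal sketch. It is not the patch series Guido proposed. The helper
name nc_has_q is made up for this example, and because nc variants
format their usage text differently, the match pattern is an assumption
based on the help line quoted above.

```shell
# Hypothetical helper: reads nc usage text on stdin and succeeds (exit 0)
# if some line advertises a -q option, as in the help excerpt quoted above.
nc_has_q() {
    grep -q -e '^[[:space:]]*-q[,[:space:]]'
}

# Example use on the remote host (nc may print its help to stderr):
# nc -h 2>&1 | nc_has_q && echo "this nc accepts -q"
```

The helper only reports whether the option is listed; which timeout
value is appropriate depends on the nc implementation in use.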

> Tested using qemu+tcp and it hangs the same. If I interrupt the
> migration (^C), the domain is correctly destroyed on the destination
> but left in the paused state on the source. If I try to start it
> manually, I obtain this error:
> # virsh resume 1
> error: Failed to resume domain 1
> error: Timed out during operation: cannot acquire state change lock

This is the internal mutex lock used for serializing access to libvirt
internal structures, such as when coordinating with a remote server
(coordination that involves the use of nc). When you get this message,
about the only thing you can do is restart libvirtd. Which version of
libvirt were you testing? 0.9.4 adds quite a few improvements on being
able to gracefully recover from failed migrations.
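
As a rough sketch of checking whether an installation is at least
0.9.4 before retrying: the helper name version_ge is made up for this
example, it assumes purely numeric dotted versions of up to three
fields, and the init script path in the comments is an assumption that
varies by distribution.

```shell
# Hypothetical helper: succeeds (exit 0) when dotted version $1 >= $2.
# Sorts the two versions numerically by field and checks that $2 sorts
# first (or that the two are equal).
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" \
        | sort -t. -k1,1n -k2,2n -k3,3n | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example use (commented out; requires libvirt to be installed):
# version_ge "$(virsh --version)" 0.9.4 || echo "older than 0.9.4"
# /etc/init.d/libvirtd restart   # assumed path; clears the stuck lock
# virsh resume 1
```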

> Any insights?
> Can someone shed some light on the libvirt locking possibilities? It
> seems to me that sanlock is not supported on Gentoo (and libvirt is
> compiled using --without-sanlock); could this be the cause of the
> problem?

Completely unrelated. sanlock is a program for controlling access to
shared file storage, and has nothing to do with the internal mutex lock
failure message you quoted above.

> Is there some way to explicitly set the locking mechanism to a noop
> in the libvirt configuration?

You are confusing two terms; using the sanlock or no-op disk manager has
nothing to do with libvirtd getting confused and deadlocking on internal
data structures. If you built --without-sanlock, then you are already
using the no-op disk manager; but if sanlock is compiled in, you control
whether to use it by modifying /etc/libvirt/qemu.conf. But making a
configuration change there won't affect the problem you actually saw above.
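
For anyone who wants to see which manager a given qemu.conf selects,
here is a small sketch. The helper name configured_lock_manager is
made up for this example, and it assumes the setting appears as an
uncommented line of the form lock_manager = "name"; it prints "unset"
when no such line exists, in which case the no-op default applies.

```shell
# Hypothetical helper: print the value of the last uncommented
# lock_manager = "..." line in the file named by $1, or "unset".
configured_lock_manager() {
    value=$(sed -n \
        's/^[[:space:]]*lock_manager[[:space:]]*=[[:space:]]*"\([^"]*\)".*/\1/p' \
        "$1" | tail -n 1)
    printf '%s\n' "${value:-unset}"
}

# Example use:
# configured_lock_manager /etc/libvirt/qemu.conf
```

Whatever this prints, it has no bearing on the state change lock
timeout discussed above.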
--
Eric Blake eblake@xxxxxxxxxx +1-801-349-2682
Libvirt virtualization library http://libvirt.org