On Mon, Dec 09, 2013 at 12:15:52PM +0100, Peter Krempa wrote: > On 12/09/13 11:56, Daniel P. Berrange wrote: > > On Sat, Dec 07, 2013 at 02:03:00AM -0500, Adam Walters wrote: > >> This patch works around a race condition present during > >> libvirt startup. The race condition only applies when > >> using storage pool volumes for domain disks, and even > >> then, only when restarting libvirt with running domains. > >> > >> The gist of the patch is simply to enter a (limited) > >> retry loop during qemuTranslateDiskSourcePool. This > >> particular implementation does have a slight drawback, > >> though. Within that function, I can not determine if > >> we are currently starting libvirt, or if we are just > >> starting up a domain. In the latter case, this could > >> cause a 800ms delay in reporting an error that the > >> storage pool is inactive. > >> > >> I am happy to report, however, that with this patch, > >> domains continue to run without restarts regardless > >> of how often I restart libvirt. I have a fairly fast > >> hypervisor, and have never seen it try more than one > >> iteration of the retry loop unless I purposely set > >> one of the storage pools to be inactive. > >> --- > >> src/qemu/qemu_conf.c | 11 +++++++++++ > >> 1 file changed, 11 insertions(+) > >> > >> diff --git a/src/qemu/qemu_conf.c b/src/qemu/qemu_conf.c > >> index c28908a..2e52fbf 100644 > >> --- a/src/qemu/qemu_conf.c > >> +++ b/src/qemu/qemu_conf.c > >> @@ -1359,6 +1359,8 @@ qemuTranslateDiskSourcePool(virConnectPtr conn, > >> virStorageVolInfo info; > >> int ret = -1; > >> virErrorPtr savedError = NULL; > >> + int attempt = 0; > >> + int poolAutostart; > >> > >> if (def->type != VIR_DOMAIN_DISK_TYPE_VOLUME) > >> return 0; > >> @@ -1369,7 +1371,16 @@ qemuTranslateDiskSourcePool(virConnectPtr conn, > >> if (!(pool = virStoragePoolLookupByName(conn, def->srcpool->pool))) > >> return -1; > >> > >> +retry: > >> if (virStoragePoolIsActive(pool) != 1) { > >> + if (!(virStoragePoolGetAutostart(pool, &poolAutostart) < 0)) > >> + if (poolAutostart && attempt < 4) { > >> + VIR_DEBUG("Waiting for storage pool '%s' to activate", > >> + def->srcpool->pool); > >> + usleep(200*1000); /* sleep 200ms */ > >> + attempt++; > >> + goto retry; > >> + } > >> virReportError(VIR_ERR_CONFIG_UNSUPPORTED, > >> _("storage pool '%s' containing volume '%s' " > >> "is not active"), > > > > This is rather dubious and points toa bug elsewhere. The storage driver > > should complete its autostart before the QEMU driver even starts its > > own initialization. > > We definitely need to make sure that storage is available at that point. > > This particular regression was introduced by my commit e1a4d08baf9a > although was latently present for a few releases now as the volume code > isn't used that much probably. > > Prior to my patch that added the check whether the pool is available we > were blindly assuming that the pool was online. Pool drivers like > gluster don't have their internal structures initialized if the pool > isn't started and thus the translation would fail either way. > > Also the translation function is called separately from the reconnection > code, thus we can pass different arguments to it so we don't spoil the > normal code paths with unnecessary delays and other stuff. > > The question is what to do with domains that have storage on pools that > can't be initialized. Should we kill those? Should we skip translation > of the source and then something later may fail? We do we need todo translation when restarting libvirtd ? Surely we should only do translation when initialy starting a guest, and then remember that data thereafter. If we ever try re-translating data later for a running guest, we risk getting different answers which would be bad. Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :| -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list