Jim Fehlig wrote: >> +static int >> +libxlDomainMigratePerform3(virDomainPtr dom, >> + const char *xmlin ATTRIBUTE_UNUSED, >> + const char *cookiein ATTRIBUTE_UNUSED, >> + int cookieinlen ATTRIBUTE_UNUSED, >> + char **cookieout ATTRIBUTE_UNUSED, >> + int *cookieoutlen ATTRIBUTE_UNUSED, >> + const char *dconnuri ATTRIBUTE_UNUSED, >> + const char *uri, >> + unsigned long flags, >> + const char *dname ATTRIBUTE_UNUSED, >> + unsigned long resource ATTRIBUTE_UNUSED) >> +{ >> + libxlDriverPrivatePtr driver = dom->conn->privateData; >> + char *hostname = NULL; >> + int port; >> + char *servname = NULL; >> + struct addrinfo hints, *res; >> + int sockfd = -1; >> + int ret = -1; >> + >> + virCheckFlags(LIBXL_MIGRATION_FLAGS, -1); >> + >> + libxlDriverLock(driver); >> + >> + if (doParseURI(uri, &hostname, &port)) >> + goto cleanup; >> + >> + VIR_DEBUG("hostname = %s, port = %d", hostname, port); >> + >> + if (virAsprintf(&servname, "%d", port ? port : DEFAULT_MIGRATION_PORT) < 0) { >> + virReportOOMError(); >> + goto cleanup; >> + } >> + >> + if ((sockfd = socket(AF_INET, SOCK_STREAM, 0)) < 0 ){ >> + libxlError(VIR_ERR_OPERATION_FAILED, >> + _("Failed to create socket")); >> + goto cleanup; >> + } >> + >> + memset(&hints, 0, sizeof(hints)); >> + hints.ai_family = AF_INET; >> + hints.ai_socktype = SOCK_STREAM; >> + if (getaddrinfo(hostname, servname, &hints, &res) || res == NULL) { >> + libxlError(VIR_ERR_OPERATION_FAILED, >> + _("IP address lookup failed")); >> + goto cleanup; >> + } >> + >> + if (connect(sockfd, res->ai_addr, res->ai_addrlen) < 0) { >> + libxlError(VIR_ERR_OPERATION_FAILED, >> + _("Connect error")); >> + goto cleanup; >> + } >> + >> + ret = doMigrateSend(dom, flags, sockfd); >> >> > > Hmm, the entire driver is locked during the migration. I just noticed > libxlDomainSave{Flags} also holds the driver lock during save :-(. > libxlDomainCoreDump drops the lock during the dump and IMO migration > should follow the same pattern. > After some more testing, following the pattern in libxlDomainCoreDump is not a good idea as it leads to a deadlock. # virsh dump dom /var/lib/libvirt/libxl/save/test.dump d1. lock driver d2. get virDomainObjPtr for domain, acquiring dom obj lock d3. unlock driver d4. core dump domain d5. lock driver d6. do post dump stuff d7. unlock driver d8. unlock dom obj lock While d4 is in progress, list domains # virsh list l1. get num domains l2. lock driver l3. call virDomainObjListNumOfDomains, which iterates through domains, obtaining dom obj lock in process l3 will block when trying to obtain dom obj lock for the domain being dumped, and note that it is holding the driver lock. When d4 is finished, it will attempt to acquire the driver lock - deadlock. A quick fix would be to lock the driver for duration of the dump, similar to save. But we really need to prevent locking the driver during these long running operations. Question for other libvirt devs: Many of the libxl driver functions use this pattern - lock driver - vm = virDomainFindByUUID // acquires dom obj lock - unlock driver - do stuff - virDomainObjUnlock In some cases, "do stuff" requires obtaining the driver lock (e.g. removing a domain from driver's domain list), in which case there is potential for deadlock. Any suggestions for preventing the deadlock *and* avoiding locking the driver during long running operations? Thanks, Jim -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list