On Thu, Jul 15, 2010 at 10:44:46AM -0400, Chris Lalancette wrote: > If you try to execute two concurrent migrations p2p > from A->B and B->A, the two libvirtd's will deadlock > trying to perform the migrations. The reason for this is > that in p2p migration, the libvirtd's are responsible for > making the RPC Prepare, Migrate, and Finish calls. However, > they are currently holding the driver lock while doing so, > which basically guarantees deadlock in this scenario. > > This patch fixes the situation by adding > qemuDomainObjEnterRemoteWithDriver and > qemuDomainObjExitRemoteWithDriver helper methods. The Enter > take an additional object reference, then drops both the > domain object lock and the driver lock. The Exit takes > both the driver and domain object lock, then drops the > reference. Adding calls to these Enter and Exit helpers > around remote calls in the various migration methods > seems to fix the problem for me in testing. > > I *believe* that this makes the situation safe, though > I'm not 100% certain. The additional domain object reference > ensures that the domain object won't disappear while this > operation is happening. The BeginJob that is called inside > of qemudDomainMigratePerform ensures that we can't execute a > second migrate (or shutdown, or save, etc) job while the > migration is active. That being said, my understanding of the > locking is still a bit shaky, so I'd like some additional > confirmation that this is indeed safe. > > Signed-off-by: Chris Lalancette <clalance@xxxxxxxxxx> > --- > src/qemu/qemu_driver.c | 39 ++++++++++++++++++++++++++++++++------- > 1 files changed, 32 insertions(+), 7 deletions(-) > > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index 5f2607c..e81af84 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > @@ -531,6 +531,21 @@ static void qemuDomainObjExitMonitorWithDriver(struct qemud_driver *driver, virD > } > } > > +static void qemuDomainObjEnterRemoteWithDriver(struct qemud_driver *driver, > + virDomainObjPtr obj) > +{ > + virDomainObjRef(obj); > + virDomainObjUnlock(obj); > + qemuDriverUnlock(driver); > +} > + > +static void qemuDomainObjExitRemoteWithDriver(struct qemud_driver *driver, > + virDomainObjPtr obj) > +{ > + qemuDriverLock(driver); > + virDomainObjLock(obj); > + virDomainObjUnref(obj); > +} > > static int qemuCgroupControllerActive(struct qemud_driver *driver, > int controller) > @@ -10753,9 +10768,11 @@ static int doTunnelMigrate(virDomainPtr dom, > /* virStreamNew only fails on OOM, and it reports the error itself */ > goto cleanup; > > + qemuDomainObjEnterRemoteWithDriver(driver, vm); > internalret = dconn->driver->domainMigratePrepareTunnel(dconn, st, > flags, dname, > resource, dom_xml); > + qemuDomainObjExitRemoteWithDriver(driver, vm); > > if (internalret < 0) > /* domainMigratePrepareTunnel sets the error for us */ > @@ -10833,8 +10850,10 @@ cancel: > > finish: > dname = dname ? dname : dom->name; > + qemuDomainObjEnterRemoteWithDriver(driver, vm); > ddomain = dconn->driver->domainMigrateFinish2 > (dconn, dname, NULL, 0, uri, flags, retval); > + qemuDomainObjExitRemoteWithDriver(driver, vm); > > cleanup: > if (client_sock != -1) > @@ -10873,16 +10892,20 @@ static int doNonTunnelMigrate(virDomainPtr dom, > virDomainPtr ddomain = NULL; > int retval = -1; > char *uri_out = NULL; > + int rc; > > + qemuDomainObjEnterRemoteWithDriver(driver, vm); > /* NB we don't pass 'uri' into this, since that's the libvirtd > * URI in this context - so we let dest pick it */ > - if (dconn->driver->domainMigratePrepare2(dconn, > - NULL, /* cookie */ > - 0, /* cookielen */ > - NULL, /* uri */ > - &uri_out, > - flags, dname, > - resource, dom_xml) < 0) > + rc = dconn->driver->domainMigratePrepare2(dconn, > + NULL, /* cookie */ > + 0, /* cookielen */ > + NULL, /* uri */ > + &uri_out, > + flags, dname, > + resource, dom_xml); > + qemuDomainObjExitRemoteWithDriver(driver, vm); > + if (rc < 0) > /* domainMigratePrepare2 sets the error for us */ > goto cleanup; > > @@ -10899,8 +10922,10 @@ static int doNonTunnelMigrate(virDomainPtr dom, > > finish: > dname = dname ? dname : dom->name; > + qemuDomainObjEnterRemoteWithDriver(driver, vm); > ddomain = dconn->driver->domainMigrateFinish2 > (dconn, dname, NULL, 0, uri_out, flags, retval); > + qemuDomainObjExitRemoteWithDriver(driver, vm); > > if (ddomain) > virUnrefDomain(ddomain); Explanations and patch both looks fine to me, but I would feel more confident if Dan reviewed it too, the fine details of low level locking escape me I'm afraid. mini-ACK :-) Daniel -- Daniel Veillard | libxml Gnome XML XSLT toolkit http://xmlsoft.org/ daniel@xxxxxxxxxxxx | Rpmfind RPM search engine http://rpmfind.net/ http://veillard.com/ | virtualization library http://libvirt.org/ -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list