On 09/18/2017 05:08 AM, Michal Privoznik wrote: > https://bugzilla.redhat.com/show_bug.cgi?id=1448268 > > When migrating to a file (e.g. when doing 'virsh save file'), > couple of things are happening in the thread that is executing > the API: > > 1) the domain obj is locked > 2) iohelper is spawned as a separate process to handle all I/O > 3) the thread waits for iohelper to finish > 4) the domain obj is unlocked > > Now, the problem is that while the thread waits in step 3 for > iohelper to finish this may take ages because iohelper calls > fdatasync(). And unfortunately, we are waiting the whole time > with the domain locked. So if another thread wants to jump in and > say copy the domain name ('virsh list' for instance), they are > stuck. > > The solution is to unlock the domain whenever waiting for I/O and > lock it back again when it finished. > > Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> > --- > src/qemu/qemu_driver.c | 27 +++++++++++++++++++++++---- > 1 file changed, 23 insertions(+), 4 deletions(-) > I would think there'd be more concern about dropping the lock in the middle and something changing that could more negatively affect things. There's such a delicate balance between Save/Restore processing now - if you adjust the locks that could mean simultaneous usage, right? Perhaps thinking differently... Does the virdomainobjlist code for Collect, Filter, Export really need to obtain the domain obj write lock? If Export is but a "snapshot in time", then using virObject{Lock|Unlock} perhaps isn't necessary in any of the @vms list processing. Since virDomainObjListCollectIterator sets a refcnt on the object it cannot be freed. Beyond that any state checking done in virDomainObjListFilter is negated because the lock isn't held until the virGetDomain call is made during virDomainObjListExport to fill in the returned @doms list. It's quite possible after *Filter to have the 'removing' boolean get set before the virGetDomain call is made. So what's the need or purpose of the write locks then? If the object locks are removed during Export, then virsh list won't freeze while other operations that take time occur. Since the RWLock is taken on the list while @vms is created, we know nothing else can Add/Remove to/from the list - thus *that* is our snapshot in time. Anything done after than is just reducing what's in that snapshot. It's "interesting" that the domain export list uses a very different model than other drivers... That loop to take a refcnt on each object w/ the Read lock taken is something I could see duplicated for other drivers conceptually speaking of course if someone were to write that kind of common processing model ;-) John FWIW: The "older" processing model to get the Num of domains followed by getting a list of domains could continue to do the locking... although that code could use a cleanup too, but that's a different issue. > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index b334cf20b..f81d3def4 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > @@ -3216,6 +3216,25 @@ qemuOpenFileAs(uid_t fallback_uid, gid_t fallback_gid, > goto cleanup; > } > > + > +static int > +qemuFileWrapperFDClose(virDomainObjPtr vm, > + virFileWrapperFdPtr fd) > +{ > + int ret; > + > + /* virFileWrapperFd uses iohelper to write data onto disk. > + * However, iohelper calls fdatasync() which may take ages to > + * finish. Therefore, we shouldn't be waiting with the domain > + * object locked. */ > + > + virObjectUnlock(vm); > + ret = virFileWrapperFdClose(fd); > + virObjectLock(vm); > + return ret; > +} > + > + > /* Helper function to execute a migration to file with a correct save header > * the caller needs to make sure that the processors are stopped and do all other > * actions besides saving memory */ > @@ -3276,7 +3295,7 @@ qemuDomainSaveMemory(virQEMUDriverPtr driver, > goto cleanup; > } > > - if (virFileWrapperFdClose(wrapperFd) < 0) > + if (qemuFileWrapperFDClose(vm, wrapperFd) < 0) > goto cleanup; > > if ((fd = qemuOpenFile(driver, vm, path, O_WRONLY, NULL, NULL)) < 0 || > @@ -3827,7 +3846,7 @@ doCoreDump(virQEMUDriverPtr driver, > path); > goto cleanup; > } > - if (virFileWrapperFdClose(wrapperFd) < 0) > + if (qemuFileWrapperFDClose(vm, wrapperFd) < 0) > goto cleanup; > > ret = 0; > @@ -6687,7 +6706,7 @@ qemuDomainRestoreFlags(virConnectPtr conn, > > ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, data, path, > false, QEMU_ASYNC_JOB_START); > - if (virFileWrapperFdClose(wrapperFd) < 0) > + if (qemuFileWrapperFDClose(vm, wrapperFd) < 0) > VIR_WARN("Failed to close %s", path); > > qemuProcessEndJob(driver, vm); > @@ -6962,7 +6981,7 @@ qemuDomainObjRestore(virConnectPtr conn, > > ret = qemuDomainSaveImageStartVM(conn, driver, vm, &fd, data, path, > start_paused, asyncJob); > - if (virFileWrapperFdClose(wrapperFd) < 0) > + if (qemuFileWrapperFDClose(vm, wrapperFd) < 0) > VIR_WARN("Failed to close %s", path); > > cleanup: > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list