On 04.06.2019 17:23, Michal Privoznik wrote: > On 6/4/19 11:07 AM, Nikolay Shirokovskiy wrote: >> >> >> On 30.05.2019 12:34, Michal Privoznik wrote: >>> Changing domain definition must be guarded with acquiring modify >>> job. The problem is if there is a thread executing say >>> qemuDomainSetMemoryStatsPeriod() which is an API that acquires >>> modify job and then possibly unlock the domain object and locks >>> it again. However, becasue virDomainObjAssignDef() does not take >>> a job (only object lock) it may have changed the domain >>> definition while the other thread unlocked the domain object in >>> order to talk on the monitor. For instance: >>> >>> Thread1: >>> 1) lookup domain object and lock it >>> 2) acquire job >>> 3) get pointers to live and persistent defs >>> 4) unlock the domain object >>> 5) talk to qemu on the monitor >>> >>> Thread2 (Execute qemuDomainDefineXMLFlags): >>> 1) lookup domain object and lock it >>> 2) virDomainObjAssignDef() which overwrites persistent def and >>> frees the old one >>> 3) unlock domain object >>> >>> Thread1: >>> 6) lock the domain object again >>> 7) try to access the persistent def via pointer stored in 3) >>> >>> Now this will crash because the pointer from step 3) points to a >>> memory that was freed. >>> >>> However, if Thread2 waited and acquired modify job (just like >>> Thread1) then these two would serialize and no harm would be >>> done. >>> >>> Signed-off-by: Michal Privoznik <mprivozn@xxxxxxxxxx> >>> --- >>> src/qemu/qemu_domain.c | 3 ++- >>> src/qemu/qemu_driver.c | 29 +++++++++++++++++++++++++---- >>> src/qemu/qemu_migration.c | 7 +++++++ >>> 3 files changed, 34 insertions(+), 5 deletions(-) >>> >>> diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c >>> index c61d39b12b..fff2f1932e 100644 >>> --- a/src/qemu/qemu_domain.c >>> +++ b/src/qemu/qemu_domain.c >>> @@ -8784,7 +8784,8 @@ qemuDomainRemoveInactive(virQEMUDriverPtr driver, >>> return; >>> } >>> - qemuDomainRemoveInactiveCommon(driver, vm); >>> + if (vm->def) >>> + qemuDomainRemoveInactiveCommon(driver, vm); >>> virDomainObjListRemove(driver->domains, vm); >> >> Hmm, virDomainObjListRemoveLocked uses vm->def too. >> >>> } >>> diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c >>> index 8bbac339e0..77cb7e4b87 100644 >>> --- a/src/qemu/qemu_driver.c >>> +++ b/src/qemu/qemu_driver.c >>> @@ -1706,9 +1706,16 @@ static virDomainPtr qemuDomainCreateXML(virConnectPtr conn, >>> VIR_DOMAIN_OBJ_LIST_ADD_CHECK_LIVE))) >>> goto cleanup; >>> + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) { >>> + qemuDomainRemoveInactive(driver, vm); >>> + goto cleanup; >>> + } >>> + >>> virDomainObjAssignDef(vm, def, true, NULL); >>> def = NULL; >>> + qemuDomainObjEndJob(driver, vm); >>> + >> >> Looks like we can begin VIR_DOMAIN_JOB_OPERATION_START from the start instead >> of acquiring job twice. > > That is an async job. We shouldn't rely on async job when modifying a domain. > To me it looks ok. We don't drop vm lock between begin job and assigning definition so other threads have no chance to see invalid state. Nikolay -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list