On 5/28/19 1:34 PM, Nikolay Shirokovskiy wrote: > The function updates vm->newDef and frees previous pesistent definition > which can be used by in different thread by some API that unlocks domain > to communicate to qemu. So we have an access to freed memory in that > thread. Let's acquire job condition when changing persistent xml. > > Signed-off-by: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> > --- > > A bug example (some downstream version of libvirt, nevertheless): > > (gdb) bt > #0 0x00007f3bb77a0533 in virDomainDefHasVcpusOffline (def=def@entry=0x7f3b7802a830) at conf/domain_conf.c:1637 > #1 0x00007f3bb77cb32f in virDomainCpuDefFormat (def=0x7f3b7802a830, buf=0x7f3ba7308940) at conf/domain_conf.c:27503 > #2 virDomainDefFormatInternal (def=def@entry=0x7f3b7802a830, caps=0x7f3b70000ae0, flags=3, flags@entry=1, > buf=buf@entry=0x7f3ba7308940, xmlopt=xmlopt@entry=0x0) at conf/domain_conf.c:27850 > #3 0x00007f3bb77ced46 in virDomainDefFormat (def=def@entry=0x7f3b7802a830, caps=<optimized out>, > flags=flags@entry=1) at conf/domain_conf.c:28564 > #4 0x00007f3bb77d23f9 in virDomainSaveConfig (configDir=0x7f3b981227e0 "/etc/libvirt/qemu", caps=<optimized out>, > def=0x7f3b7802a830) at conf/domain_conf.c:28776 > #5 0x00007f3ba086b617 in qemuDomainSetMemoryStatsPeriod (dom=<optimized out>, period=30, flags=<optimized out>) > at qemu/qemu_driver.c:4839 > #6 0x00007f3bb7905613 in virDomainSetMemoryStatsPeriod (domain=domain@entry=0x7f3b74006200, period=30, flags=3) > at libvirt-domain.c:2087 > > (gdb) f 5 > #5 0x00007f3ba086b617 in qemuDomainSetMemoryStatsPeriod (dom=<optimized out>, period=30, flags=<optimized out>) > at qemu/qemu_driver.c:4839 > 4839 ret = virDomainSaveConfig(cfg->configDir, driver->caps, persistentDef); > > (gdb) p vm->def > $8 = (virDomainDefPtr) 0x7f3b8400c6f0 > (gdb) p vm->newDef > $9 = (virDomainDefPtr) 0x7f3b7001de20 > (gdb) p persistentDef > $10 = (virDomainDefPtr) 0x7f3b7802a830 > > src/qemu/qemu_driver.c | 8 ++++++++ > 1 file changed, 8 insertions(+) I've managed to reproduce this. It's really simple. I've put a sleep() here (to give myself more time to run commands): diff --git i/src/qemu/qemu_driver.c w/src/qemu/qemu_driver.c index 42b1ce2521..a5fe71e189 100644 --- i/src/qemu/qemu_driver.c +++ w/src/qemu/qemu_driver.c @@ -2405,6 +2405,7 @@ static int qemuDomainSetMemoryStatsPeriod(virDomainPtr dom, int period, qemuDomainObjEnterMonitor(driver, vm); r = qemuMonitorSetMemoryStatsPeriod(priv->mon, def->memballoon, period); + sleep(60); if (qemuDomainObjExitMonitor(driver, vm) < 0) goto endjob; if (r < 0) { And then: 1) virsh start $domain 2) virsh dommemstat --period 10 --config --live $domain 3) [from a different terminal] virsh edit $domain # just add an empty line somewhere to enforce redefining the domain 4) Observe crashed daemon > > diff --git a/src/qemu/qemu_driver.c b/src/qemu/qemu_driver.c > index 42b1ce2..683dcaa 100644 > --- a/src/qemu/qemu_driver.c > +++ b/src/qemu/qemu_driver.c > @@ -7626,6 +7626,7 @@ qemuDomainDefineXMLFlags(virConnectPtr conn, > virCapsPtr caps = NULL; > unsigned int parse_flags = VIR_DOMAIN_DEF_PARSE_INACTIVE | > VIR_DOMAIN_DEF_PARSE_ABI_UPDATE; > + bool job = false; > > virCheckFlags(VIR_DOMAIN_DEFINE_VALIDATE, NULL); > > @@ -7647,6 +7648,10 @@ qemuDomainDefineXMLFlags(virConnectPtr conn, > if (virDomainDefineXMLFlagsEnsureACL(conn, def) < 0) > goto cleanup; > > + if (qemuDomainObjBeginJob(driver, vm, QEMU_JOB_MODIFY) < 0) > + goto cleanup; > + job = true; > + > if (!(vm = virDomainObjListAdd(driver->domains, def, > driver->xmlopt, > 0, &oldDef))) This won't work really. @vm is NULL at the point of BeginJob() and the first thing that BeginJob() does is it dereferences @vm. At the same time, we can't just blindly put BeginJob() into virDomainObjListAdd() because that one is driver agnostic and BeginJob() is a qemu function. Maybe we can extend xmlopt for a new callback that if set is called just before a domain definition is changed. Then qemu driver would set this callback to BeginJob(). But this creates a problem because acquiring a job can take up to QEMU_JOB_WAIT_TIME seconds (currently 30) during which domain list is locked for write, i.e. not even 'virsh list' would be able to run meanwhile. Michal -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list