On Fri, Apr 06, 2018 at 06:27 PM +0200, John Ferlan <jferlan@xxxxxxxxxx> wrote: > On 04/03/2018 07:47 AM, Marc Hartmayer wrote: >> On Tue, Mar 20, 2018 at 11:25 AM +0100, Marc Hartmayer <mhartmay@xxxxxxxxxxxxxxxxxx> wrote: >>> Hi, >>> >>> there is a race condition between 'qemuDomainCreate' and >>> 'qemuDomainDestroy' causing a NULL pointer segmentation fault when >>> accessing priv->monConfig. The race condition can be easily reproduced >>> using gdb. >>> >>> (gdb) set non-stop on >>> # set breakpoint on line 'mon = qemuMonitorOpen(vm, …)' >>> (gdb) b qemu_process.c:1799 >>> # Actually, this second breakpoint is optional but it’s good to see >>> where priv->monConfig is set to NULL >>> # set breakpoint on line priv->monConfig = NULL; >>> (gdb) b qemu_process.c:6589 >>> (gdb) run >>> # continue all threads - just for the case we hit a breakpoint already >>> (gdb) c -a >>> >>> Now start a domain (that is using QEMU) >>> >>> $ virsh start domain >>> >>> The first breakpoint will be hit. Now run in a second shell >>> >>> $ virsh destroy domain >>> >>> The second breakpoint will be hit. Continue the thread where the second >>> breakpoint was hit (for this example this is thread 4) >>> >>> (gdb) thread apply 4 continue >>> >>> Now continue the thread where the first breakpoint was hit. >>> >>> => Segmentation fault because of a NULL pointer dereference at >>> config->value >>> >>> Since I'm not very familiar with that part of the code, I wanted to ask >>> for your advice. >>> >>> Thanks in advance. >>> >>> Beste Grüße / Kind regards >>> Marc Hartmayer >>> >>> IBM Deutschland Research & Development GmbH >>> Vorsitzende des Aufsichtsrats: Martina Koederitz >>> Geschäftsführung: Dirk Wittkopp >>> Sitz der Gesellschaft: Böblingen >>> Registergericht: Amtsgericht Stuttgart, HRB 243294 >> >> Any ideas? >> > > Seeing as no one else has an exact or authoritative answer... > > qemuDomainCreate{XML|WithFlags} (and a few others) will call > qemuProcessBeginJob which calls qemuDomainObjBeginAsyncJob and > qemuDomainObjSetAsyncJobMask which IIUC allows QEMU_JOB_DESTROY > to be run. > > The qemuDomainDestroyFlags calls qemuProcessBeginStopJob which calls > qemuDomainObjBeginJob (e.g. sync job) using QEMU_JOB_DESTROY, which > again IIUC is allowed to happen alongside the Async job because of the > mask setting. > > In the code where you've broken during create, the @vm object lock is > dropped allowing destroy to obtain it. So with the perfect timing and > storm the window of opportunity does exist that the monConfig could be > free'd and thus priv->monConfig set to NULL before or during the create > processing uses it in qemuMonitorOpen. If during the processing, then > obviously the config-> would "appear" to be valid, but it may not > actually be what was passed. > > The fix I believe involves using objects for virDomainChrSourceDef > rather than VIR_ALLOC'd and VIR_FREE'd memory directly. I've put > together a few patches and will post them shortly. Using the patches I > don't see a core, but rather the (I believe) expected "error: internal > error: qemu unexpectedly closed the monitor" > > John > > > Thanks for the fix! -- Beste Grüße / Kind regards Marc Hartmayer IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list