On 10/27/2016 10:04 AM, Nikolay Shirokovskiy wrote: > > > On 27.10.2016 15:59, John Ferlan wrote: >> >> >> On 10/04/2016 10:27 AM, Nikolay Shirokovskiy wrote: >>> This fix SEGV with the backtrace [1] >>> >>> This happens due to race on simultaneous finishing of libvirtd and >>> qemu process. We need to keep daemon object until all hypervisor >>> drivers are cleaned up. The other option is to take reference to >>> daemon in every hypervisor driver but this will take more work >>> and we really don't need to. Drivers cleanup procedure is synchronous >>> thus let's just remove last reference to daemon after drivers cleanup. >>> >>> [1] backtrace (shortened a bit) >>> >>> 1 0x00007fd3a791b2e6 in virCondWait (c=<optimized out>, m=<optimized out>) at util/virthread.c:154 >>> 2 0x00007fd3a791bcb0 in virThreadPoolFree (pool=0x7fd38024ee00) at util/virthreadpool.c:266 >>> 3 0x00007fd38edaa00e in qemuStateCleanup () at qemu/qemu_driver.c:1116 >>> 4 0x00007fd3a79abfeb in virStateCleanup () at libvirt.c:808 >>> 5 0x00007fd3a85f2c9e in main (argc=<optimized out>, argv=<optimized out>) at libvirtd.c:1660 >>> >>> Thread 1 (Thread 0x7fd38722d700 (LWP 32256)): >>> 0 0x00007fd3a7900910 in virClassIsDerivedFrom (klass=0xdfd36058d4853, parent=0x7fd3a8f394d0) at util/virobject.c:169 >>> 1 0x00007fd3a7900c4e in virObjectIsClass (anyobj=anyobj@entry=0x7fd3a8f2f850, klass=<optimized out>) at util/virobject.c:365 >>> 2 0x00007fd3a7900c74 in virObjectLock (anyobj=0x7fd3a8f2f850) at util/virobject.c:317 >>> 3 0x00007fd3a7a24d5d in virNetDaemonRemoveShutdownInhibition (dmn=0x7fd3a8f2f850) at rpc/virnetdaemon.c:547 >>> 4 0x00007fd38ed722cf in qemuProcessStop (driver=driver@entry=0x7fd380103810, vm=vm@entry=0x7fd38025b6d0, reason=reason@entry=VIR_DOMAIN_SHUTOFF_SHUTDOWN, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE, >>> flags=flags@entry=0) at qemu/qemu_process.c:5786 >>> 5 0x00007fd38edd9428 in processMonitorEOFEvent (vm=0x7fd38025b6d0, driver=0x7fd380103810) at qemu/qemu_driver.c:4588 >>> 6 qemuProcessEventHandler (data=<optimized out>, opaque=0x7fd380103810) at qemu/qemu_driver.c:4632 >>> 7 0x00007fd3a791bb55 in virThreadPoolWorker (opaque=opaque@entry=0x7fd3a8f1e4c0) at util/virthreadpool.c:145 >>> --- >>> daemon/libvirtd.c | 4 +++- >>> 1 file changed, 3 insertions(+), 1 deletion(-) >>> >> >> I'd like to adjust the commit message just a bit... In particular >> removing the "The other option..." sentence. > > ok > >> >> So the "problem" discovered is that virStateInitialize requires/uses the >> "dmn" reference as the argument to the daemonInhibitCallback and by >> dereferencing the object too early and then calling the virStateCleanup >> which calls the daemonInhibitCallback using the dmn we end up with the SEGV. > > yep > >> >>> diff --git a/daemon/libvirtd.c b/daemon/libvirtd.c >>> index 95c1b1c..eefdefc 100644 >>> --- a/daemon/libvirtd.c >>> +++ b/daemon/libvirtd.c >>> @@ -1628,7 +1628,6 @@ int main(int argc, char **argv) { >>> virObjectUnref(qemuProgram); >>> virObjectUnref(adminProgram); >>> virNetDaemonClose(dmn); >>> - virObjectUnref(dmn); >>> virObjectUnref(srv); >>> virObjectUnref(srvAdm); >>> virNetlinkShutdown(); >>> @@ -1658,6 +1657,9 @@ int main(int argc, char **argv) { >>> driversInitialized = false; >>> virStateCleanup(); >>> } >> >> Another option would have been to move the above hunk up... Since >> setting the daemonStateInit is one of the last things done before >> running the event loop, it stands to reason shutting down should be done >> in the opposite order thus causing those InhibitCallbacks to be run >> perhaps after virNetlinkEventServiceStopAll and before virNetDaemonClose. > > By the above hunk you mean virStateCleanup() et all? If yes - then I don't > think so. The thing is that start is tricky. > > 1. We start the network part, but keep it in disabled state. > In this state it creates sockets and listens on them but not accept > connections. > > 2. Initialize all the hyper drivers and then enable the network part > to let it accept connections. > > Thus effectively the order is reverse to what is seems - > first hyper drivers and then network servers. > > Actually there is a direct reasoning why hyper drivers shutted down after > network part - otherwise requests can be delivered to hyper drivers in > invalid state. > Yeah - it's an intricate and delicate balance, especially that inhibit callback stuff. Since you only have 1 domain and 1 libvirtd I'll keep the order as is... Tks - John > Nikolay > >> >> Any thoughts to that? >> >> This I assume works, but do we run into "other" issues because we're >> running those callbacks after virNetDaemonClose and virNetlinkShutdown? >> >> Of course the "fear" of moving is that it causes "other" timing issues >> with previous adjustments here, in particular: >> >> commit id '5c756e580' and 'a602e90bc1' which introduced and adjusted >> driversInitialized >> >> I'm OK with leaving things as is, but perhaps someone else will see this >> an comment as well. >> >> John >> >> >>> + /* unref daemon only here as hypervisor drivers can >>> + call shutdown inhibition functions on cleanup */ >>> + virObjectUnref(dmn); >>> >>> return ret; >>> } >>> -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list