On 02.12.2013 18:40, Daniel P. Berrange wrote: > On Mon, Dec 02, 2013 at 06:38:37PM +0100, Michal Privoznik wrote: >> On 02.12.2013 18:24, Daniel P. Berrange wrote: >>> On Mon, Dec 02, 2013 at 06:17:00PM +0100, Michal Privoznik wrote: >>>> On 17.10.2013 14:40, Daniel P. Berrange wrote: >>>>> On Tue, Oct 15, 2013 at 06:03:27PM +0200, Christophe Fergeau wrote: >>>>>> Hey, >>>>>> >>>>>> Due to a configuration issue on my system, libvirtd is not starting on my >>>>>> system (not complaining about this!): >>>>>> 2013-10-15 15:40:51.024+0000: 10222: info : libvirt version: 1.1.3 >>>>>> 2013-10-15 15:40:51.024+0000: 10222: error : virNetTLSContextCheckCertFile:117 : Cannot read CA certificate >>>>>> '/home/teuf/usr/etc/pki/CA/cacert.pem': No such file or directory >>>>>> >>>>>> However, before libvirtd exits, I get another error message from the netcf >>>>>> code, which is unexpected this time: >>>>>> 2013-10-15 15:49:18.361+0000: 10222: error : netcfStateCleanup:105 : >>>>>> internal error: Attempt to close netcf state driver already closed >>>>>> >>>>>> This message comes from the call of virStateCleanup() at the end of main() >>>>>> in libvirtd.c. virStateCleanup() should not be called before >>>>>> daemonStateInit() has been called in main. >>>>>> After this call, things get more ugly as daemonStateInit() calls >>>>>> virStateInitialize() from a thread, so there's probably a small window for >>>>>> virStateInitialize() and virStateCleanup() running concurrently if an error >>>>>> occurs between the call to daemonStateInit() and the call to >>>>>> virNetServerRun(). >>>>>> >>>>>> I'm sending this email rather than a patch as I'm not sure what is the best >>>>>> way to fix it. The easy way would be for virStateCleanup() to be a noop >>>>>> when virStateInitialize() hasn't been called (iow remove the error message >>>>>> from netcfStateCleanup). However, this would leave this small race >>>>>> condition around (which is not that bad as it would only occurs in >>>>>> situations when the daemon fails to start). So another approach would be to >>>>>> set a vir_state_initialized boolean once the thread has called >>>>>> ivrStateInitialize, and only call virStateCleanup() when it's set. >>>>>> >>>>>> Or maybe there's a 3rd way to fix this? >>>>>> >>>>>> Let me know if you have any guidance into the best way to fix this, >>>>> >>>>> Yeah, I've known about this issue for a while and its a bit horrible >>>>> to solve. I think we probably need to have a global mutex to serialize >>>>> the virStateInitialize, virStateReload and virStateCleanup calls so >>>>> that none of them can run in parallel. >>>>> >>>>> I'd have virGlobalInit create the mutex instance, since we know that >>>>> is run from thread safe context. >>>>> >>>>> Then make the virState* calls hold that mutex while executing. >>>>> >>>>> We don't want virStateCleanup to skip execution if virStateInitialize >>>>> has failed though - every callback in virStateCleanup should be written >>>>> to be safe if its corresponding init function hasn't run. Just the >>>>> mutex serialization ought to be sufficient I think. >>>>> >>>>> >>>>> Daniel >>>>> >>>> >>>> Sorry for bringing up a dead thread, but as you both may have noticed I >>>> was trying to solve this too: >>>> >>>> https://www.redhat.com/archives/libvir-list/2013-November/msg01311.html >>>> >>>> And I don't think Dan, your approach is gonna work. I mean, the reason >>>> for running virStateInitialize in thread is - it may take ages to >>> >>> Actually no, the reason for using a thread is that the initialization >>> code needs to have a working event loop - so you cannot run the >>> virStateInitialize code from the same thread that is running the >>> event loop. >> >> Okay, this might be the reason too. >> >>> >>> You're probably thinking of the QEMU VM autostart code, whicih spawns a >>> separate thread to auto-start QEMU VMs to avoid delaying use of the QEMU >>> driver while (slow) autostart takes place. >>> >> >> No. I am thinking of the stateAutostart member which is called from >> within virStateInitialize(). In case of qemu driver it's >> qemuStateAutoStart() what is called, which boils down to calling >> qemuProcessStart over each autostart domain without spawning a separate >> thread for it. The thread you have in mind is used for >> qemuProcessReconnect() not qemuProcessStart(). >> >>>> complete. Hence, serializing virState* will lead to main thread waiting >>>> for the virStateInitialize to complete (which may not happen, i.e. in >>>> case of deadlock somewhere in a driver). >>>> >>>> But I was thinking of rather refined locking. Turning each of >>>> vir*DriverTab into virObjectLockable(). Then, each access would require >>>> object to lock itself. Moreover, the virStateCleanup would unregister >>>> the driver. Unfortunately, it could only unregister stateful drivers. >>> >>> This race condition scenario only hits us in two cases >>> >>> - libvirtd startup is being aborted due to some failure >>> while StateInit is still running >>> - libvirtd reload is triggered by SIGHUP causing StateReload >>> to run >> >> I can see the third scenario: the daemon exiting the event loop, i.e. as >> a result of timeout or SIGINT. In this case you don't want to wait for >> virStateInitialize() to finish, do you? > > If the event loop exits then in fact the virStateInitialize may never > complete, since it requires a functioning event loop. So in this case > then you cannot safely call virStateCleanup at all and we should just > skip it entirely and exit with trying to cleanup. The cleanup code is > pretty much worthless from a functional POV anyway - it is just there > to make valgrind happy. Yep. I vote for removing virStateCleanup(). Although, I'm not sure if NWFilter is not doing something actually useful there. Other drivers seems to just free() allocated memory (=making valgrind happy). Michal -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list