On 02.12.2013 18:24, Daniel P. Berrange wrote: > On Mon, Dec 02, 2013 at 06:17:00PM +0100, Michal Privoznik wrote: >> On 17.10.2013 14:40, Daniel P. Berrange wrote: >>> On Tue, Oct 15, 2013 at 06:03:27PM +0200, Christophe Fergeau wrote: >>>> Hey, >>>> >>>> Due to a configuration issue on my system, libvirtd is not starting on my >>>> system (not complaining about this!): >>>> 2013-10-15 15:40:51.024+0000: 10222: info : libvirt version: 1.1.3 >>>> 2013-10-15 15:40:51.024+0000: 10222: error : virNetTLSContextCheckCertFile:117 : Cannot read CA certificate >>>> '/home/teuf/usr/etc/pki/CA/cacert.pem': No such file or directory >>>> >>>> However, before libvirtd exits, I get another error message from the netcf >>>> code, which is unexpected this time: >>>> 2013-10-15 15:49:18.361+0000: 10222: error : netcfStateCleanup:105 : >>>> internal error: Attempt to close netcf state driver already closed >>>> >>>> This message comes from the call of virStateCleanup() at the end of main() >>>> in libvirtd.c. virStateCleanup() should not be called before >>>> daemonStateInit() has been called in main. >>>> After this call, things get more ugly as daemonStateInit() calls >>>> virStateInitialize() from a thread, so there's probably a small window for >>>> virStateInitialize() and virStateCleanup() running concurrently if an error >>>> occurs between the call to daemonStateInit() and the call to >>>> virNetServerRun(). >>>> >>>> I'm sending this email rather than a patch as I'm not sure what is the best >>>> way to fix it. The easy way would be for virStateCleanup() to be a noop >>>> when virStateInitialize() hasn't been called (iow remove the error message >>>> from netcfStateCleanup). However, this would leave this small race >>>> condition around (which is not that bad as it would only occurs in >>>> situations when the daemon fails to start). So another approach would be to >>>> set a vir_state_initialized boolean once the thread has called >>>> ivrStateInitialize, and only call virStateCleanup() when it's set. >>>> >>>> Or maybe there's a 3rd way to fix this? >>>> >>>> Let me know if you have any guidance into the best way to fix this, >>> >>> Yeah, I've known about this issue for a while and its a bit horrible >>> to solve. I think we probably need to have a global mutex to serialize >>> the virStateInitialize, virStateReload and virStateCleanup calls so >>> that none of them can run in parallel. >>> >>> I'd have virGlobalInit create the mutex instance, since we know that >>> is run from thread safe context. >>> >>> Then make the virState* calls hold that mutex while executing. >>> >>> We don't want virStateCleanup to skip execution if virStateInitialize >>> has failed though - every callback in virStateCleanup should be written >>> to be safe if its corresponding init function hasn't run. Just the >>> mutex serialization ought to be sufficient I think. >>> >>> >>> Daniel >>> >> >> Sorry for bringing up a dead thread, but as you both may have noticed I >> was trying to solve this too: >> >> https://www.redhat.com/archives/libvir-list/2013-November/msg01311.html >> >> And I don't think Dan, your approach is gonna work. I mean, the reason >> for running virStateInitialize in thread is - it may take ages to > > Actually no, the reason for using a thread is that the initialization > code needs to have a working event loop - so you cannot run the > virStateInitialize code from the same thread that is running the > event loop. Okay, this might be the reason too. > > You're probably thinking of the QEMU VM autostart code, whicih spawns a > separate thread to auto-start QEMU VMs to avoid delaying use of the QEMU > driver while (slow) autostart takes place. > No. I am thinking of the stateAutostart member which is called from within virStateInitialize(). In case of qemu driver it's qemuStateAutoStart() what is called, which boils down to calling qemuProcessStart over each autostart domain without spawning a separate thread for it. The thread you have in mind is used for qemuProcessReconnect() not qemuProcessStart(). >> complete. Hence, serializing virState* will lead to main thread waiting >> for the virStateInitialize to complete (which may not happen, i.e. in >> case of deadlock somewhere in a driver). >> >> But I was thinking of rather refined locking. Turning each of >> vir*DriverTab into virObjectLockable(). Then, each access would require >> object to lock itself. Moreover, the virStateCleanup would unregister >> the driver. Unfortunately, it could only unregister stateful drivers. > > This race condition scenario only hits us in two cases > > - libvirtd startup is being aborted due to some failure > while StateInit is still running > - libvirtd reload is triggered by SIGHUP causing StateReload > to run I can see the third scenario: the daemon exiting the event loop, i.e. as a result of timeout or SIGINT. In this case you don't want to wait for virStateInitialize() to finish, do you? > > In both of those cases I think it is absolutely desirable that > virStateReload and virStateCleanup both block until virStateInitialize > completes. > > The question of virStateInitialize deadlock has no bearing on this. > Any such problems are simply a bug that we must fix. So I don't see > why we shouldn't have a global mutex serializing the three > virState{Init,Reload,Cleanup} APIs. > > Regards, > Daniel > -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list