On Wed, Feb 14, 2018 at 11:36 PM +0100, John Ferlan <jferlan@xxxxxxxxxx> wrote:
> On 01/26/2018 03:47 AM, Nikolay Shirokovskiy wrote:
>> On 19.01.2018 20:23, John Ferlan wrote:
>>> RFC:
>>> https://www.redhat.com/archives/libvir-list/2018-January/msg00318.html
>>>
>>> Adjustments since RFC...
>>>
>>> Patches 1&2: No change, were already R-B'd
>>> Patch 3: Removed code as noted in code review, updated the commit
>>> message
>>> Patch 4: Removed from the old series, see below for more details
>>> Patch 9: No change
>>> NB: Patches 5-8 and 10 from Nikolay Shirokovskiy
>>> <nshirokovskiy@xxxxxxxxxxxxx> are removed as they seemed not to be
>>> necessary
>>>
>>> Replaced the former patch 4 with a series of patches to (slowly)
>>> provide support to disable new connections, handle removing waiting
>>> jobs, cause the waiting workers to quit, and allow any running jobs
>>> to complete.
>>>
>>> As it turns out, waiting for running jobs to complete cannot be done
>>> from the virNetServerClose callbacks, because that means the event
>>> loop processing done during virNetServerRun will not give any
>>> currently long-running worker job thread a means to complete.
>>
>> Hi, John.
>
> Sorry - been busy in other areas lately - trying to get back to this...
>
>> So you suggest a different approach. Instead of introducing means to
>> help RPC threads finish after the event loop has finished (the
>> stateShutdown hook in my series), you suggest blocking new RPC
>> processing and then waiting for the running RPC calls to finish,
>> keeping the event loop running for that purpose.
>
> Somewhere along the way in one of the reviews it was suggested to give
> the workers perhaps a few extra cycles to complete anything they might
> be doing. It was also suggested a mechanism to do that - which is what
> I tried to accomplish here.
>
> Based on that, your mechanism was more of an "immediate separation" -
> IIRC closing the monitor connection and letting that close handler do
> whatever magic happens there.
>
> This is not to say your mechanism is the wrong way to go about it, but
> it also hasn't garnered support, nor have you actively attempted to
> champion it. So I presented an alternative approach.
>
>> This could lead to libvirtd never finishing its termination if, for
>> example, one of the qemu processes does not respond. In the case of a
>> long-running operation such as migration, finishing can take a lot of
>> time. On the other hand, if we finish the RPC threads abruptly, as in
>> the case of the stateShutdown hook, we will deal with all possible
>> error paths on daemon shutdown. Maybe a good approach is to abort
>> long-running operations, then give the RPC threads some time to
>> finish as you suggest, and only after that finish them abruptly to
>> handle a hanging qemu etc.
>
> True, but it's also easy enough to add something to the last and
> updated patch to add the QuitTimer and force an exit without going
> through the rest of the "friendly" quit (IOW: more or less what Dan
> has suggested). [see the first sketch below]
>
> There's an incredible number of cycles being spent on what amounts to
> trying to be nice about a SIGTERM, SIGQUIT, or SIGINT - IOW: eventual
> death.
>
>> Also, in this approach we keep the event loop running for the RPC
>> threads only, but there can be other threads that depend on the loop.
>> For example, if qemu is rebooted, we spawn a thread that executes
>> qemuProcessFakeReboot, which sends commands to qemu (i.e. depends on
>> the event loop). We need to wait for such threads to finish too while
>> keeping the event loop running. In the stateShutdown hook approach we
>> finish all threads that use the qemu monitor by closing the monitor.
>>
>> As for the hack with a timeout in a loop... I have a different
>> approach in mind for waiting for the RPC threads to finish their
>> work. First, let's make the drain function only flush the job queue
>> and take a callback which will be called when all workers are free
>> from work (let's keep the worker threads, as Dan suggested). In this
>> callback we can use the same technique as in
>> virNetDaemonSignalHandler: that is, do some I/O to set a flag in the
>> event loop thread, causing virEventRunDefaultImpl to finish, and then
>> test this flag in virNetDaemonRun. [see the second sketch below]
>
> Please feel free to spend as many cycles as you can making adjustments
> to what I have. I'm working through some alterations and will post a
> v2 of what I have, and we'll see where it takes me/us.
>
> John
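Regarding the QuitTimer John mentions: here is a minimal sketch of how
I read that idea, built only on the public virEventAddTimeout() API.
To be clear, this is not code from the series - QUIT_GRACE_MS,
quitTimer, and armQuitTimer are names I made up, and the 30-second
grace period is an arbitrary assumption.

  #include <libvirt/libvirt.h>
  #include <stdlib.h>
  #include <unistd.h>

  #define QUIT_GRACE_MS (30 * 1000)   /* assumed 30s grace period */

  /* Fires once the grace period has elapsed: stop being friendly and
   * exit immediately instead of waiting on the workers any longer.
   * (virEventAddTimeout timers are recurring, but since the callback
   * exits the process it can only ever fire once.) */
  static void quitTimer(int timer, void *opaque)
  {
      _exit(EXIT_FAILURE);
  }

  /* Hypothetical hook for the shutdown path: arm the timer as soon as
   * the "friendly" quit (drain + wait) begins. */
  static int armQuitTimer(void)
  {
      return virEventAddTimeout(QUIT_GRACE_MS, quitTimer, NULL, NULL);
  }

Something like this would also address the hanging-qemu concern above:
the friendly drain gets a bounded amount of time, after which the
daemon dies anyway.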
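And for the wake-up technique Nikolay describes (the drain callback
doing some I/O to nudge the event loop): below is a minimal,
self-contained sketch against the public event API. This is not the
daemon's actual code - virNetDaemonRun and the worker pool are
simplified away, and quitRequested, wakeupPipe, onWakeup,
onWorkersIdle, and fakeDrain are names I invented for illustration.

  #include <libvirt/libvirt.h>
  #include <pthread.h>
  #include <stdbool.h>
  #include <unistd.h>

  static bool quitRequested;   /* the flag virNetDaemonRun would test */
  static int wakeupPipe[2];    /* self-pipe, written from other threads */

  /* Runs in the event loop thread once the pipe becomes readable. */
  static void onWakeup(int watch, int fd, int events, void *opaque)
  {
      char c;
      if (read(fd, &c, 1) == 1)
          quitRequested = true; /* seen once virEventRunDefaultImpl returns */
  }

  /* Hypothetical drain callback: called, from whatever thread releases
   * the last busy worker, once the job queue is flushed and all workers
   * are idle.  The pipe write is what pushes the loop out of poll(). */
  static void onWorkersIdle(void *opaque)
  {
      char c = 'q';
      ssize_t rc = write(wakeupPipe[1], &c, 1);
      (void)rc;
  }

  /* Simulate the pool draining two seconds after startup. */
  static void *fakeDrain(void *opaque)
  {
      sleep(2);
      onWorkersIdle(NULL);
      return NULL;
  }

  int main(void)
  {
      pthread_t tid;

      if (virEventRegisterDefaultImpl() < 0 || pipe(wakeupPipe) < 0)
          return 1;
      if (virEventAddHandle(wakeupPipe[0], VIR_EVENT_HANDLE_READABLE,
                            onWakeup, NULL, NULL) < 0)
          return 1;
      if (pthread_create(&tid, NULL, fakeDrain, NULL) != 0)
          return 1;

      /* Stand-in for virNetDaemonRun(): iterate until the drain
       * callback has signalled that all workers are idle. */
      while (!quitRequested)
          virEventRunDefaultImpl();

      pthread_join(tid, NULL);
      return 0;
  }

Since onWakeup runs in the event loop thread, quitRequested is only
ever touched there, so the flag test between loop iterations needs no
extra locking.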
Is there any update so far? I'm asking because I'm still getting
segmentation faults and hang-ups on termination of libvirtd (using the
newest version of libvirt). An example backtrace for a hang-up:

➤ bt
#0  0x000003fffca8df84 in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x000003fffdac29ca in virCondWait (c=<optimized out>, m=<optimized out>) at ../../src/util/virthread.c:154
#2  0x000003fffdac381c in virThreadPoolFree (pool=<optimized out>) at ../../src/util/virthreadpool.c:290
#3  0x000003fffdbb21ae in virNetServerDispose (obj=0x1000cc640) at ../../src/rpc/virnetserver.c:803
#4  0x000003fffda97286 in virObjectUnref (anyobj=<optimized out>) at ../../src/util/virobject.c:350
#5  0x000003fffda97a5a in virObjectFreeHashData (opaque=<optimized out>, name=<optimized out>) at ../../src/util/virobject.c:591
#6  0x000003fffda66576 in virHashFree (table=<optimized out>) at ../../src/util/virhash.c:305
#7  0x000003fffdbaff82 in virNetDaemonDispose (obj=0x1000cc3c0) at ../../src/rpc/virnetdaemon.c:105
#8  0x000003fffda97286 in virObjectUnref (anyobj=<optimized out>) at ../../src/util/virobject.c:350
#9  0x0000000100026cd6 in main (argc=<optimized out>, argv=<optimized out>) at ../../src/remote/remote_daemon.c:1487

That is, virThreadPoolFree() waits forever in virCondWait() for a
worker that never quits. And the segmentation faults happen for RPC
jobs that are still running.

[…snip]

--
Beste Grüße / Kind regards
   Marc Hartmayer

IBM Deutschland Research & Development GmbH