On 01.11.2017 21:51, John Ferlan wrote: > > > On 10/31/2017 02:54 AM, Nikolay Shirokovskiy wrote: >> >> >> On 30.10.2017 19:21, Martin Kletzander wrote: >>> On Mon, Oct 30, 2017 at 07:14:39AM -0400, John Ferlan wrote: >>>> From: Nikolay Shirokovskiy <nshirokovskiy@xxxxxxxxxxxxx> >>>> >>>> The problem is incorrect order of qemu driver shutdown and shutdown >>>> of netserver threads that serve client requests (thru qemu driver >>>> particularly). >>>> >>>> Net server threads are shutdown upon dispose which is triggered >>>> by last daemon object unref at the end of main function. At the same >>>> time qemu driver is shutdown earlier in virStateCleanup. As a result >>>> netserver threads see invalid driver object in the middle of request >>>> processing. >>>> >>>> Let's move shutting down netserver threads earlier to virNetDaemonClose. >>>> >>>> Note: order of last daemon unref and virStateCleanup >>>> is introduced in 85c3a182 for a valid reason. >>>> >>> >>> I must say I don't believe that's true. Reading it back, that patch is >>> wrong IMHO. I haven't gone through all the details of it and I don't >>> remember them from when I rewrote part of it, but the way this should >>> be handled is different. >>> >>> The way you can clearly identify such issues is when you see that one >>> thread determines the validity of data (pointer in this case). This >>> must never happen. That means that the pointer is used from more places >>> than how many references that object has. However each of the pointers >>> for such object should have their own reference. >>> >>> So the problem is probably somewhere else. >> >> If I understand you correctly we can fix issue in 85c3a182 in >> a differenct way. Like adding reference to daemon for every >> driver that uses it for shutdown inhibition. It will require to >> pass unref function to driver init function or a bit of wild casting >> to virObjectPtr for unref. Not sure it is worth it in this case. > > caveat: I'm still in a post KVM Forum travel brain fog... > > Perhaps a bit more simple than that... Since the 'inhibitCallback' and > the 'inhibitOpaque' are essentially the mechanism that would pass/use > @dmn, then it'd be up to the inhibitCallback to add/remove the reference > with the *given* that the inhibitOpaque is a lockable object anyway... > > Thus, virNetDaemonAddShutdownInhibition could use virObjectRef and > virNetDaemonRemoveShutdownInhibition could use virObjectUnref... The > "catch" is that for Shutdown the Unref would need to go after Unlock. > > I believe that then would "replicate" the virObjectRef done in > daemonStateInit and the virObjectUnref done at the end of > daemonRunStateInit with respect to "some outside thread" being able to > use @dmn and not have it be Unref'd for the last time at any point in > time until the "consumer" was done with it. > > Moving the Unref to after all the StateCleanup calls were done works > because it accomplishesd the same/similar task, but just held onto @dmn > longer because the original implementation didn't properly reference dmn > when it was being used for some other consumer/thread. Of course that > led to other problems which this series is addressing and I'm not clear > yet as to the impact vis-a-vis your StateShutdown series. Ok, 85c3a182 can be rewritten this way. It is more staightforward to ref/unref by consumer thread instead of relying on consumer structure (in this case 85c3a182 relies upon the fact that after virStateCleanup no hypervisor driver would use @dmn object). But we still have the issues address by this patch series and state shutdown series because the order in which @dmn and other objects will be disposed would be same for 85c3a182 and proposed 85c3a182 replacement. Nikolay > >> >> Anyway the initical conditions for current patch stay the same - >> daemon object will be unrefed after state drivers cleanup and >> RPC requests could deal with invalid state driver objects. >> >>> >>>> Signed-off-by: John Ferlan <jferlan@xxxxxxxxxx> >>>> --- >>>> src/rpc/virnetdaemon.c | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/src/rpc/virnetdaemon.c b/src/rpc/virnetdaemon.c >>>> index 8c21414897..33bd8e3b06 100644 >>>> --- a/src/rpc/virnetdaemon.c >>>> +++ b/src/rpc/virnetdaemon.c >>>> @@ -881,6 +881,7 @@ virNetDaemonClose(virNetDaemonPtr dmn) >>>> virObjectLock(dmn); >>>> >>>> virHashForEach(dmn->servers, daemonServerClose, NULL); >>>> + virHashRemoveAll(dmn->servers); >>>> >>> >>> If I get this correctly, you are removing the servers so that their workers get >>> cleaned up, but that should be done in a separate step. Most probably what >>> daemonServerClose should be doing. This is a workaround, but not a proper fix. > > Are you sure? The daemonServerClose is the callback for virHashForEach > which means the table->iterating would be set thus preventing > daemonServerClose from performing a virHashRemoveEntry of the server > element from the list. > > So this is to me the right fix. > > John > >> >> Well, original patch has a different approach for stopping RPC calls (see [1]) close >> to what you suggest. I think in this case it is matter of taste because there are >> no usecases which help us to prefer one way over another so I'm ok with both of them. >> >> [1] https://www.redhat.com/archives/libvir-list/2017-September/msg00999.html >> [2] https://www.redhat.com/archives/libvir-list/2017-October/msg00643.html (thread dicussing [1]) >> >>> If that's not true than the previous commit mentioned here should also be fixed >>> differently. >> >> Sorry I don't understand it. Can you elaborate on this point? >> >> >> Nikolay >>> >>> So this is a clear NACK from my POV. >>> >>>> virObjectUnlock(dmn); >>>> } >>>> -- >>>> 2.13.6 >>>> >>>> -- >>>> libvir-list mailing list >>>> libvir-list@xxxxxxxxxx >>>> https://www.redhat.com/mailman/listinfo/libvir-list >>> >>> >>> -- >>> libvir-list mailing list >>> libvir-list@xxxxxxxxxx >>> https://www.redhat.com/mailman/listinfo/libvir-list >>> -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list