On Thu, Dec 21, 2017 at 01:18 PM +0100, John Ferlan <jferlan@xxxxxxxxxx> wrote: > On 12/21/2017 06:29 AM, Marc Hartmayer wrote: >> On Fri, Dec 15, 2017 at 07:16 PM +0100, John Ferlan <jferlan@xxxxxxxxxx> wrote: >>> On 12/12/2017 06:36 AM, Marc Hartmayer wrote: >>>> There is a race between virNetServerProcessClients (main thread) and >>>> remoteDispatchAuthList/remoteDispatchAuthPolkit/remoteSASLFinish (worker >>>> thread) that can lead to decrementing srv->nclients_unauth when it's >>>> zero. Since virNetServerCheckLimits relies on the value >>>> srv->nclients_unauth the underrun causes libvirtd to stop accepting >>>> new connections forever. >>>> >>>> Example race scenario (assuming libvirtd is using policykit and the >>>> client is privileged): >>>> 1. The client calls the RPC remoteDispatchAuthList => >>>> remoteDispatchAuthList is executed on a worker thread (Thread >>>> T1). We're assuming now the execution stops for some time before >>>> the line 'virNetServerClientSetAuth(client, 0)' >>>> 2. The client closes the connection irregularly. This causes the >>>> event loop to wake up and virNetServerProcessClient to be >>>> called (on the main thread T0). During the >>>> virNetServerProcessClients the srv lock is hold. The condition >>>> virNetServerClientNeedAuth(client) will be checked and as the >>>> authentication is not finished right now >>>> virNetServerTrackCompletedAuthLocked(srv) will be called => >>>> --srv->nclients_unauth => 0 >>>> 3. The Thread T1 continues, marks the client as authenticated, and >>>> calls virNetServerTrackCompletedAuthLocked(srv) => >>>> --srv->nclients_unauth => --0 => wrap around as nclient_unauth is >>>> unsigned >>>> 4. virNetServerCheckLimits(srv) will disable the services forever >>>> >>>> To fix it, add an auth_pending field to the client struct so that it >>>> is now possible to determine if the authentication process has already >>>> been handled for this client. >>>> >>>> Setting the authentication method to none for the client in >>>> virNetServerProcessClients is not a proper way to indicate that the >>>> counter has been decremented, as this would imply that the client is >>>> authenticated. >>>> >>>> Additionally, adjust the existing test cases for this new field. >>>> >>> >>> In any case, I think perhaps this can be split into two patches - one >>> that just sets/clears the auth_pending and then what's leftover as the >>> fix. >> >> I’m not sure it’s necessary as the correct usage of auth_pending is >> already the fix. But let’s see then. >> > > OK. I understand. I assume by this point I was juggling in my head a few > of the various adjustments so I figured that by adding the flag and > separating the logic between just the set/clear flag and the reason it > was added would be easier to review, but then again that would make the > set/clear patch essentially meaningless. > >>> >>>> diff --git a/src/libvirt_remote.syms b/src/libvirt_remote.syms >>>> index 7fcae970484d..cef2794e1122 100644 >>>> --- a/src/libvirt_remote.syms >>>> +++ b/src/libvirt_remote.syms >>>> @@ -138,6 +138,7 @@ virNetServerClientGetUNIXIdentity; >>>> virNetServerClientImmediateClose; […snip…] >>>> @@ -375,6 +376,7 @@ static virNetServerClientPtr >>>> virNetServerClientNewInternal(unsigned long long id, >>>> virNetSocketPtr sock, >>>> int auth, >>>> + bool auth_pending, >>>> #ifdef WITH_GNUTLS >>>> virNetTLSContextPtr tls, >>>> #endif >>>> @@ -384,6 +386,12 @@ virNetServerClientNewInternal(unsigned long long id, >>>> { >>>> virNetServerClientPtr client; >>>> >>>> + /* If the used authentication method implies that the new client >>>> + * is automatically authenticated, the authentication cannot be >>>> + * pending */ >>>> + if (auth_pending && virNetServerClientAuthMethodImpliesAuthenticated(auth)) >>> ^^^^^^ >>> Why check twice? I'm not clear on the need for this check. The setting >>> of auth_pending is based on @auth in the caller or perhaps the value of >>> auth_pending already. If there's some sort of check to be made perhaps >>> it's in ExecRestart. >> >> Normally I would prefer an assert here… The intent behind this guard was >> that it should prevent the wrong usage of this function. >> >> But I’ll move the check. >> > > I think this is perhaps why I made the separate the patches comment. I > was starting to overflow my heap. > >>> >>> Also, if we returned NULL here without some sort of error, then couldn't >>> we get the libvirt failed for some reason error message which is always >>> not fun to track down. >> >> Does it makes sense to leave the check here (and additionally add it to >> ExecRestart) and throw an error message here? >> > > Adding a check, a message, and having a NULL return to guard against > some future change passing the wrong parameters perhaps means that > future patch submitter (and eventual reviewer) didn't clearly understand > the purpose of the function... Yep. That’s why I like asserts… (for functions used internally only). You can 'tell' other developers what are the preconditions/postconditions and help them to understand the code flow. Sorry for the off topic… Short answer, I’ll move the check. > > Still this is a *NewInternal function - I would hope passing correct > arguments wouldn't be problem for it. Me too. > >>> >>>> + return NULL; >>>> + >>>> if (virNetServerClientInitialize() < 0) >>>> return NULL; >>>> >>>> @@ -393,6 +401,7 @@ virNetServerClientNewInternal(unsigned long long id, >>>> client->id = id; >>>> client->sock = virObjectRef(sock); >>>> client->auth = auth; >>>> + client->auth_pending = auth_pending; >>>> client->readonly = readonly; >>>> #ifdef WITH_GNUTLS >>>> client->tlsCtxt = virObjectRef(tls); >>>> @@ -440,6 +449,7 @@ virNetServerClientPtr virNetServerClientNew(unsigned long long id, >>>> { >>>> virNetServerClientPtr client; >>>> time_t now; >>>> + bool auth_pending = !virNetServerClientAuthMethodImpliesAuthenticated(auth); >>> >>> I think it's much easier for readability to have result be auth != >>> VIR_NET_SERVER_SERVICE_AUTH_NONE instead of this call... Perhaps even >>> easier to find all instances of AUTH_NONE. >> >> This function/condition is used in four places. But sure, if you’re >> still saying it’s easier to read and to understand that a user is >> authenticated (and therefore the authentication is no longer pending) if >> auth_method == VIR_NET_SERVER_SERVICE_AUTH_NONE then I’ll replace this >> function at all places :) >> > > By this time I wasn't sure what was going to happen from Dan's patch 8 > comment and the virNetServerClientNeedAuth[Locked] functions vs. the > name chosen (ImpliesAuthenticated) that was AFAICT trying to convey what > you discovered. > > Whether the comparison is hidden behind some call or a direct comparison > I guess just becomes an implementation detail. I use cscope a lot to > find things, so searching on NONE is perhaps easier that searching on > NONE, finding some API that returning true/false based off of NONE, and > then searching on callers for that too. OTOH, it's cleaner to have a > helper <sigh>. > > Shorter answer, your call. If you want the helper, that's fine. I’ll think about it. […snip] -- Beste Grüße / Kind regards Marc Hartmayer IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschäftsführung: Dirk Wittkopp Sitz der Gesellschaft: Böblingen Registergericht: Amtsgericht Stuttgart, HRB 243294 -- libvir-list mailing list libvir-list@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/libvir-list