Hey,

On 11/09/2015 01:39 PM, J. Bruce Fields wrote:
> On Mon, Nov 09, 2015 at 08:05:07AM -0500, Steve Dickson wrote:
>> Recently a bug was found that was causing a
>> TGT fetch for every mount upcall. The bug was
>> caused when forking for every mount was introduced.
>> The global memory for the TGT cache was being
>> freed when the forked process exited.
>>
>> The fix we came up with was to only fork on non-root
>> upcalls, basically mount upcalls would no longer fork.
>> In debugging the patch it became apparent that if the
>> process hung, all NFS mounts on that client would be blocked.
>> So at this point rpc.gssd is a single point of failure.
>>
>> This patch replaces the forking/non-forking with creating
>> pthreads for every upcall which I think is a better
>> solution to the original problem since pthreads can share
>> global data.
>
> I seem to recall the reason for the fork is to allow dropping some
> privileges while processing the upcall, is that right?
I don't see where privileges are being dropped.

> But looking at pthreads(7), it looks like those are probably
> shared (e.g., it says user and group IDs are process-wide).
I think they are, and that's also why threads can access the
same global data.

>
>> I was also hoping using pthreads would bring more asynchrony
>> to rpc.gssd. I was thinking rpc.gssd could take an upcall,
>> fire off a thread to handle it, then go back and listen
>> for more upcalls.
>>
>> Unfortunately this is not the case. It seems, maybe due to
>> my lack of pthreads understanding, that after each
>> pthread_create() call, a pthread_join() call, which waits for
>> the created thread to stop, is needed. Similar to fork/wait..
>
> Actually making gssd thread-safe would be a significant effort.
Is it because the MIT libs are not thread-safe? Isn't the
gssd_k5_kt_princ_list the only global list? Something to do
with the upcalls?

>
>> This means if an upcall pthread gets hung the daemon
>> is also hung...
>> The same single point of failure...
>>
>> I do believe using threads is a better solution than
>> the non-fork solution, but rpc.gssd is still a single
>> point of failure. Plus I'm hoping moving to pthread will
>> allow us to solve that problem.
>
> So this doesn't actually fix anything right now?
No.. it does not... :-) But I do think it's a clearer way of
handling global lists via multiple threads/processes. It also makes
the top of process_krb5_upcall easier to read, IMHO...

Plus it does introduce pthreads to nfs-utils.. So maybe some
day we can pull some of these daemons into the 21st century
by multi-threading them.. or kill them! I'm good either way. 8-)

Thanks for the cycles!!!

steved.
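
For reference, on the pthread_create()/pthread_join() point above: POSIX also allows a thread to be created detached, in which case its resources are released when it exits and no pthread_join() is needed. Below is a minimal sketch of the "fire off a thread and go back to listening" pattern. It is not code from the patch or from nfs-utils: struct upcall, handle_upcall(), dispatch_upcall() and run_demo() are hypothetical names used only for illustration, and the completion counter exists only so the demo can report a result.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical per-upcall state; not a structure from nfs-utils. */
struct upcall {
    int id;
};

/* Completion tracking, used only so the demo can report results. */
static pthread_mutex_t done_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;
static int done_count;

/* Worker: owns its argument, frees it, then records completion. */
static void *handle_upcall(void *arg)
{
    struct upcall *u = arg;

    /* The real per-upcall work would happen here. */
    free(u);

    pthread_mutex_lock(&done_lock);
    done_count++;
    pthread_cond_signal(&done_cond);
    pthread_mutex_unlock(&done_lock);
    return NULL;
}

/* Start one detached thread for an upcall; returns 0 on success. */
static int dispatch_upcall(int id)
{
    pthread_attr_t attr;
    pthread_t tid;
    struct upcall *u;
    int err;

    u = malloc(sizeof(*u));
    if (u == NULL)
        return -1;
    u->id = id;

    pthread_attr_init(&attr);
    /* Detached: no pthread_join() is required or permitted. */
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    err = pthread_create(&tid, &attr, handle_upcall, u);
    pthread_attr_destroy(&attr);
    if (err != 0) {
        free(u);
        return -1;
    }
    return 0;
}

/*
 * Demo driver: dispatch n upcalls without joining any of them, then
 * wait on the condition variable until all started workers finish.
 * Returns the number of workers that completed.
 */
int run_demo(int n)
{
    int started = 0;
    int finished;
    int i;

    pthread_mutex_lock(&done_lock);
    done_count = 0;
    pthread_mutex_unlock(&done_lock);

    for (i = 0; i < n; i++)
        if (dispatch_upcall(i) == 0)
            started++;

    pthread_mutex_lock(&done_lock);
    while (done_count < started)
        pthread_cond_wait(&done_cond, &done_lock);
    finished = done_count;
    pthread_mutex_unlock(&done_lock);
    return finished;
}
```

With this shape the dispatching loop never blocks on an individual worker, so one hung handler would not stall the others. It does nothing for the thread-safety question raised above: any global state the workers touch would still need its own locking.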