On Wed, 13 Nov 2013 09:15:37 +1100
NeilBrown <neilb@xxxxxxx> wrote:

> On Tue, 12 Nov 2013 10:21:40 -0500 Steve Dickson <SteveD@xxxxxxxxxx> wrote:
>
> > On 12/11/13 08:00, Jeff Layton wrote:
> > > We've gotten a lot of complaints recently about the 15s delay when
> > > doing a sec=sys mount without gssd running.
> > >
> > > A large part of the problem is that the kernel isn't able to reliably
> > > detect when rpc.gssd is running. What we currently have is a
> > > gssd_running flag that is initially set to 1. When an upcall times out,
> > > that gets set to 0, and subsequent upcalls get a much shorter timeout
> > > (1/4s instead of 15s). It's reset back to '1' when a pipe is reopened.
> > >
> > > The approach of using a flag like this is pretty inadequate. First, it
> > > doesn't eliminate the long delay on the initial upcall attempt. Also,
> > > if gssd spontaneously dies, then the flag will still be set to 1 until
> > > the next upcall attempt times out. Finally, it currently requires that
> > > the pipe be reopened in order to reset the flag back to true.
> > >
> > > This patchset replaces that flag with a more reliable mechanism for
> > > detecting when gssd is running. When rpc_pipefs is mounted, it creates a
> > > new "dummy" pipe that gssd will naturally find and hold open. We'll
> > > never send an upcall down this pipe, and writing to it always fails.
> > > But, since we can detect when something is holding it open, we can use
> > > that to determine whether gssd is running.
> > >
> > > The current patch just uses this mechanism to replace the gssd_running
> > > flag with this new mechanism. This shortens the long delay when mounting
> > > without gssd running, but does not silence these warnings:
> > >
> > >     RPC: AUTH_GSS upcall timed out.
> > >     Please check user daemon is running.
> > >
> > > I'm willing to add a patch to do that, but I'm a little unclear on the
> > > best way to do so. Those messages are generated by the auth_gss code. We
> > > probably do want to print them if someone mounted with sec=krb5, but
> > > suppress them when mounting with sec=sys.
> > >
> > > Do we need to somehow pass down that intent to auth_gss? Another idea
> > > would be to call gssd_running() from the nfs mount code and use that to
> > > determine whether to try and use krb5 at all...
> > >
> > > Discuss!
> > I've just verified that a mount, with these patches, takes about
> > 1.2 seconds when rpc.gssd is not running.... With rpc.gssd it
> > take about .2 seconds.
> >
> > Tested-by: Steve Dickson <steved@xxxxxxxxxx>
> >
>
> Still sounds like about one second too long.
>
> In that patch I see:
>
>       timeout = 15 * HZ;
> -     if (!sn->gssd_running)
> +     if (!gssd_running(sn))
>               timeout = HZ >> 2;
>

Yeah, it's not clear to me where the extra delay there comes from
either. I was sort of hoping Steve would track that down... ;)

> Given that "!gssd_running(sn)" is now certain knowledge rather than a hint,
> can't we just skip the upcall and any timeout?
> i.e.
>       timeout = 15 * HZ;
> -     if (!sn->gssd_running)
> +     if (!gssd_running(sn))
> -             timeout = HZ >> 2;
> +             return -EACCES;
>

Good point...I was trying to keep the semantic changes to a minimum,
but that does make sense. One minor nit...with the above you'll never
hit warn_gss(), so it probably makes sense to put that in there too.

I've got a v2 of the patchset that I'm working on that fixes a couple
of bugs, makes the dir name change that Trond wants, and also has a
patch that makes nfs4_init_client skip trying krb5i if gssd isn't up.
I'll probably post that tomorrow...

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
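For readers following the thread, the idea discussed above (treat "something holds the dummy pipe open" as proof that gssd is running, and fail the upcall at once with a warning when nothing does) can be modelled roughly as below. This is a userspace sketch only, not the kernel patch: struct sunrpc_net_model, its fields, model_warn_gss(), gss_upcall_timeout() and the HZ value are all hypothetical names and values made up for illustration. Only gssd_running(sn), the 15 * HZ timeout, the -EACCES return and the warning text come from the messages above.

```c
/* Userspace model only; none of this is the actual kernel code. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

#define HZ 100 /* hypothetical tick rate, chosen only for this model */

/* Hypothetical stand-in for the per-namespace sunrpc state. */
struct sunrpc_net_model {
    unsigned int dummy_pipe_openers; /* processes holding the dummy pipe open */
    unsigned int warnings;           /* number of warnings emitted so far */
};

/* gssd counts as running if and only if the dummy pipe is held open. */
static bool gssd_running(const struct sunrpc_net_model *sn)
{
    assert(sn != NULL);
    return sn->dummy_pipe_openers > 0;
}

/* Hypothetical helper that stands in for the warning discussed above. */
static void model_warn_gss(struct sunrpc_net_model *sn)
{
    sn->warnings++;
    fprintf(stderr, "RPC: AUTH_GSS upcall timed out.\n"
                    "Please check user daemon is running.\n");
}

/*
 * Returns the upcall timeout in ticks, or -EACCES straight away (after
 * warning) when no daemon is listening, so the caller never waits.
 */
static long gss_upcall_timeout(struct sunrpc_net_model *sn)
{
    if (!gssd_running(sn)) {
        model_warn_gss(sn);
        return -EACCES;
    }
    return 15 * HZ;
}
```

In this model the short HZ >> 2 timeout disappears entirely: the caller either gets the full 15 * HZ timeout because a daemon is present, or an immediate error. Whether the warning should be printed for sec=sys mounts is the open question raised in the thread, and the sketch does not try to settle it.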