Re: [PATCH 0/2] sunrpc: more reliable detection of running gssd

On Wed, 13 Nov 2013 09:15:37 +1100
NeilBrown <neilb@xxxxxxx> wrote:

> On Tue, 12 Nov 2013 10:21:40 -0500 Steve Dickson <SteveD@xxxxxxxxxx> wrote:
> 
> > On 12/11/13 08:00, Jeff Layton wrote:
> > > We've gotten a lot of complaints recently about the 15s delay when
> > > doing a sec=sys mount without gssd running.
> > > 
> > > A large part of the problem is that the kernel isn't able to reliably
> > > detect when rpc.gssd is running. What we currently have is a
> > > gssd_running flag that is initially set to 1. When an upcall times out,
> > > that gets set to 0, and subsequent upcalls get a much shorter timeout
> > > (1/4s instead of 15s). It's reset back to '1' when a pipe is reopened.
> > > 
> > > The approach of using a flag like this is pretty inadequate. First, it
> > > doesn't eliminate the long delay on the initial upcall attempt. Also,
> > > if gssd spontaneously dies, then the flag will still be set to 1 until
> > > the next upcall attempt times out. Finally, it currently requires that
> > > the pipe be reopened in order to reset the flag back to true.
> > > 
> > > This patchset replaces that flag with a more reliable mechanism for
> > > detecting when gssd is running. When rpc_pipefs is mounted, it creates a
> > > new "dummy" pipe that gssd will naturally find and hold open. We'll
> > > never send an upcall down this pipe, and writing to it always fails.
> > > But, since we can detect when something is holding it open, we can use
> > > that to determine whether gssd is running.
> > > 
> > > The current patch just replaces the gssd_running flag with this new
> > > mechanism. This shortens the long delay when mounting without gssd
> > > running, but does not silence these warnings:
> > > 
> > >     RPC: AUTH_GSS upcall timed out.
> > >     Please check user daemon is running.
> > > 
> > > I'm willing to add a patch to do that, but I'm a little unclear on the
> > > best way to do so. Those messages are generated by the auth_gss code. We
> > > probably do want to print them if someone mounted with sec=krb5, but
> > > suppress them when mounting with sec=sys.
> > > 
> > > Do we need to somehow pass down that intent to auth_gss? Another idea
> > > would be to call gssd_running() from the nfs mount code and use that to
> > > determine whether to try and use krb5 at all...
> > > 
> > > Discuss!
> > I've just verified that a mount, with these patches, takes about
> > 1.2 seconds when rpc.gssd is not running. With rpc.gssd running, it
> > takes about 0.2 seconds.
> > 
> > Tested-by: Steve Dickson <steved@xxxxxxxxxx>
> >
> 
> Still sounds like about one second too long.
> 
> In that patch I see:
> 
>  	timeout = 15 * HZ;
> -	if (!sn->gssd_running)
> +	if (!gssd_running(sn))
>  		timeout = HZ >> 2;
> 

Yeah, it's not clear to me where the extra delay there comes from
either. I was sort of hoping Steve would track that down... ;)

> Given that "!gssd_running(sn)" is now certain knowledge rather than a hint,
> can't we just skip the upcall and any timeout?
> i.e.
>  	timeout = 15 * HZ;
> -	if (!sn->gssd_running)
> +	if (!gssd_running(sn))
> - 		timeout = HZ >> 2;
> +		return -EACCES;
> 

Good point. I was trying to keep the semantic changes to a minimum,
but that does make sense. One minor nit: with the above you'll never
hit warn_gssd(), so it probably makes sense to call that there too.
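
To make the difference concrete, here is a minimal user-space sketch of the
two behaviours, not the actual kernel code. The struct, its fields, the
warn_sketch() helper, the two function names and the HZ value are all
assumptions made up for illustration. Only the 15 * HZ and HZ >> 2 timeouts
and the -EACCES return come from the snippets quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define HZ 1000 /* assumed tick rate; the real value is a kernel config choice */

/* Hypothetical stand-in for the per-namespace sunrpc state. */
struct sunrpc_net_sketch {
	bool dummy_pipe_held;  /* true while a daemon holds the dummy pipe open */
	unsigned int warnings; /* how many times the warning helper fired */
};

/* "Is something holding the dummy pipe open?" is the whole test. */
static bool gssd_running(const struct sunrpc_net_sketch *sn)
{
	assert(sn != NULL);
	return sn->dummy_pipe_held;
}

/* Stand-in for the kernel's warning helper; here it only counts calls. */
static void warn_sketch(struct sunrpc_net_sketch *sn)
{
	sn->warnings++;
}

/* Behaviour as posted: still upcall, but with a short timeout. */
static long upcall_timeout_as_posted(const struct sunrpc_net_sketch *sn)
{
	long timeout = 15 * HZ;

	if (!gssd_running(sn))
		timeout = HZ >> 2; /* a quarter of a second */
	return timeout;
}

/*
 * Suggested variant: no daemon means no upcall at all. Returns
 * -EACCES after warning, or the full timeout (in ticks) otherwise.
 */
static long upcall_timeout_suggested(struct sunrpc_net_sketch *sn)
{
	if (!gssd_running(sn)) {
		warn_sketch(sn);
		return -EACCES;
	}
	return 15 * HZ;
}
```

With the pipe not held, the first function returns a quarter-second timeout
while the second warns once and fails immediately. With the pipe held, both
return the full 15 second timeout and no warning is issued.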

I've got a v2 of the patchset that I'm working on that fixes a couple
of bugs, makes the dir name change that Trond wants, and also has a
patch that makes nfs4_init_client skip trying krb5i if gssd isn't up.
I'll probably post that tomorrow...

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
