On May 3, 2013, at 2:44 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > On Fri, 3 May 2013 18:33:54 +0000 > "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> wrote: > >> On Fri, 2013-05-03 at 14:24 -0400, Jeff Layton wrote: >>> On Fri, 3 May 2013 13:56:13 -0400 >>> Chuck Lever <chuck.lever@xxxxxxxxxx> wrote: >>> >>>> >>>> On May 3, 2013, at 1:25 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: >>>> >>>>> I've noticed that when running a 3.10-pre kernel that if I try to mount >>>>> up a NFSv4 filesystem that it now takes ~15s for the mount to complete. >>>>> >>>>> Here's a little rpcdebug output: >>>>> >>>>> [ 3056.385078] svc: server ffff8800368fc000 waiting for data (to = 9223372036854775807) >>>>> [ 3056.392056] RPC: new task initialized, procpid 2471 >>>>> [ 3056.392758] RPC: allocated task ffff88010cd90100 >>>>> [ 3056.393303] RPC: 42 __rpc_execute flags=0x1280 >>>>> [ 3056.393630] RPC: 42 call_start nfs4 proc SETCLIENTID (sync) >>>>> [ 3056.394056] RPC: 42 call_reserve (status 0) >>>>> [ 3056.394368] RPC: 42 reserved req ffff8801019f9600 xid 21ad6c40 >>>>> [ 3056.394783] RPC: wake_up_first(ffff88010a989990 "xprt_sending") >>>>> [ 3056.395252] RPC: 42 call_reserveresult (status 0) >>>>> [ 3056.395595] RPC: 42 call_refresh (status 0) >>>>> [ 3056.395901] RPC: gss_create_cred for uid 0, flavor 390004 >>>>> [ 3056.396361] RPC: gss_create_upcall for uid 0 >>>>> [ 3071.396134] RPC: AUTH_GSS upcall timed out. >>>>> Please check user daemon is running. >>>>> [ 3071.397374] RPC: gss_create_upcall for uid 0 result -13 >>>>> [ 3071.398192] RPC: 42 call_refreshresult (status -13) >>>>> [ 3071.398873] RPC: 42 call_refreshresult: refresh creds failed with error -13 >>>>> [ 3071.399881] RPC: 42 return 0, status -13 >>>>> >>>>> The problem is that we're now trying to upcall for GSS creds to do the >>>>> SETCLIENTID call, but this host isn't running rpc.gssd. Not running >>>>> rpc.gssd is pretty common for people not using kerberized NFS. I think >>>>> we'll see a lot of complaints about this. >>>>> >>>>> Is this expected? >>>> >>>> Yes. >>>> >>>> There are operations like SETCLIENTID and GETATTR(fs_locations) which should always use an integrity-checking security flavor, even if particular mount points use sec=sys. >>>> >>>> There are cases where GSS is not available, and we fall back to using AUTH_SYS. That should happen as quickly as possible, I agree. >>>> >>>>> If so, what's the proposed remedy? >>>>> Simply have everyone run rpc.gssd even if they're not using kerberized NFS? >>>> >>>> >>>> That's one possibility. Or we could shorten the upcall timeout. Or, add a mechanism by which rpc.gssd can provide a positive indication to the kernel that it is running. >>>> >>>> It doesn't seem like an intractable problem. >>>> >>> >>> Nope, it's not intractable at all... >>> >>> Currently, the gssd upcall uses the RPC_PIPE_WAIT_FOR_OPEN flag to >>> allow you to queue upcalls to be processed when the daemon isn't up >>> yet. When the daemon starts, it processes that queue. The caller gives >>> up after 15s (which is what's happening here), and the upcall >>> eventually gets scraped out of the queue after 30s. >>> >>> We could stop using that flag on this rpc_pipe and simply require that >>> the daemon be up and running before attempting any sort of AUTH_GSS >>> rpc. That might be a little less friendly in the face of boot-time >>> ordering problems, but it should presumably make this problem go away. >> >> You probably don't want to do that... The main reason for the >> RPC_PIPE_WAIT_FOR_OPEN is that even if the gssd daemon is running, it >> takes it a moment or two to notice that a new client directory has been >> created, and that there is a new 'krb' pipe to attach to. >> > > Ok yeah, good point... > > Shortening the timeout will also suck -- that'll just reduce the pain > somewhat but will still be a performance regression. It looks like even > specifying '-o sec=sys' doesn't disable this behavior. Should it? Nope. We should always use krb5i if a GSS context can be established with our machine cred. As I said before, SETCLIENTID and GETATTR(fs_locations) really should use an integrity-protecting security flavor no matter what flavor is in effect on the mount points themselves. > Instead of using AUTH_GSS for SETCLIENTID by default, would it make > sense to add a switch (module parm?) that turns it on so that it can be > an opt-in thing rather than doing this by default? Why add another tunable when we really should just fix the delay? Besides, if gssd is running and no keytab exists, then the fallback to AUTH_SYS should be fast. Is that not an effective workaround until we address the delay problem? -- Chuck Lever chuck[dot]lever[at]oracle[dot]com -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html