On Fri, 23 Mar 2012 15:34:21 +0000
"Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> wrote:

> On Fri, 2012-03-23 at 11:22 -0400, J. Bruce Fields wrote:
> > On Fri, Mar 23, 2012 at 03:20:21PM +0000, Myklebust, Trond wrote:
> > > On Fri, 2012-03-23 at 09:31 -0400, J. Bruce Fields wrote:
> > > > On Fri, Mar 23, 2012 at 08:12:08AM -0400, J. Bruce Fields wrote:
> > > > > On Wed, Mar 21, 2012 at 09:52:04AM -0400, Jeff Layton wrote:
> > > > > > Add a new top-level dir in rpc_pipefs to hold the pipe for the clientid
> > > > > > upcall.
> > > > >
> > > > > After applying this patch, my tests consistently hang.  The hang happens
> > > > > in excltest (of the special connectathon tests), over nfs4.1 and krb5.
> > > > > Looking at the wire traffic, I'm seeing DELAY returned from a setattr
> > > > > for mode on a newly-created (with EXCLUSIVE4_1) file.  That open got a
> > > > > delegation, so presumably that's what's causing the DELAY, though I'm
> > > > > not seeing the server send a recall.  That could be a krb5 bug.
> > > > >
> > > > > Whatever bug there is here, it's hard to tell why this patch in
> > > > > particular would make it more likely.
> > > > >
> > > > > So, still investigating!
> > > >
> > > > Reproducible by:
> > > >
> > > >     mount -osec=krb5,minorversion=1 server:/export/ /mnt/
> > > >     cp cthon04/special/excltest /mnt/
> > > >     cd /mnt
> > > >     ./excltest
> > >
> > > Umm... When would you ever get a DELAY in the above scenario? I can see
> > > getting an NFS4ERR_OPENMODE, but not DELAY.
> >
> > There's a setattr for mode right after the open.  Is that unexpected?
>
> Well yes, it is. The NFSv4.1 exclusive open should always be sending a
> full set of attributes as part of the OPEN operation. The session replay
> cache is now supposed to guarantee the only-once semantics that the
> verifier used to provide.
>
> > The server doesn't really have to recall the delegation in that case (it
> > only needs to recall *other* clients' delegations) but I don't think
> > it's wrong to.
>
> Then why isn't it allowing the operation? Any sane client would normally
> interpret NFS4ERR_DELAY to mean that the server is doing something to
> fix whatever situation is preventing the operation from completing
> (presumably by recalling delegations in this case). Just replying DELAY
> and doing nothing is not helpful...
>

Yeah, this seems like a clear bug in the server code. I think it's
replying DELAY in order to recall the delegation, but the delegation
isn't getting recalled for some reason. We arguably don't actually need
to recall it here, but I don't see any recall go out at all either...

As to why this patch seems to uncover this bug -- that's a complete
mystery at this point...

--
Jeff Layton <jlayton@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
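
Trond's point about how a client reads NFS4ERR_DELAY can be sketched as a
retry loop. This is only an illustration, not the Linux client code: the
function name, the try limit and the backoff values are all assumptions
made up for the example. Only the error number 10008 comes from the
NFSv4 error code table.

```python
import time

NFS4_OK = 0
NFS4ERR_DELAY = 10008  # value from the NFSv4 error code table


def call_with_delay_retry(op, max_tries=5, first_wait=0.1, max_wait=15.0,
                          sleep=time.sleep):
    """Call op() until it returns something other than NFS4ERR_DELAY.

    Hypothetical helper: op is any callable that performs one request
    and returns its NFS status code. The wait doubles after every
    DELAY reply, capped at max_wait seconds (assumed values).
    Returns the last status seen, which is still NFS4ERR_DELAY if the
    server never stops asking the caller to wait.
    """
    wait = first_wait
    status = op()
    for _ in range(max_tries - 1):
        if status != NFS4ERR_DELAY:
            break
        # The client assumes the server is busy fixing the conflict
        # (for example recalling a delegation), so it backs off and
        # sends the same operation again.
        sleep(wait)
        wait = min(wait * 2, max_wait)
        status = op()
    return status
```

If the server replies DELAY but never starts the recall, as in the hang
described in this thread, a loop like this just keeps resending the
setattr until it gives up, or forever if it has no try limit.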