On Fri, 23 Mar 2012 11:53:37 -0400
Jeff Layton <jlayton@xxxxxxxxxx> wrote:

> On Fri, 23 Mar 2012 15:34:21 +0000
> "Myklebust, Trond" <Trond.Myklebust@xxxxxxxxxx> wrote:
>
> > On Fri, 2012-03-23 at 11:22 -0400, J. Bruce Fields wrote:
> > > On Fri, Mar 23, 2012 at 03:20:21PM +0000, Myklebust, Trond wrote:
> > > > On Fri, 2012-03-23 at 09:31 -0400, J. Bruce Fields wrote:
> > > > > On Fri, Mar 23, 2012 at 08:12:08AM -0400, J. Bruce Fields wrote:
> > > > > > On Wed, Mar 21, 2012 at 09:52:04AM -0400, Jeff Layton wrote:
> > > > > > > Add a new top-level dir in rpc_pipefs to hold the pipe for the clientid
> > > > > > > upcall.
> > > > > >
> > > > > > After applying this patch, my tests consistently hang. The hang happens
> > > > > > in excltest (of the special connectathon tests), over nfs4.1 and krb5.
> > > > > > Looking at the wire traffic, I'm seeing DELAY returned from a setattr
> > > > > > for mode on a newly-created (with EXCLUSIVE4_1) file. That open got a
> > > > > > delegation, so presumably that's what's causing the DELAY, though I'm
> > > > > > not seeing the server send a recall. That could be a krb5 bug.
> > > > > >
> > > > > > Whatever bug there is here, it's hard to tell why this patch in
> > > > > > particular would make it more likely.
> > > > > >
> > > > > > So, still investigating!
> > > > >
> > > > > Reproducible by:
> > > > >
> > > > >     mount -osec=krb5,minorversion=1 server:/export/ /mnt/
> > > > >     cp cthon04/special/excltest /mnt/
> > > > >     cd /mnt
> > > > >     ./excltest
> > > >
> > > > Umm... When would you ever get a DELAY in the above scenario? I can see
> > > > getting an NFS4ERR_OPENMODE, but not DELAY.
> > >
> > > There's a setattr for mode right after the open. Is that unexpected?
> >
> > Well yes, it is. The NFSv4.1 exclusive open should always be sending a
> > full set of attributes as part of the OPEN operation. The session replay
> > cache is now supposed to guarantee the only-once semantics that the
> > verifier used to provide.
> >
> > > The server doesn't really have to recall the delegation in that case (it
> > > only needs to recall *other* clients' delegations) but I don't think
> > > it's wrong to.
> >
> > Then why isn't it allowing the operation? Any sane client would normally
> > interpret NFS4ERR_DELAY to mean that the server is doing something to
> > fix whatever situation is preventing the operation from completing
> > (presumably by recalling delegations in this case). Just replying DELAY
> > and doing nothing is not helpful...
> >
>
> Yeah, this seems like a clear bug in the server code. I think it's
> replying DELAY in order to recall the delegation, but the delegation
> isn't getting recalled for some reason. We arguably don't actually need
> to recall it here, but I don't see any recall go out at all either...
>
> As to why this patch seems to uncover this bug -- that's a complete
> mystery at this point...
>

...and contrary to what Bruce has seen, I can also reproduce this when
the server is running a stock (unpatched) 3.3.0 kernel from the Fedora
rawhide repos.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
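
To illustrate why a bare DELAY shows up as a hang in excltest, here is a minimal sketch of the retry behaviour Trond describes: the client treats NFS4ERR_DELAY as "the server is working on it" and simply tries again after a pause. This is not the Linux client's code. The function name, the `op` callable, and the backoff parameters are all hypothetical; only the NFS4ERR_DELAY value (10008) comes from the NFSv4 specifications.

```python
import time

NFS4_OK = 0
NFS4ERR_DELAY = 10008  # error code as defined in the NFSv4 specifications


def call_with_delay_retry(op, max_tries=5, base_delay=0.1, max_delay=15.0,
                          sleep=time.sleep):
    """Call op() and retry with exponential backoff while it returns DELAY.

    op is a hypothetical callable that issues one request (for example the
    SETATTR for mode) and returns an NFSv4 status code. Returns a tuple of
    (final status, number of attempts made).
    """
    delay = base_delay
    status = op()
    tries = 1
    while status == NFS4ERR_DELAY and tries < max_tries:
        # The client assumes the server is resolving the conflict
        # (e.g. recalling a delegation), so it waits and asks again.
        sleep(delay)
        delay = min(delay * 2, max_delay)
        status = op()
        tries += 1
    return status, tries
```

If the server keeps answering DELAY but never sends the recall, nothing changes between attempts, so a loop like this only ends when the retry budget runs out. A client with no such budget would retry forever, which would look like the hang reported above.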