On Jul 2, 2012, at 4:22 PM, Charles 'Boyo wrote:

> On Mon, Jul 2, 2012 at 3:09 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>>
>> Usually we see this behavior because of a race between an OPEN with
>> delegation and a delegation recall. In this case, however, the client
>> is actively returning a READ delegation, then proceeding to use it
>> anyway. I don't see the server's recall callback, though, and there
>> are other indications that this trace is not complete. So it's hard
>> to be 100% confident.
>>
> The trace is not complete; it includes just enough information to
> explain the problem.
> However, I can confirm the server did not send a recall callback; the
> client returned the delegation of its own "free will".

The callback would come on a separate TCP connection. I can't think of
a reason that a client would return a delegation by itself and then
subsequently start to use it.

>>
>> As far as I know, the EL6.2 client does not have support for
>> recovering a single bad STATEID, which is why it is looping. That
>> support is available in mainline kernels 3.0 and later.
>>
>> However, it seems to me that it is a bug for the client to continue
>> using a delegation that it has returned.
>>
> Is it possible it is a scheduling issue of some sort, where the READ
> should have been sent ahead of the DELEGRETURN but somehow got mixed
> up?

Or possibly that the DELEGRETURN doesn't actually remove the delegation
state ID until the server has replied, and the READ request was sent
before the DELEGRETURN reply arrived at the client.

>>
>> You have already found one work-around: disable delegations on the
>> NFS server. Or you could mount with NFSv3. Or, if feasible, your
>> application could be modified to use fcntl() locking.
>>
> In my case, disabling delegation is the only feasible work-around.
> NFSv3 creates new issues with identity mapping and the application is
> closed-source.
> With delegation disabled, what else do I stand to lose apart from some
> client-side efficiencies?
> I have noticed that the client has resorted
> to closing and re-opening commonly used files every few seconds -
> probably an attempt to flush all data out to the server as soon as
> possible.

Delegation allows the client to leave a file open and cache data more
aggressively. The extra CLOSE operations are likely due to close-to-open
requirements (NFS optimizes for serial file sharing).

> This hasn't caused me any grief, but I don't know what I'm
> missing.

If you haven't noticed any troubling behavior, then there is probably
not going to be a major impact for your workload.

>
> Regards,
>
> Charles
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
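For applications that can be changed (unlike Charles's closed-source one), the fcntl() locking work-around mentioned above might look roughly like the sketch below. It is written in Python for brevity: fcntl.lockf() is documented as a wrapper around the fcntl() locking calls. The helper names locked_append and locked_read are hypothetical, and the sketch assumes every cooperating process follows the same convention, since POSIX record locks are advisory. On NFSv4 these locks are handled by the server, and the Linux client typically revalidates cached data when a lock is acquired and flushes dirty data when it is released; treat that as an assumption to verify for a given kernel.

```python
import fcntl
import os
import tempfile


def locked_append(path: str, data: bytes) -> None:
    """Hypothetical helper: append data while holding an exclusive whole-file lock."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # Block until an exclusive lock over the whole file (len=0) is granted.
        fcntl.lockf(fd, fcntl.LOCK_EX)
        try:
            os.write(fd, data)
            # Push the data to stable storage before other clients may read it.
            os.fsync(fd)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def locked_read(path: str) -> bytes:
    """Hypothetical helper: read the whole file while holding a shared lock."""
    fd = os.open(path, os.O_RDONLY)
    try:
        # Shared (read) lock: many readers may hold it, writers are excluded.
        fcntl.lockf(fd, fcntl.LOCK_SH)
        try:
            chunks = []
            while True:
                chunk = os.read(fd, 65536)
                if not chunk:
                    break
                chunks.append(chunk)
            return b"".join(chunks)
        finally:
            fcntl.lockf(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Demo against a local temporary directory; in a real deployment the
    # path would be on the NFS mount shared by all clients.
    demo_dir = tempfile.mkdtemp()
    demo_path = os.path.join(demo_dir, "shared.log")
    locked_append(demo_path, b"hello\n")
    print(locked_read(demo_path))
```

The lock and unlock calls bracket each access, so consistency points are explicit rather than depending on when the client happens to close and re-open the file.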