Re: Linux NFSv4 client uses returned delegation in subsequent READ resulting in hang (BAD_STATEID)

Chuck Lever <chuck.lever@xxxxxxxxxx> · Thu, 5 Jul 2012 17:35:12 -0400

On Jul 5, 2012, at 3:55 PM, Myklebust, Trond wrote:

> On Thu, 2012-07-05 at 20:23 +0100, Charles 'Boyo wrote:
>>>> What isn't expected behaviour is for the client to DELEGRETURN
>>>> immediately upon receiving the delegation. Nor is it expected that the
>>>> READ would fail to recover in an ordinary DELEGRETURN situation.
>>>> 
>>>> This is why I'm interested in seeing what happened before the OPEN.
>>>> 
>>> 
>>> Attached is a minimally redacted trace which contains all the NFS
>>> calls and responses before and through this particular DELEGRETURN
>>> (frame 6084).
>>> I am unable to provide the full packet capture due to the sensitivity
>>> of the data contained therein.
>> 
>> Trond, did the attached trace confirm the issue as suspected?
>> 
>> Charles
> 
> No. What is happening is that the client does an OPEN of file
> "account.info". It then closes the file and for some reason that I don't
> yet understand, it decides to return the delegation.
> 
> Then the application does another OPEN, which races with the delegation
> return. Because of the race, the server returns the same delegation as
> it did in the first open call. The client (which has now forgotten about
> the original open) sees what it thinks is a new delegation, and so it
> adopts it.
> 
> So this explains why the READs end up trying to use a returned
> delegation. What remains to be explained is:
> 
> a) Why did the client return the delegation in the first place? Is it in
> some extreme memory pressure situation, or does it perhaps have some
> funky setting for /proc/sys/vm/vfs_cache_pressure?
> 
> b) Why doesn't the recovery scenario kick in and do the right thing?

My guess is that EL6.2 kernels don't have the mainline patch that adds support for recovering a single bad state ID.
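
For anyone following along, here is a toy model of the race Trond describes above. It is only a sketch, not kernel or server code: the Server and Client classes, their methods, and the "deleg-N" stateid strings are all hypothetical, and the server is assumed to track nothing but one delegation per file. It shows that if the second OPEN is processed before the DELEGRETURN, the client re-adopts a stateid that is about to become invalid, and the following READ gets BAD_STATEID.

```python
# Toy model of the OPEN / DELEGRETURN race. All names are hypothetical;
# this is not how the Linux client or any real server is implemented.

class Server:
    def __init__(self):
        self.delegations = {}  # filename -> delegation stateid outstanding
        self.next_id = 1

    def open(self, name):
        # Reuse the delegation if it is still outstanding, else grant a new one.
        if name not in self.delegations:
            self.delegations[name] = f"deleg-{self.next_id}"
            self.next_id += 1
        return self.delegations[name]

    def delegreturn(self, name, stateid):
        if self.delegations.get(name) == stateid:
            del self.delegations[name]

    def read(self, name, stateid):
        if self.delegations.get(name) != stateid:
            return "NFS4ERR_BAD_STATEID"
        return "NFS4_OK"


class Client:
    def __init__(self, server):
        self.server = server
        self.delegations = {}  # filename -> stateid the client believes it holds

    def open(self, name):
        # The client adopts whatever delegation the server hands back.
        self.delegations[name] = self.server.open(name)

    def start_delegreturn(self, name):
        # Forget the delegation locally and return the not-yet-delivered call.
        stateid = self.delegations.pop(name)
        return lambda: self.server.delegreturn(name, stateid)

    def read(self, name):
        return self.server.read(name, self.delegations[name])


def simulate(race):
    server = Server()
    client = Client(server)
    client.open("account.info")
    pending = client.start_delegreturn("account.info")
    if race:
        client.open("account.info")  # second OPEN overtakes the DELEGRETURN
        pending()
    else:
        pending()                    # DELEGRETURN lands first
        client.open("account.info")
    return client.read("account.info")
```

Calling simulate(race=True) models the ordering in the trace; simulate(race=False) models the ordering where the return completes before the second OPEN and the server grants a fresh delegation.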

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
