Re: Linux NFSv4 client uses returned delegation in subsequent READ resulting in hang (BAD_STATEID)

On Jul 2, 2012, at 4:22 PM, Charles 'Boyo wrote:

> On Mon, Jul 2, 2012 at 3:09 PM, Chuck Lever <chuck.lever@xxxxxxxxxx> wrote:
>> 
>> Usually we see this behavior because of a race between an OPEN with delegation and a delegation recall.  In this case, however, the client is actively returning a READ
>> delegation, then proceeding to use it anyway.  I don't see the server's recall callback, though, and there are other indications that this trace is not complete. So it's hard
>> to be 100% confident.
>> 
> The trace is not complete, it includes just enough information to
> explain the problem.
> However I can confirm the service did not send a recall callback, the
> client returned the delegation of its own "free will".

The callback would come on a separate TCP connection, so it could be absent from a trace of the main connection.  I can't think of a reason that a client would return a delegation on its own and then go on to use it.

>> 
>> As far as I know, the EL6.2 client does not have support for recovering a single bad STATEID, which is why it is looping.  That support is available in mainline kernels 3.0
>> and later.
>> 
>> However, it seems to me that it is a bug for the client to continue using a delegation that it has returned.
>> 
> Is it possible this is a scheduling issue of some sort, where the READ
> should have been sent ahead of the DELEGRETURN but somehow got mixed
> up?

Or possibly the client doesn't actually discard the delegation state ID until the server has replied to the DELEGRETURN, and the READ request was sent before that reply arrived at the client.
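
To make that hypothesis concrete, here is a toy model in Python.  It is only a sketch of the suspected ordering, not the Linux client code: the classes ToyServer and ToyClient, their methods, and the state ID strings are all made up for illustration.  The only assumption it encodes is the one above, that the client keeps using the delegation state ID until it has processed the DELEGRETURN reply.

```python
# Toy model of the suspected race. Hypothetical names throughout;
# this does not mirror the real NFS client or server code.

class ToyServer:
    def __init__(self):
        self.valid_stateids = set()

    def delegreturn(self, stateid):
        # The server forgets the delegation as soon as it processes the request.
        self.valid_stateids.discard(stateid)
        return "NFS4_OK"

    def read(self, stateid):
        if stateid in self.valid_stateids:
            return "NFS4_OK"
        return "NFS4ERR_BAD_STATEID"


class ToyClient:
    def __init__(self, server, open_stateid, deleg_stateid):
        self.server = server
        self.open_stateid = open_stateid
        self.delegation = deleg_stateid
        server.valid_stateids.update({open_stateid, deleg_stateid})

    def send_delegreturn(self):
        # Request goes out; the client has NOT yet processed the reply.
        return self.server.delegreturn(self.delegation)

    def handle_delegreturn_reply(self):
        # Assumption under test: the state ID is dropped only at this point.
        self.delegation = None

    def send_read(self):
        # Prefer the delegation state ID while the client still holds one.
        stateid = self.delegation or self.open_stateid
        return self.server.read(stateid)


def demo():
    client = ToyClient(ToyServer(), "open-1", "deleg-1")
    client.send_delegreturn()
    early = client.send_read()      # sent before the reply is processed
    client.handle_delegreturn_reply()
    late = client.send_read()       # sent after the reply is processed
    return early, late
```

In this model the READ issued between the DELEGRETURN request and its reply carries a state ID the server has already discarded, which matches what the trace shows.  Whether the real client behaves this way is exactly what would need to be confirmed.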

>> 
>> You have already found one work-around: disable delegations on the NFS server.  Or you could mount with NFSv3.  Or, if feasible, your application could be modified to
>> use fcntl() locking.
>> 
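
For reference, a minimal sketch of what the fcntl() locking suggestion would look like on the application side, assuming the application could be modified.  It is written in Python, whose fcntl.lockf() wraps fcntl() byte-range locks; the function name read_with_lock and its length parameter are hypothetical.

```python
import fcntl


def read_with_lock(path, length=4096):
    """Read up to `length` bytes while holding a shared byte-range lock."""
    with open(path, "rb") as f:
        # Shared (read) lock over the whole file; blocks until granted.
        fcntl.lockf(f, fcntl.LOCK_SH)
        try:
            return f.read(length)
        finally:
            # Release the lock before the file is closed.
            fcntl.lockf(f, fcntl.LOCK_UN)
```

A writer would take fcntl.LOCK_EX on a file opened for writing in the same way.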
> In my case, disabling delegation is the only feasible work-around.
> NFSv3 creates new issues with identity mapping and the application is
> closed-source.
> With delegation disabled, what else do I stand to lose apart from some
> client-side efficiencies? I have noticed that the client has resorted
> to closing and re-opening commonly used files every few seconds -
> probably an attempt to flush all data out to the server as soon as
> possible.

Delegation allows the client to leave a file open and cache data more aggressively.  The extra CLOSE operations are likely due to close-to-open requirements (NFS optimizes for serial file sharing).

> This hasn't caused me any grief, but I don't know what I'm
> missing.

If you haven't noticed any troubling behavior, then there is probably not going to be a major impact for your workload.

> 
> Regards,
> 
> Charles
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com



