Re: [PATCH] NFSD: TEST_STATEID should not return NFS4ERR_STALE_STATEID

Chuck Lever <chuck.lever@xxxxxxxxxx> · Fri, 25 May 2012 16:30:19 -0700 (PDT)

On May 25, 2012, at 9:46 AM, J. Bruce Fields wrote:

> On Thu, May 24, 2012 at 03:29:51PM -0400, Chuck Lever wrote:
>> The error values that TEST_STATEID is allowed to return does not
>> include NFS4ERR_STALE_STATEID.  In addition, RFC 5661 says:
>> 
>> 15.1.16.5.  NFS4ERR_STALE_STATEID (Error Code 10023)
>> 
>>   A stateid generated by an earlier server instance was used.  This
>>   error is moot in NFSv4.1 because all operations that take a stateid
>>   MUST be preceded by the SEQUENCE operation, and the earlier server
>>   instance is detected by the session infrastructure that supports
>>   SEQUENCE.
>> 
>> I triggered the NFS4ERR_STALE_STATEID during nograce recovery testing.
>> My client had updated its boot verifier, so the server instance hadn't
>> changed, but the client instance had.  Thus the server allowed the
>> SEQUENCE operation, but returned NFS4ERR_STALE_STATEID on the
>> TEST_STATEID operation.
>> 
>> After a client's lease expires, TEST_STATEID should report
>> NFS4ERR_EXPIRED for state IDs that the client tries to recover.  I
>> don't see a way to make that happen, though.
> 
> After the client's lease expires, the SEQUENCE operation will fail.
> 
> (Which I believe to be a valid, if unforgiving, server implementation.
> If we were to implement "courtesy locks" in this case, I believe we'd
> remember the clientid for longer, permit the SEQUENCE, and fail
> individual stateid's with EXPIRED as appropriate?)
> 
>> Finally, RFC 5661, section 18.48.3 has this:
>> 
>> o  Special stateids are always considered invalid (they result in the
>>    error code NFS4ERR_BAD_STATEID).
> 
> Thanks for the explanation!
> 
>> 
>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>> ---
>> 
>> Bruce, would you consider taking something like this?
> 
> Sure; nits:
> 
>> fs/nfsd/nfs4state.c |    7 ++++---
>> 1 files changed, 4 insertions(+), 3 deletions(-)
>> 
>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>> index 9235cfa..ae1fab3 100644
>> --- a/fs/nfsd/nfs4state.c
>> +++ b/fs/nfsd/nfs4state.c
>> @@ -3365,12 +3365,13 @@ __be32 nfs4_validate_stateid(struct nfs4_client *cl, stateid_t *stateid)
>> 	struct nfs4_ol_stateid *ols;
>> 	__be32 status;
>> 
>> +	if (ZERO_STATEID(stateid) || ONE_STATEID(stateid))
>> +		return nfserr_bad_stateid;
> 
> Or inval?  This is just a buggy client.

RFC 5661 says specifically that TEST_STATEID should return NFS4ERR_BAD_STATEID in this case.

> 
>> 	if (STALE_STATEID(stateid))
>> -		return nfserr_stale_stateid;
>> -
>> +		return nfserr_bad_stateid;
> 
> Again, this is just a buggy client, since we shouldn't have gotten past
> the SEQUENCE in this case unless the client's sending a stateid that's
> actually someone else's.

In my nograce test, the server does get past the SEQUENCE.  So maybe there's a bug there too?

> If you think it's worth checking for those buggy client cases, we could
> instaed check that stateid->si_opaque.so_clid and cl->clientid agree.
> That'd cover the special-stateid checks too.

That's interesting.  I can redrive this patch with this change.

Yes, I think checking for buggy clients is a good idea.

> 
>> 	s = find_stateid(cl, stateid);
>> 	if (!s)
>> -		 return nfserr_stale_stateid;
>> +		 return nfserr_bad_stateid;
> 
> So this must be the case you actually hit.  Agreed with this change.
> 
> 
> --b.
> 
>> 	status = check_stateid_generation(stateid, &s->sc_stateid, 1);
>> 	if (status)
>> 		return status;
>> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html