Re: [PATCH 13/20] NFS: Fix recovery from NFS4ERR_CLID_INUSE

Chuck Lever <chuck.lever@xxxxxxxxxx> · Thu, 26 Apr 2012 15:46:17 -0400

On Apr 26, 2012, at 3:14 PM, Myklebust, Trond wrote:

> On Thu, 2012-04-26 at 15:04 -0400, Chuck Lever wrote:
>> On Apr 26, 2012, at 2:53 PM, Myklebust, Trond wrote:
>> 
>>> On Thu, 2012-04-26 at 14:43 -0400, Chuck Lever wrote:
>>>> On Apr 26, 2012, at 12:55 PM, Myklebust, Trond wrote:
>>>> 
>>>>> On Thu, 2012-04-26 at 12:24 -0400, Chuck Lever wrote:
>>>>>> On Apr 23, 2012, at 4:55 PM, Chuck Lever wrote:
>>>>> Then let's move the flavour out of the clientid string,
>>>> 
>>>> Removing the flavor from the nfs_client_id4 string makes sense.
>>>> 
>>>>> and just settle
>>>>> for handling CLID_INUSE by changing the flavour on the SETCLIENTID call.
>>>> 
>>>> This is where I get hazy.  
>>>> 
>>>> If I simply change the authentication flavor on the existing clp->cl_rpcclient, will this affect ongoing RENEW operations that also use this transport?  Do we want subsequent RENEW operations to use the new flavor?
>>>> 
>>>> Thinking hypothetically, it seems to me that CLID_INUSE is really an indication of a permanent configuration error, or a software bug, and we should not bother to recover.  But maybe that's my limited imagination.  Under what use cases do you think CLID_INUSE might occur and it might be useful to attempt recovery?
>>>> 
>>> 
>>> The server caches the principal name that was used to call SETCLIENTID
>>> when the lease was established. Any attempt to call SETCLIENTID with a
>>> different principal will result in CLID_INUSE unless the lease has
>>> expired.
>>> 
>>> So what I was proposing wasn't that you try to change the authentication
>>> flavour on an existing nfs_client. It was that when you are probing, you
>>> can use the CLID_INUSE reply from SETCLIENTID as a direct indication
>>> that the server is indeed trunked, and that you already hold a lease on
>>> that server, but that the authentication flavour that you are trying to
>>> use is wrong.
>> 
>> The use case would be that my client has mounted a server via address X using authentication flavor 1, and then tries to mount the same server via address Y using authentication flavor 2.
> 
> ...for which the result should be that all setclientid/confirm and renew
> requests will use flavour 1.

Agreed.
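
To check that we mean the same thing, here is a rough userspace model of the rule described above: the server remembers the principal that established the lease and answers NFS4ERR_CLID_INUSE to any other principal until that lease expires. This is only a sketch, not kernel or server code. The names lease_model, model_setclientid and model_probe are made up for illustration, SETCLIENTID_CONFIRM is left out entirely, and the "retry with the established flavor" step is just my reading of the proposal. The status values are the ones from RFC 3530.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Status codes as numbered in RFC 3530. */
enum { NFS4_OK = 0, NFS4ERR_CLID_INUSE = 10017 };

/* Hypothetical model of one server-side lease record (not a real structure). */
struct lease_model {
	bool valid;          /* lease established and not yet expired */
	char principal[64];  /* principal cached when the lease was set up */
};

/*
 * Toy SETCLIENTID: while the lease is live, a caller using a different
 * principal gets CLID_INUSE. Otherwise the caller's principal is cached.
 */
static int model_setclientid(struct lease_model *l, const char *principal)
{
	assert(l != NULL && principal != NULL);

	if (l->valid && strcmp(l->principal, principal) != 0)
		return NFS4ERR_CLID_INUSE;

	strncpy(l->principal, principal, sizeof(l->principal) - 1);
	l->principal[sizeof(l->principal) - 1] = '\0';
	l->valid = true;
	return NFS4_OK;
}

/*
 * Toy probe via a second address: try the flavor the new mount asked for,
 * and on CLID_INUSE fall back to the flavor that already holds the lease.
 * Both parameter names are assumptions made for this sketch.
 */
static int model_probe(struct lease_model *l, const char *wanted,
		       const char *established)
{
	int status = model_setclientid(l, wanted);

	if (status == NFS4ERR_CLID_INUSE)
		status = model_setclientid(l, established);
	return status;
}
```

In this model, mounting via address X with flavor 1 and then probing via address Y with flavor 2 leaves a single lease record whose cached principal is still flavor 1.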

>> Do we even need to retry the SETCLIENTID and to perform a SETCLIENTID_CONFIRM in that case?
> 
> Yes. Otherwise we end up with 2 leases on the same server.

I don't see how...  If the second SETCLIENTID fails with CLID_INUSE then the server still has the first lease that's using flavor 1.  "Boom, done."

>> Now, what about nfs4_reclaim_lease() ?  If the client sees CLID_INUSE during a lease reclaim, no trunking discovery is involved.
> 
> That would mean that the lease was expired, and that someone sent a
> SETCLIENTID call to the server using our clientid string, but using the
> wrong principal. There are 2 cases:
> 
> 1) Someone is spoofing our client. I've no idea how to recover from
> this, short of changing the clientid string.

Maybe we should keep cl_uniquifier for case 1...?  Since nfs4_reclaim_lease() is called in the state manager, it has to do something to recover or make the waiting process error out.

> 2) The server is trunked, the lease expired, and we happened to call
> 'mount' while it was expired, and inadvertently sent a
> SETCLIENTID+SETCLIENTID_CONFIRM call to the server using a different IP address,
> and using the wrong principal.

The clid_init_mutex should exclude this case...?

-- 
Chuck Lever
chuck[dot]lever[at]oracle[dot]com
