Re: Stuck NFSv4 mounts of Isilon filer with repeated NFS4ERR_STALE_CLIENTID errors

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Wed, 25 Mar 2020 12:22:34 +0000

On Wed, 2020-03-25 at 10:30 +0000, James Pearson wrote:
> We're seeing a number of Linux (CentOS 7.5) clients getting 'nfs:
> server isilon not responding, still trying' from various exports from
> an Isilon.
> 
> I appreciate we're using a vendor's (out-of-date) Linux kernel and a
> third-party filer, but if anyone can give me any pointers on how to
> debug this issue, I would be grateful (we also have a support case
> open with the Isilon vendor).
> 
> Running tshark on a client when this issue happens (taken several
> hours after the issue happened), we get repeating:
> 
>   1   12:18:11 10.78.201.95 -> 10.78.196.184 NFS 194 V4 Call RENEW CID: 0xde68
>   2   12:18:11 10.78.196.184 -> 10.78.201.95 NFS 114 V4 Reply (Call In 1) RENEW Status: NFS4ERR_STALE_CLIENTID
>   4   12:18:16 10.78.201.95 -> 10.78.196.184 NFS 194 V4 Call RENEW CID: 0xde68
>   5   12:18:16 10.78.196.184 -> 10.78.201.95 NFS 114 V4 Reply (Call In 4) RENEW Status: NFS4ERR_STALE_CLIENTID
>   7   12:18:21 10.78.201.95 -> 10.78.196.184 NFS 194 V4 Call RENEW CID: 0xde68
>   8   12:18:21 10.78.196.184 -> 10.78.201.95 NFS 114 V4 Reply (Call In 7) RENEW Status: NFS4ERR_STALE_CLIENTID
> ...
> 
> My knowledge of NFSv4 is sketchy, but from my (partial) reading of
> RFC 7530, shouldn't the client be sending a SETCLIENTID in response to
> an NFS4ERR_STALE_CLIENTID? That doesn't appear to be happening here.
> 
> The server hasn't rebooted since the client mounted the file system,
> though, so I'm not sure what might be going on.
> 
> We are upgrading clients to the latest CentOS (RHEL) 7.7 to see if
> that 'fixes' the issue - but would appreciate any other pointers
> 

WAG: the clients all have the default hostname 'localhost.localdomain'
and are using that to identify themselves in the SETCLIENTID call? If
so, the server would see each SETCLIENTID as a reboot of the one client
named 'localhost.localdomain', so the clients would keep cancelling
each other's leases.
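
To illustrate the guess, here is a toy Python model of how a server
might treat two clients that present the same identifier string. It is
only a sketch under simplifying assumptions, not Isilon's or the Linux
client's actual code: the ToyServer class, its methods, and the
'boot-A'/'boot-B' verifiers are hypothetical names, and the real
SETCLIENTID/SETCLIENTID_CONFIRM exchange in RFC 7530 has more steps.

```python
import itertools


class ToyServer:
    """Highly simplified model of NFSv4.0 client ID handling (illustrative only)."""

    def __init__(self):
        self._next_id = itertools.count(1)
        # Maps client identifier string -> (boot verifier, clientid).
        self.by_name = {}

    def setclientid(self, name, verifier):
        known = self.by_name.get(name)
        if known is not None and known[0] == verifier:
            # Same name and same verifier: same client instance, keep its clientid.
            return known[1]
        # New name, or a known name with a different verifier. The latter is
        # treated as a client reboot: the old clientid (and its lease) is dropped.
        clientid = next(self._next_id)
        self.by_name[name] = (verifier, clientid)
        return clientid

    def renew(self, clientid):
        live = {cid for _verifier, cid in self.by_name.values()}
        return "NFS4_OK" if clientid in live else "NFS4ERR_STALE_CLIENTID"


# Two different machines that both identify as 'localhost.localdomain'.
shared = ToyServer()
cid_a = shared.setclientid("localhost.localdomain", "boot-A")
cid_b = shared.setclientid("localhost.localdomain", "boot-B")
shared_result = (shared.renew(cid_a), shared.renew(cid_b))

# The same two machines with distinct (hypothetical) hostnames.
distinct = ToyServer()
cid_c = distinct.setclientid("client-a.example", "boot-A")
cid_d = distinct.setclientid("client-b.example", "boot-B")
distinct_result = (distinct.renew(cid_c), distinct.renew(cid_d))

print(shared_result)
print(distinct_result)
```

In this model the first machine's RENEW fails once the second machine
has registered under the same name, while distinct names leave both
leases intact. Whether that is what is happening here depends on how
the clients actually build their identifier strings.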

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx