Re: question about handling of an unresponsive server during lease renewal

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Mon, 13 Jul 2020 18:15:05 +0000

Hi Olga

On Mon, 2020-07-13 at 13:59 -0400, Olga Kornievskaia wrote:
> Hi Trond,
> 
> To the best of your knowledge, does the client implement the part of
> the spec that deals with the case where the server isn't responding
> and the lease is timing out?
> 
> RFC5661 section 8.3 talks about:
> 
>       Transport retransmission delays might become so large as to
>       approach or exceed the length of the lease period.  This may be
>       particularly likely when the server is unresponsive due to a
>       restart; see Section 8.4.2.1.  If the client implementation is
>       not careful, transport retransmission delays can result in the
>       client failing to detect a server restart before the grace
>       period ends.  The scenario is that the client is using a
>       transport with exponential backoff, such that the maximum
>       retransmission timeout exceeds both the grace period and the
>       lease_time attribute.  A network partition causes the client's
>       connection's retransmission interval to back off, and even
>       after the partition heals, the next transport-level
>       retransmission is sent after the server has restarted and its
>       grace period ends.
> 
>       The client MUST either recover from the ensuing
>       NFS4ERR_NO_GRACE errors or it MUST ensure that, despite
>       transport-level retransmission intervals that exceed the
>       lease_time, a SEQUENCE operation is sent that renews the lease
>       before expiration.  The client can achieve this by associating
>       a new connection with the session, and sending a SEQUENCE
>       operation on it.  However, if the attempt to establish a new
>       connection is delayed for some reason (e.g., exponential
>       backoff of the connection establishment packets), the client
>       will have to abort the connection establishment attempt before
>       the lease expires, and attempt to reconnect.
> 
> A SEQUENCE op is sent and the server has rebooted; it's coming up
> (but not responding). At the TCP layer, TCP is exponentially backing
> off before retrying. At some point the timeout grows to more than
> 100s, which means that by the time the client resends, the server is
> up and out of grace.
> 
> Does the client have any control over not letting TCP wait for
> longer than the lease period, so that it instead aborts the
> connection and starts a new one? I mean, I sort of find the 2nd
> paragraph in contradiction with the fact that the client must never
> give up on waiting for a reply from the server. But maybe this is a
> special case where the client is supposed to know its lease hasn't
> been renewed and it's OK to give up?

That is what this code is supposed to ensure:

/**
 * nfs4_set_lease_period - Sets the lease period on a nfs_client
 *
 * @clp: pointer to nfs_client
 * @lease: new value for lease period
 */
void nfs4_set_lease_period(struct nfs_client *clp,
                unsigned long lease)
{
        spin_lock(&clp->cl_lock);
        clp->cl_lease_time = lease;
        spin_unlock(&clp->cl_lock);

        /* Cap maximum reconnect timeout at 1/2 lease period */
        rpc_set_connect_timeout(clp->cl_rpcclient, lease, lease >> 1);
}

The call to rpc_set_connect_timeout() iterates through all of the
transports associated with that server, and calls
xprt->ops->set_connect_timeout() with the appropriate connect and
reconnect timeouts.
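
To illustrate the effect of that cap, here is a rough userspace sketch.
It is a toy model, not the kernel code: struct toy_xprt and the toy_*()
helpers are hypothetical names made up for this example, and the "only
ever lower the limits" rule and the doubling backoff are assumptions of
the model. The only part taken from the function above is the pair of
arguments, (lease, lease >> 1).

```c
#include <assert.h>

/* Toy stand-in for a transport; NOT the kernel's struct rpc_xprt. */
struct toy_xprt {
        unsigned long connect_timeout;          /* seconds */
        unsigned long max_reconnect_timeout;    /* seconds */
};

/*
 * Hypothetical model of a per-transport set_connect_timeout() hook.
 * Assumption of this sketch: the hook only ever lowers the limits.
 */
static void toy_set_connect_timeout(struct toy_xprt *x,
                unsigned long connect_timeout,
                unsigned long reconnect_timeout)
{
        if (connect_timeout < x->connect_timeout)
                x->connect_timeout = connect_timeout;
        if (reconnect_timeout < x->max_reconnect_timeout)
                x->max_reconnect_timeout = reconnect_timeout;
}

/* Same arguments as nfs4_set_lease_period(): (lease, lease >> 1). */
static void toy_set_lease_period(struct toy_xprt *xprts, int n,
                unsigned long lease)
{
        for (int i = 0; i < n; i++)
                toy_set_connect_timeout(&xprts[i], lease, lease >> 1);
}

/*
 * Delay before reconnect attempt number `attempt` (0-based), assuming
 * a simple doubling backoff that is clamped to the transport's cap.
 */
static unsigned long toy_reconnect_delay(const struct toy_xprt *x,
                unsigned long initial, unsigned int attempt)
{
        unsigned long delay = initial;

        assert(initial > 0);
        while (attempt-- > 0 && delay < x->max_reconnect_timeout)
                delay *= 2;
        return delay < x->max_reconnect_timeout ?
                delay : x->max_reconnect_timeout;
}
```

With a 90 second lease and a 3 second initial delay, the delays in this
model go 3, 6, 12, 24 and then stay at 45 seconds, so a reconnect
attempt is always made within half a lease period.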

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx