Re: Thoughts on mount option to configure client lease renewal time.

Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> · Tue, 16 Aug 2022 23:32:25 +0000

On Wed, 2022-08-17 at 09:09 +1000, NeilBrown wrote:
> On Tue, 16 Aug 2022, Trond Myklebust wrote:
> > On Tue, 2022-08-16 at 09:35 +1000, NeilBrown wrote:
> > > 
> > > Currently the Linux NFS client renews leases at 2/3 of the lease
> > > time advised by the server.
> > > Some server vendors (Not Exactly Targeting Any Particular Party)
> > > recommend very short lease times - as short as 5 seconds in
> > > fail-over configurations.  This means 1.7 seconds of jitter in any
> > > part of the system can result in leases being lost - but it does
> > > achieve fast fail-over.
> > > 
> > > If we could configure a 5 second lease-renewal on the client, but
> > > leave a 60 second lease time on the server, then we could get the
> > > best of both worlds.  Failover would happen quickly, but you would
> > > need a much longer load spike or network partition to cause the
> > > loss of leases.
> > > 
> > > As v4.1 can end the grace period early once everyone checks in, a
> > > large grace period (which is needed for a large lease time) would
> > > rarely be a problem.
> > > 
> > > So my thought is to add a mount option "lease-renew=5" for v4.1+
> > > mounts.  The client then uses that number provided it is less than
> > > 2/3 of the server-declared lease time.
> > > 
> > > What do people think of this?  Is there a better solution, or a
> > > problem with this one?
> > > 
> > > NeilBrown
> > >  
> > 
> > I don't see how the NFS client can ever guarantee a 5 second lease
> > renewal time, so as far as I'm concerned, this is not a problem we
> > need to solve.
> 
> I completely agree with the first statement.
> The problem we need to solve is whatever problem it is that motivates
> server vendors to recommend unrealistically short lease times.
> 
> I believe this problem is fail-over time.
> Assuming that a server fail-over happens instantly, full NFS service
> does not resume until after the grace period completes.
> 
> Providing clients send RECLAIM_COMPLETE appropriately, the grace
> period could easily be as long as:
> 
>   client renew time + time to reclaim all state
> 
> as clients that are idle (or busy thinking, not accessing the
> filesystem) will not notice the failover until they send a renew,
> which may not be until the full renew time has passed.
> 
> The only part of that calculation that can be controlled is the
> client renew time, and at present that can only be controlled by
> reducing the lease time.  Hence the recommendation for a short lease
> time.
> 
> If we could provide an alternate means of reducing the client renew
> time - a mount option - then there would be no incentive to recommend
> an impractically short lease time.
> 
> Thanks,
> NeilBrown
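
For reference, the arithmetic in the quoted messages can be sketched as
follows. This is only a back-of-the-envelope illustration, not client
code: the function names and the 2 second reclaim time are assumptions
made up for the example, while the 2/3 renewal fraction, the 5 and 60
second figures, and the grace-period formula come from the thread.

```python
def renew_interval(lease_time, fraction=2 / 3):
    """Time after which the client renews, as a fraction of the lease."""
    return lease_time * fraction


def jitter_margin(lease_time, renew_time):
    """Delay the system can absorb after a renew is due before the lease expires."""
    return lease_time - renew_time


def worst_case_grace(renew_time, reclaim_time):
    """Grace period bound from the thread: client renew time + time to reclaim state."""
    return renew_time + reclaim_time


RECLAIM_TIME = 2  # seconds; hypothetical value, not taken from the thread

# 5 second lease, renewed at 2/3 of the lease time.
short_margin = jitter_margin(5, renew_interval(5))  # about 1.67 seconds

# 60 second lease with the proposed 5 second client renew time.
proposed_margin = jitter_margin(60, 5)  # 55 seconds

# Idle clients notice the failover only at their next renew.
grace_default = worst_case_grace(renew_interval(60), RECLAIM_TIME)
grace_proposed = worst_case_grace(5, RECLAIM_TIME)

print(round(short_margin, 2), proposed_margin)
print(round(grace_default, 2), grace_proposed)
```

Under these assumptions the short lease and the proposed renew time give
the same worst-case grace period, but the proposed setup tolerates a far
longer stall before any lease is lost.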

Instead of wasting a load of CPU cycles pinging the NFS layer, why not
farm this out to the TCP layer? We already have keepalive to ensure
that the connection stays up. All we really need is to handle the case
where the connection is broken by the server.

So the suggestion would be that when the connection is broken, we start
sending a SEQUENCE ping in order to figure out what happened, and
whether we need to re-establish state.

No mount options needed...
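
As a rough user-space analogy for leaning on TCP keepalive, the sketch
below enables keepalive on an ordinary socket and estimates how long an
unresponsive peer would take to be detected. The in-kernel NFS client
manages its own transport sockets, so this is not how the client is
configured; the helper names and the timer values (5 second idle, 1
second interval, 3 probes) are assumptions for illustration.
TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are not available on every
platform, hence the guard.

```python
import socket


def enable_keepalive(sock, idle=5, interval=1, probes=3):
    """Turn on TCP keepalive; tune the timers where the platform allows it."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Seconds of idle time before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
        # Unanswered probes before the connection is declared dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


def keepalive_detect_seconds(idle, interval, probes):
    """Approximate time for keepalive to give up on a silent peer."""
    return idle + interval * probes


sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
print(keepalive_detect_seconds(5, 1, 3))
sock.close()
```

Once the transport reports the connection as broken, the client would
reconnect and send the SEQUENCE ping described above to find out whether
state needs to be re-established.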

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@xxxxxxxxxxxxxxx