Re: Thoughts on mount option to configure client lease renewal time.

"NeilBrown" <neilb@xxxxxxx> · Wed, 17 Aug 2022 10:46:33 +1000

On Wed, 17 Aug 2022, Trond Myklebust wrote:
> On Wed, 2022-08-17 at 09:09 +1000, NeilBrown wrote:
> > On Tue, 16 Aug 2022, Trond Myklebust wrote:
> > > On Tue, 2022-08-16 at 09:35 +1000, NeilBrown wrote:
> > > > 
> > > > Currently the Linux NFS client renews leases at 2/3 of the lease
> > > > time advised by the server.
> > > > Some server vendors (Not Exactly Targeting Any Particular Party)
> > > > recommend very short lease times - as short as 5 seconds in
> > > > fail-over configurations.  This means 1.7 seconds of jitter in any
> > > > part of the system can result in leases being lost - but it does
> > > > achieve fast fail-over.
> > > > 
> > > > If we could configure a 5 second lease-renewal on the client, but
> > > > leave a 60 second lease time on the server, then we could get the
> > > > best of both worlds.  Failover would happen quickly, but you would
> > > > need a much longer load spike or network partition to cause the
> > > > loss of leases.
> > > > 
> > > > As v4.1 can end the grace period early once everyone checks in, a
> > > > large grace period (which is needed for a large lease time) would
> > > > rarely be a problem.
> > > > 
> > > > So my thought is to add a mount option "lease-renew=5" for v4.1+
> > > > mounts.  The client then uses that number provided it is less
> > > > than 2/3 of the server-declared lease time.
> > > > 
> > > > What do people think of this?  Is there a better solution, or a
> > > > problem with this one?
> > > > 
> > > > NeilBrown
> > > >  
> > > 
> > > I don't see how the NFS client can ever guarantee a 5 second lease
> > > renewal time, so as far as I'm concerned, this is not a problem we
> > > need to solve.
> > 
> > I completely agree with the first statement.
> > The problem we need to solve is whatever problem it is that motivates
> > server vendors to recommend unrealistically short lease times.
> > 
> > I believe this problem is fail-over time.
> > Assuming that a server fail-over happens instantly, full NFS service
> > does not resume until after the grace period completes.
> > 
> > Provided clients send RECLAIM_COMPLETE appropriately, the grace
> > period could easily be as long as:
> > 
> >   client renew time + time to reclaim all state
> > 
> > as clients that are idle (or busy thinking, not accessing the
> > filesystem) will not notice the failover until they send a renew,
> > which may not be until the full renew time has passed.
> > 
> > The only part of that calculation that can be controlled is the
> > client renew time, and at present that can only be controlled by
> > reducing the lease time.  Hence the recommendation for a short lease
> > time.
> > 
> > If we could provide an alternate means of reducing the client renew
> > time - a mount option - then there would be no incentive to recommend
> > an impractically short lease time.
> > 
> > Thanks,
> > NeilBrown
> 
> Instead of wasting a load of CPU cycles pinging the NFS layer, why not
> farm this out to the TCP layer? We already have keepalive to ensure
> that the connection stays up. All we really need is to handle the case
> where the connection is broken by the server.
> 
> So the suggestion would be that when the connection is broken, we start
> sending a SEQUENCE ping in order to figure out what happened, and
> whether we need to re-establish state.
> 
> No mount options needed...

Yes, that is an interesting idea.
This would mean that the timeo/retrans mount options would determine the
effective lease renewal time when the server stops responding.  That
seems to make sense.

I'll have a look and see how much change is required to send a renew if
there are no pending requests when the connection closes.
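
For reference, the numbers quoted earlier in the thread can be checked
with a rough sketch like the one below.  It is only arithmetic, not
client code: the function names are made up for illustration, the
"lease_renew" argument models the proposed (not existing) mount option,
and the reclaim time passed to the grace-period estimate is an assumed
input.

```python
from fractions import Fraction

# The client renews at 2/3 of the server-advised lease time (from the thread).
RENEW_FRACTION = Fraction(2, 3)


def renew_interval(lease_time, lease_renew=None):
    """Seconds between renewals.

    lease_renew models the proposed "lease-renew=N" option (hypothetical):
    it is used only if it is less than 2/3 of the server lease time.
    """
    default = RENEW_FRACTION * lease_time
    if lease_renew is not None and lease_renew < default:
        return Fraction(lease_renew)
    return default


def jitter_margin(lease_time, lease_renew=None):
    """Delay that can be absorbed after a renewal is due before the lease lapses."""
    return lease_time - renew_interval(lease_time, lease_renew)


def grace_period_estimate(renew, reclaim_time):
    """The 'client renew time + time to reclaim all state' figure."""
    return renew + reclaim_time


if __name__ == "__main__":
    for lease, option in [(5, None), (60, None), (60, 5)]:
        renew = renew_interval(lease, option)
        margin = jitter_margin(lease, option)
        print(f"lease={lease}s option={option}: "
              f"renew every {float(renew):.2f}s, margin {float(margin):.2f}s")
```

With a 5 second lease the margin is about 1.67 seconds (the "1.7
seconds of jitter" above); with a 60 second lease and a 5 second renew
it is 55 seconds.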

Thanks!
NeilBrown