On Wed, 17 Aug 2022, Trond Myklebust wrote:
> On Wed, 2022-08-17 at 09:09 +1000, NeilBrown wrote:
> > On Tue, 16 Aug 2022, Trond Myklebust wrote:
> > > On Tue, 2022-08-16 at 09:35 +1000, NeilBrown wrote:
> > > >
> > > > Currently the Linux NFS client renews leases at 2/3 of the lease
> > > > time advised by the server.
> > > > Some server vendors (Not Exactly Targeting Any Particular Party)
> > > > recommend very short lease times - as short as 5 seconds in
> > > > fail-over configurations.  This means 1.7 seconds of jitter in
> > > > any part of the system can result in leases being lost - but it
> > > > does achieve fast fail-over.
> > > >
> > > > If we could configure a 5 second lease-renewal on the client, but
> > > > leave a 60 second lease time on the server, then we could get the
> > > > best of both worlds.  Failover would happen quickly, but you
> > > > would need a much longer load spike or network partition to cause
> > > > the loss of leases.
> > > >
> > > > As v4.1 can end the grace period early once everyone checks in, a
> > > > large grace period (which is needed for a large lease time) would
> > > > rarely be a problem.
> > > >
> > > > So my thought is to add a mount option "lease-renew=5" for v4.1+
> > > > mounts.  The client then uses that number providing it is less
> > > > than 2/3 of the server-declared lease time.
> > > >
> > > > What do people think of this?  Is there a better solution, or a
> > > > problem with this one?
> > > >
> > > > NeilBrown
> > > >
> > >
> > > I don't see how the NFS client can ever guarantee a 5 second lease
> > > renewal time, so as far as I'm concerned, this is not a problem we
> > > need to solve.
> >
> > I completely agree with the first statement.
> > The problem we need to solve is whatever problem it is that motivates
> > server vendors to recommend unrealistically short lease times.
> >
> > I believe this problem is fail-over time.
> > Assuming that a server fail-over happens instantly, full NFS service
> > does not resume until after the grace period completes.
> >
> > Providing clients send RECLAIM_COMPLETE appropriately, the grace
> > period could easily be as long as:
> >
> >    client renew time + time to reclaim all state
> >
> > as clients that are idle (or busy thinking, not accessing the
> > filesystem) will not notice the failover until they send a renew,
> > which may not be until the full renew time has passed.
> >
> > The only part of that calculation that can be controlled is the
> > client renew time, and at present that can only be controlled by
> > reducing the lease time.  Hence the recommendation for a short lease
> > time.
> >
> > If we could provide an alternate means of reducing the client renew
> > time - a mount option - then there would be no incentive to recommend
> > an impractically short lease time.
> >
> > Thanks,
> > NeilBrown
>
> Instead of wasting a load of CPU cycles pinging the NFS layer, why not
> farm this out to the TCP layer? We already have keepalive to ensure
> that the connection stays up. All we really need is to handle the case
> where the connection is broken by the server.
>
> So the suggestion would be that when the connection is broken, we start
> sending a SEQUENCE ping in order to figure out what happened, and
> whether we need to re-establish state.
>
> No mount options needed...

Yes, that is an interesting idea.

This would mean that the timeo/retrans mount options would determine the
effective lease renewal time when the server stops responding.  That
seems to make sense.

I'll have a look and see how much change is required to send a renew if
there are no pending requests when the connection closes.

Thanks!
NeilBrown
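
For illustration only, the arithmetic behind the numbers quoted in this
thread can be sketched in Python.  This is not kernel code: the function
names are hypothetical, "lease-renew" is only the option proposed above
(it does not exist), and the reclaim time in the usage note is an
assumed figure.

```python
from fractions import Fraction


def renew_interval(server_lease, lease_renew=None):
    """Seconds between lease renewals.

    The default is 2/3 of the server-advised lease time.  The proposed
    (hypothetical) lease-renew value is used only if it is smaller.
    """
    default = Fraction(2, 3) * server_lease
    if lease_renew is not None and lease_renew < default:
        return Fraction(lease_renew)
    return default


def jitter_slack(server_lease, lease_renew=None):
    """How late a renewal can arrive before the lease expires."""
    return server_lease - renew_interval(server_lease, lease_renew)


def worst_case_grace(renew, reclaim_time):
    """Bound quoted above: client renew time + time to reclaim all state."""
    return renew + reclaim_time


if __name__ == "__main__":
    # (server lease, proposed lease-renew value or None)
    for lease, opt in [(5, None), (60, None), (60, 5)]:
        r = renew_interval(lease, opt)
        s = jitter_slack(lease, opt)
        print(f"lease={lease}s lease-renew={opt}: "
              f"renew every {float(r):.2f}s, slack {float(s):.2f}s")
```

With a 5 second lease the slack is 5/3 of a second, which is the "1.7
seconds of jitter" mentioned at the start of the thread.  With a 60
second lease and a 5 second renew interval the slack grows to 55
seconds, while the renew-time term in the grace-period bound stays at 5
seconds.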