On Wed, 17 Aug 2022, Chuck Lever III wrote:
> > On Aug 15, 2022, at 7:35 PM, NeilBrown <neilb@xxxxxxx> wrote:
> >
> > Currently the Linux NFS client renews leases at 2/3 of the lease
> > time advised by the server.
> > Some server vendors (Not Exactly Targeting Any Particular Party)
> > recommend very short lease times - as short as 5 seconds in fail-over
> > configurations.  This means 1.7 seconds of jitter in any part of the
> > system can result in leases being lost - but it does achieve fast
> > fail-over.
> >
> > If we could configure a 5 second lease-renewal on the client, but leave
> > a 60 second lease time on the server, then we could get the best of both
> > worlds.  Failover would happen quickly, but you would need a much longer
> > load spike or network partition to cause the loss of leases.
>
> If loss of leases is the only concern (ie, there is no file sharing that
> can cause a client to steal another's locks when the other client loses
> contact with the server) then courteous server should handle that.  The
> Linux NFS server is now courteous, and several other implementations are
> as well.

"If" being the key word.  A courteous server is great and will certainly
help, but it doesn't provide the same guarantee as actually getting a
renew in before the lease expires.

> >
> > As v4.1 can end the grace period early once everyone checks in, a large
> > grace period (which is needed for a large lease time) would rarely be a
> > problem.
>
> IMO the above paragraph is the most salient: if failover time is being
> impacted by state recovery, use NFSv4.1 with implementations that take
> proper advantage of RECLAIM_COMPLETE.
>
> >
> > So my thought is to add a mount option "lease-renew=5" for v4.1+ mounts.
> > The client then uses that number providing it is less than 2/3 of the
> > server-declared lease time.
> >
> > What do people think of this?  Is there a better solution, or a problem
> > with this one?
>
> RECLAIM_COMPLETE is the preferred solution, if I understand your problem
> statement correctly.  Can you describe how it does not meet expectations?
>

RECLAIM_COMPLETE is an important part of the solution, but not a
complete solution.  If a client is idle (not touching the filesystem
for a little while), then it won't notice the server failover until it
sends a renew, which it might not do for 2/3 of the lease time, e.g.
for about 1 minute.  Even if it only takes 1 second to reclaim state
and send RECLAIM_COMPLETE, that is still over 1 minute that the server
has to wait before it can end the grace period.

To reliably reduce the effective grace period, you need both a short
renew time and the use of RECLAIM_COMPLETE.

> The other side of this coin is that clients can have so much outstanding
> state that they can't recover it all before the grace period expires.
> To compensate, a server can limit the number of delegations it hands out,
> or it can lengthen its lease/grace period.

Maybe an ideal client would estimate the time it would take to recover
all its state, and would ensure the gap between renewal time and lease
time was at least that long.  I don't know that a practical client
would do that though.

Certainly it would make sense for the server to extend the grace period
while a client is actively reclaiming state - with some limit in case
of a misbehaving client.

Thanks,
NeilBrown
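
To make the timing arithmetic above concrete, here is a minimal sketch
in Python.  It is only an illustration, not kernel code: the function
names are made up for this example, "lease_renew" stands in for the
proposed (not existing) "lease-renew=" mount option, and the grace
estimate assumes the slowest client is idle and only notices the
failover at its next renew.

```python
def renew_interval(server_lease, lease_renew=None):
    """Seconds between renews: 2/3 of the server lease by default.

    lease_renew models the proposed (hypothetical) mount option; it is
    only honoured when it is less than 2/3 of the server lease.
    """
    default = 2 * server_lease / 3
    if lease_renew is not None and lease_renew < default:
        return lease_renew
    return default


def jitter_slack(server_lease, lease_renew=None):
    """Delay the system can absorb before a renew arrives too late."""
    return server_lease - renew_interval(server_lease, lease_renew)


def idle_client_grace(server_lease, reclaim_time, lease_renew=None):
    """Rough worst-case wait before an idle client has sent
    RECLAIM_COMPLETE: one full renew interval plus the reclaim itself."""
    return renew_interval(server_lease, lease_renew) + reclaim_time


if __name__ == "__main__":
    for lease, renew in [(5, None), (90, None), (60, 5)]:
        print(lease, renew,
              round(jitter_slack(lease, renew), 1),
              round(idle_client_grace(lease, 1, renew), 1))
```

With a 5 second lease and the default 2/3 rule the slack is about 1.7
seconds.  With a 60 second lease and a 5 second renew the slack is 55
seconds, while an idle client that needs 1 second to reclaim would be
done about 6 seconds after the failover.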