> On Feb 3, 2022, at 10:42 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote: > > > On 2/3/22 3:40 PM, Chuck Lever III wrote: >> >>> On Feb 3, 2022, at 4:38 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote: >>> >>> >>> On 2/3/22 11:31 AM, Chuck Lever III wrote: >>>>> On Jan 28, 2022, at 2:39 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote: >>>>> >>>>> Currently an NFSv4 client must maintain its lease by using the at least >>>>> one of the state tokens or if nothing else, by issuing a RENEW (4.0), or >>>>> a singleton SEQUENCE (4.1) at least once during each lease period. If the >>>>> client fails to renew the lease, for any reason, the Linux server expunges >>>>> the state tokens immediately upon detection of the "failure to renew the >>>>> lease" condition and begins returning NFS4ERR_EXPIRED if the client should >>>>> reconnect and attempt to use the (now) expired state. >>>>> >>>>> The default lease period for the Linux server is 90 seconds. The typical >>>>> client cuts that in half and will issue a lease renewing operation every >>>>> 45 seconds. The 90 second lease period is very short considering the >>>>> potential for moderately long term network partitions. A network partition >>>>> refers to any loss of network connectivity between the NFS client and the >>>>> NFS server, regardless of its root cause. This includes NIC failures, NIC >>>>> driver bugs, network misconfigurations & administrative errors, routers & >>>>> switches crashing and/or having software updates applied, even down to >>>>> cables being physically pulled. In most cases, these network failures are >>>>> transient, although the duration is unknown. >>>>> >>>>> A server which does not immediately expunge the state on lease expiration >>>>> is known as a Courteous Server. A Courteous Server continues to recognize >>>>> previously generated state tokens as valid until conflict arises between >>>>> the expired state and the requests from another client, or the server >>>>> reboots. >>>>> >>>>> The initial implementation of the Courteous Server will do the following: >>>>> >>>>> . When the laundromat thread detects an expired client and if that client >>>>> still has established state on the Linux server and there is no waiters >>>>> for the client's locks then deletes the client persistent record and marks >>>>> the client as COURTESY_CLIENT and skips destroying the client and all of >>>>> state, otherwise destroys the client as usual. >>>>> >>>>> . Client persistent record is added to the client database when the >>>>> courtesy client reconnects and transits to normal client. >>>>> >>>>> . Lock/delegation/share reversation conflict with courtesy client is >>>>> resolved by marking the courtesy client as DESTROY_COURTESY_CLIENT, >>>>> effectively disable it, then allow the current request to proceed >>>>> immediately. >>>>> >>>>> . Courtesy client marked as DESTROY_COURTESY_CLIENT is not allowed to >>>>> reconnect to reuse itsstate. It is expired by the laundromat asynchronously >>>>> in the background. >>>>> >>>>> Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx> >>>>> --- >>>>> fs/nfsd/nfs4state.c | 454 +++++++++++++++++++++++++++++++++++++++++++++++----- >>>>> fs/nfsd/state.h | 5 + >>>>> 2 files changed, 415 insertions(+), 44 deletions(-) >>>>> >>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c >>>>> index 1956d377d1a6..b302d857e196 100644 >>>>> --- a/fs/nfsd/nfs4state.c >>>>> +++ b/fs/nfsd/nfs4state.c >>>>> @@ -1913,14 +1915,37 @@ __find_in_sessionid_hashtbl(struct nfs4_sessionid *sessionid, struct net *net) >>>>> >>>>> static struct nfsd4_session * >>>>> find_in_sessionid_hashtbl(struct nfs4_sessionid *sessionid, struct net *net, >>>>> - __be32 *ret) >>>>> + __be32 *ret, bool *courtesy_clnt) >>>> IMO the new @courtesy_clnt parameter isn't necessary. >>>> Just create a new cl_flag: >>>> >>>> +#define NFSD4_COURTESY_CLIENT (6) /* be nice to expired client */ >>>> +#define NFSD4_DESTROY_COURTESY_CLIENT (7) >>>> >>>> #define NFSD4_CLIENT_PROMOTE_COURTESY (8) >>>> >>>> or REHYDRATE_COURTESY some such. >>>> >>>> Set that flag and check it once it is safe to call >>>> nfsd4_client_record_create(). >>> We need the 'courtesy_clnt' parameter so caller can specify >>> whether the courtesy client should be promoted or not. >> I understand what the flag is used for in the patch, but I >> prefer to see this implemented without changing the synopsis >> of all those functions. Especially adding an output parameter >> like this is usually frowned upon. >> >> The struct nfs_client can carry this flag, if not in cl_flags, >> then perhaps in another field. That struct is visible in every >> one of the callers. > > The struct nfs4_client is not available to the caller of > find_in_sessionid_hashtbl at the time it calls the function and > the current input parameters of find_in_sessionid_hashtbl can > not be used to specify this flag. I see three callers of find_in_sessionid_hashtbl(): - nfsd4_bind_conn_to_session - nfsd4_destroy_session - nfsd4_sequence In none of these callers is the courtesy_clnt variable set to a true or false value _before_ find_in_sessionid_hashtbl() is called. AFAICT, @courtesy_clnt is an output-only parameter. The returned @session::se_client field points to a client that can be examined to see if it has been promoted back to active status. -- Chuck Lever