Could you look back over previous comments? I notice there's a couple unaddressed (circular locking dependency, Documentation/filesystems/). I agree with Chuck that we don't need to reschedule the laundromat, it's OK if it takes longer to get around to cleaning up a dead client. --b. On Mon, Jan 10, 2022 at 10:50:51AM -0800, Dai Ngo wrote: > Hi Bruce, Chuck > > This series of patches implement the NFSv4 Courteous Server. > > A server which does not immediately expunge the state on lease expiration > is known as a Courteous Server. A Courteous Server continues to recognize > previously generated state tokens as valid until conflict arises between > the expired state and the requests from another client, or the server > reboots. > > The v2 patch includes the following: > > . add new callback, lm_expire_lock, to lock_manager_operations to > allow the lock manager to take appropriate action with conflict lock. > > . handle conflicts of NFSv4 locks with NFSv3/NLM and local locks. > > . expire courtesy client after 24hr if client has not reconnected. > > . do not allow expired client to become courtesy client if there are > waiters for client's locks. > > . modify client_info_show to show courtesy client and seconds from > last renew. > > . fix a problem with NFSv4.1 server where the it keeps returning > SEQ4_STATUS_CB_PATH_DOWN in the successful SEQUENCE reply, after > the courtesy client re-connects, causing the client to keep sending > BCTS requests to server. > > The v3 patch includes the following: > > . modified posix_test_lock to check and resolve conflict locks > to handle NLM TEST and NFSv4 LOCKT requests. > > . separate out fix for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. > > The v4 patch includes: > > . rework nfsd_check_courtesy to avoid dead lock of fl_lock and client_lock > by asking the laudromat thread to destroy the courtesy client. > > . handle NFSv4 share reservation conflicts with courtesy client. This > includes conflicts between access mode and deny mode and vice versa. > > . drop the patch for back channel stuck in SEQ4_STATUS_CB_PATH_DOWN. > > The v5 patch includes: > > . fix recursive locking of file_rwsem from posix_lock_file. > > . retest with LOCKDEP enabled. > > The v6 patch includes: > > . merge witn 5.15-rc7 > > . fix a bug in nfs4_check_deny_bmap that did not check for matched > nfs4_file before checking for access/deny conflict. This bug causes > pynfs OPEN18 to fail since the server taking too long to release > lots of un-conflict clients' state. > > . enhance share reservation conflict handler to handle case where > a large number of conflict courtesy clients need to be expired. > The 1st 100 clients are expired synchronously and the rest are > expired in the background by the laundromat and NFS4ERR_DELAY > is returned to the NFS client. This is needed to prevent the > NFS client from timing out waiting got the reply. > > The v7 patch includes: > > . Fix race condition in posix_test_lock and posix_lock_inode after > dropping spinlock. > > . Enhance nfsd4_fl_expire_lock to work with with new lm_expire_lock > callback > > . Always resolve share reservation conflicts asynchrously. > > . Fix bug in nfs4_laundromat where spinlock is not used when > scanning cl_ownerstr_hashtbl. > > . Fix bug in nfs4_laundromat where idr_get_next was called > with incorrect 'id'. > > . Merge nfs4_destroy_courtesy_client into nfsd4_fl_expire_lock. > > The v8 patch includes: > > . Fix warning in nfsd4_fl_expire_lock reported by test robot. > > The V9 patch include: > > . Simplify lm_expire_lock API by (1) remove the 'testonly' flag > and (2) specifying return value as true/false to indicate > whether conflict was succesfully resolved. > > . Rework nfsd4_fl_expire_lock to mark client with > NFSD4_DESTROY_COURTESY_CLIENT then tell the laundromat to expire > the client in the background. > > . Add a spinlock in nfs4_client to synchronize access to the > NFSD4_COURTESY_CLIENT and NFSD4_DESTROY_COURTESY_CLIENT flag to > handle race conditions when resolving lock and share reservation > conflict. > > . Courtesy client that was marked as NFSD4_DESTROY_COURTESY_CLIENT > are now consisdered 'dead', waiting for the laundromat to expire > it. This client is no longer allowed to use its states if it > re-connects before the laundromat finishes expiring the client. > > For v4.1 client, the detection is done in the processing of the > SEQUENCE op and returns NFS4ERR_BAD_SESSION to force the client > to re-establish new clientid and session. > For v4.0 client, the detection is done in the processing of the > RENEW and state-related ops and return NFS4ERR_EXPIRE to force > the client to re-establish new clientid.