On Tue, 23 Jan 2024, Chuck Lever wrote:
> On Mon, Jan 22, 2024 at 02:58:16PM +1100, NeilBrown wrote:
> >
> > The test on so_count in nfsd4_release_lockowner() is nonsense and
> > harmful.  Revert to using check_for_locks(), changing that to not sleep.
> >
> > First: harmful.
> > As is documented in the kdoc comment for nfsd4_release_lockowner(), the
> > test on so_count can transiently return a false positive resulting in a
> > return of NFS4ERR_LOCKS_HELD when in fact no locks are held.  This is
> > clearly a protocol violation and with the Linux NFS client it can cause
> > incorrect behaviour.
> >
> > If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
> > processing a LOCK request which failed because, at the time that request
> > was received, the given owner held a conflicting lock, then the nfsd
> > thread processing that LOCK request can hold a reference (conflock) to
> > the lock owner that causes nfsd4_release_lockowner() to return an
> > incorrect error.
> >
> > The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
> > never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
> > it knows that the error is impossible.  It assumes the lock owner was in
> > fact released so it feels free to use the same lock owner identifier in
> > some later locking request.
> >
> > When it does reuse a lock owner identifier for which a previous RELEASE
> > failed, it will naturally use a lock_seqid of zero.  However the server,
> > which didn't release the lock owner, will expect a larger lock_seqid and
> > so will respond with NFS4ERR_BAD_SEQID.
> >
> > So clearly it is harmful to allow a false positive, which testing
> > so_count allows.
> >
> > The test is nonsense because ... well... it doesn't mean anything.
> >
> > so_count is the sum of three different counts.
> > 1/ the set of states listed on so_stateids
> > 2/ the set of active vfs locks owned by any of those states
> > 3/ various transient counts such as for conflicting locks.
> >
> > When it is tested against '2' it is clear that one of these is the
> > transient reference obtained by find_lockowner_str_locked().  It is not
> > clear what the other one is expected to be.
> >
> > In practice, the count is often 2 because there is precisely one state
> > on so_stateids.  If there were more, this would fail.
> >
> > In my testing I see two circumstances when RELEASE_LOCKOWNER is called.
> > In one case, CLOSE is called before RELEASE_LOCKOWNER.  That results in
> > all the lock states being removed, and so the lockowner being discarded
> > (it is removed when there are no more references, which usually happens
> > when the lock state is discarded).  When nfsd4_release_lockowner() finds
> > that the lock owner doesn't exist, it returns success.
> >
> > The other case shows an so_count of '2' and precisely one state listed
> > in so_stateids.  It appears that the Linux client uses a separate lock
> > owner for each file, resulting in one lock state per lock owner, so this
> > test on '2' is safe.  For another client it might not be safe.
> >
> > So this patch changes check_for_locks() to use the (newish)
> > find_any_file_locked() so that it doesn't take a reference on the
> > nfs4_file and so never calls nfsd_file_put(), and so never sleeps.

> More to the point, find_any_file_locked() was added by commit
> e0aa651068bf ("nfsd: don't call nfsd_file_put from client states
> seqfile display"), which was merged several months /after/ commit
> ce3c4ad7f4ce ("NFSD: Fix possible sleep during
> nfsd4_release_lockowner()").

Yes.  To flesh out the history:
nfsd_file_put() was added in v5.4.  In earlier kernels check_for_locks()
would never sleep.  However the problem patch was backported to 4.9,
4.14, and 4.19, and should be reverted there.
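For anyone following along without the patch in front of them, the shape of
the reworked check_for_locks() is roughly as below.  This is a sketch
reconstructed from the description above, not the literal patch text, and it
assumes the v6.2+ helpers (find_any_file_locked(), locks_inode_context());
field and helper names may differ slightly in any given stable tree:

```c
static bool
check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
{
	struct file_lock *fl;
	struct nfsd_file *nf;
	struct file_lock_context *flctx;
	bool status = false;

	/* Hold fi_lock across the whole walk.  find_any_file_locked()
	 * does not take a reference on the nfsd_file, so there is no
	 * nfsd_file_put() on the way out and nothing here can sleep.
	 */
	spin_lock(&fp->fi_lock);
	nf = find_any_file_locked(fp);
	if (nf) {
		flctx = locks_inode_context(file_inode(nf->nf_file));
		if (flctx && !list_empty_careful(&flctx->flc_posix)) {
			/* flc_lock nests inside fi_lock (and cl_lock);
			 * no nfsd lock is ever taken under flc_lock,
			 * so this ordering cannot deadlock. */
			spin_lock(&flctx->flc_lock);
			list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
				if (fl->fl_owner == (fl_owner_t)lowner) {
					status = true;
					break;
				}
			}
			spin_unlock(&flctx->flc_lock);
		}
	}
	spin_unlock(&fp->fi_lock);
	return status;
}
```

The point of the sketch is the structure: because no reference is taken and
no put is needed, the whole check runs under spinlocks and RELEASE_LOCKOWNER
can test the actual vfs lock list instead of guessing from so_count.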
find_any_file_locked() was added in v6.2, so when this patch is
backported to 5.4, 5.10, 5.15, 5.17 - 6.1 it needs to include
find_any_file_locked().
The patch should apply unchanged to stable kernels 6.2 and later.

> Not having to deal with nfsd_file_put() in check_for_locks is a Good
> Thing. :-)

Makes me wonder if there is anywhere else where we don't want
nfsd_file_put() ... but I cannot find any obvious candidates for change.

> Am I correct in observing that the new check_for_locks() is the only
> place where flc_lock and fi_lock are held concurrently?

That is what I see - yes.  fi_lock is taken inside cl_lock elsewhere, and
we preserve that ordering in this patch.  I cannot see that any nfsd locks
are taken when flc_lock is held, so it is safe to take it while fi_lock
and cl_lock are held.

Thanks,
NeilBrown