On Sat, Jan 27, 2024 at 02:18:07PM -0800, gregkh@xxxxxxxxxxxxxxxxxxx wrote:
> 
> This is a note to let you know that I've just added the patch titled
> 
>     nfsd: fix RELEASE_LOCKOWNER
> 
> to the 6.1-stable tree which can be found at:
>     http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
> 
> The filename of the patch is:
>     nfsd-fix-release_lockowner.patch
> and it can be found in the queue-6.1 subdirectory.
> 
> If you, or anyone else, feels it should not be added to the stable tree,
> please let <stable@xxxxxxxxxxxxxxx> know about it.

This one should not be added to origin/linux-6.1.y. It has at least one
pre-requisite that is not present in that kernel. I intend to send a
properly integrated and tested series to you soon for 6.1 and three
earlier stable kernels where this patch failed to apply.

Note that I tagged this one for 6.2 and later:

> Cc: stable@xxxxxxxxxxxxxxx # v6.2+

Did I do it incorrectly?


> From edcf9725150e42beeca42d085149f4c88fa97afd Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@xxxxxxx>
> Date: Mon, 22 Jan 2024 14:58:16 +1100
> Subject: nfsd: fix RELEASE_LOCKOWNER
> 
> From: NeilBrown <neilb@xxxxxxx>
> 
> commit edcf9725150e42beeca42d085149f4c88fa97afd upstream.
> 
> The test on so_count in nfsd4_release_lockowner() is nonsense and
> harmful. Revert to using check_for_locks(), changing that to not sleep.
> 
> First: harmful.
> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
> test on so_count can transiently return a false positive resulting in a
> return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is
> clearly a protocol violation and with the Linux NFS client it can cause
> incorrect behaviour.
> 
> If RELEASE_LOCKOWNER is sent while some other thread is still
> processing a LOCK request which failed because, at the time that request
> was received, the given owner held a conflicting lock, then the nfsd
> thread processing that LOCK request can hold a reference (conflock) to
> the lock owner that causes nfsd4_release_lockowner() to return an
> incorrect error.
> 
> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
> it knows that the error is impossible. It assumes the lock owner was in
> fact released so it feels free to use the same lock owner identifier in
> some later locking request.
> 
> When it does reuse a lock owner identifier for which a previous RELEASE
> failed, it will naturally use a lock_seqid of zero. However the server,
> which didn't release the lock owner, will expect a larger lock_seqid and
> so will respond with NFS4ERR_BAD_SEQID.
> 
> So clearly it is harmful to allow a false positive, which testing
> so_count allows.
> 
> The test is nonsense because ... well... it doesn't mean anything.
> 
> so_count is the sum of three different counts.
> 1/ the set of states listed on so_stateids
> 2/ the set of active vfs locks owned by any of those states
> 3/ various transient counts such as for conflicting locks.
> 
> When it is tested against '2' it is clear that one of these is the
> transient reference obtained by find_lockowner_str_locked(). It is not
> clear what the other one is expected to be.
> 
> In practice, the count is often 2 because there is precisely one state
> on so_stateids. If there were more, this would fail.
> 
> In my testing I see two circumstances when RELEASE_LOCKOWNER is called.
> In one case, CLOSE is called before RELEASE_LOCKOWNER. That results in
> all the lock states being removed, and so the lockowner being discarded
> (it is removed when there are no more references which usually happens
> when the lock state is discarded). When nfsd4_release_lockowner() finds
> that the lock owner doesn't exist, it returns success.
> 
> The other case shows an so_count of '2' and precisely one state listed
> in so_stateid. It appears that the Linux client uses a separate lock
> owner for each file resulting in one lock state per lock owner, so this
> test on '2' is safe. For another client it might not be safe.
> 
> So this patch changes check_for_locks() to use the (newish)
> find_any_file_locked() so that it doesn't take a reference on the
> nfs4_file and so never calls nfsd_file_put(), and so never sleeps. With
> this check is it safe to restore the use of check_for_locks() rather
> than testing so_count against the mysterious '2'.
> 
> Fixes: ce3c4ad7f4ce ("NFSD: Fix possible sleep during nfsd4_release_lockowner()")
> Signed-off-by: NeilBrown <neilb@xxxxxxx>
> Reviewed-by: Jeff Layton <jlayton@xxxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx # v6.2+
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> ---
>  fs/nfsd/nfs4state.c |   26 +++++++++++++++-----------
>  1 file changed, 15 insertions(+), 11 deletions(-)
> 
> --- a/fs/nfsd/nfs4state.c
> +++ b/fs/nfsd/nfs4state.c
> @@ -7736,14 +7736,16 @@ check_for_locks(struct nfs4_file *fp, st
>  {
>          struct file_lock *fl;
>          int status = false;
> -        struct nfsd_file *nf = find_any_file(fp);
> +        struct nfsd_file *nf;
>          struct inode *inode;
>          struct file_lock_context *flctx;
>  
> +        spin_lock(&fp->fi_lock);
> +        nf = find_any_file_locked(fp);
>          if (!nf) {
>                  /* Any valid lock stateid should have some sort of access */
>                  WARN_ON_ONCE(1);
> -                return status;
> +                goto out;
>          }
>  
>          inode = locks_inode(nf->nf_file);
> @@ -7759,7 +7761,8 @@ check_for_locks(struct nfs4_file *fp, st
>                  }
>                  spin_unlock(&flctx->flc_lock);
>          }
> -        nfsd_file_put(nf);
> +out:
> +        spin_unlock(&fp->fi_lock);
>          return status;
>  }
>  
> @@ -7769,10 +7772,8 @@ check_for_locks(struct nfs4_file *fp, st
>   * @cstate: NFSv4 COMPOUND state
>   * @u: RELEASE_LOCKOWNER arguments
>   *
> - * The lockowner's so_count is bumped when a lock record is added
> - * or when copying a conflicting lock. The latter case is brief,
> - * but can lead to fleeting false positives when looking for
> - * locks-in-use.
> + * Check if theree are any locks still held and if not - free the lockowner
> + * and any lock state that is owned.
>   *
>   * Return values:
>   *   %nfs_ok: lockowner released or not found
> @@ -7808,10 +7809,13 @@ nfsd4_release_lockowner(struct svc_rqst
>                  spin_unlock(&clp->cl_lock);
>                  return nfs_ok;
>          }
> -        if (atomic_read(&lo->lo_owner.so_count) != 2) {
> -                spin_unlock(&clp->cl_lock);
> -                nfs4_put_stateowner(&lo->lo_owner);
> -                return nfserr_locks_held;
> +
> +        list_for_each_entry(stp, &lo->lo_owner.so_stateids, st_perstateowner) {
> +                if (check_for_locks(stp->st_stid.sc_file, lo)) {
> +                        spin_unlock(&clp->cl_lock);
> +                        nfs4_put_stateowner(&lo->lo_owner);
> +                        return nfserr_locks_held;
> +                }
>          }
>          unhash_lockowner_locked(lo);
>          while (!list_empty(&lo->lo_owner.so_stateids)) {
> 
> 
> Patches currently in stable-queue which might be from neilb@xxxxxxx are
> 
> queue-6.1/nfsd-fix-release_lockowner.patch

-- 
Chuck Lever