> On May 23, 2022, at 11:26 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
> 
> On Mon, 2022-05-23 at 15:00 +0000, Chuck Lever III wrote:
>> 
>>> On May 23, 2022, at 9:40 AM, Jeff Layton <jlayton@xxxxxxxxxx> wrote:
>>> 
>>> On Sun, 2022-05-22 at 11:38 -0400, Chuck Lever wrote:
>>>> nfsd4_release_lockowner() holds clp->cl_lock when it calls
>>>> check_for_locks(). However, check_for_locks() calls nfsd_file_get()
>>>> / nfsd_file_put() to access the backing inode's flc_posix list, and
>>>> nfsd_file_put() can sleep if the inode was recently removed.
>>>> 
>>> 
>>> It might be good to add a might_sleep() to nfsd_file_put?
>> 
>> I intend to include the patch you reviewed last week that
>> adds the might_sleep(), as part of this series.
>> 
>> 
>>>> Let's instead rely on the stateowner's reference count to gate
>>>> whether the release is permitted. This should be a reliable
>>>> indication of locks-in-use since file lock operations and
>>>> ->lm_get_owner take appropriate references, which are released
>>>> appropriately when file locks are removed.
>>>> 
>>>> Reported-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
>>>> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
>>>> Cc: stable@xxxxxxxxxxxxxxx
>>>> ---
>>>>  fs/nfsd/nfs4state.c |    9 +++------
>>>>  1 file changed, 3 insertions(+), 6 deletions(-)
>>>> 
>>>> This might be a naive approach, but let's start with it.
>>>> 
>>>> This passes light testing, but it's not clear how much our existing
>>>> fleet of tests exercises this area. I've locally built a couple of
>>>> pynfs tests (one is based on the one Dai posted last week) and they
>>>> pass too.
>>>> 
>>>> I don't believe that FREE_STATEID needs the same simplification.
>>>> 
>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>> index a280256cbb03..b77894e668a4 100644
>>>> --- a/fs/nfsd/nfs4state.c
>>>> +++ b/fs/nfsd/nfs4state.c
>>>> @@ -7559,12 +7559,9 @@ nfsd4_release_lockowner(struct svc_rqst *rqstp,
>>>> 
>>>>  	/* see if there are still any locks associated with it */
>>>>  	lo = lockowner(sop);
>>>> -	list_for_each_entry(stp, &sop->so_stateids, st_perstateowner) {
>>>> -		if (check_for_locks(stp->st_stid.sc_file, lo)) {
>>>> -			status = nfserr_locks_held;
>>>> -			spin_unlock(&clp->cl_lock);
>>>> -			return status;
>>>> -		}
>>>> +	if (atomic_read(&sop->so_count) > 1) {
>>>> +		spin_unlock(&clp->cl_lock);
>>>> +		return nfserr_locks_held;
>>>>  	}
>>>> 
>>>>  	nfs4_get_stateowner(sop);
>>>> 
>>>> 
>>> 
>>> lm_get_owner is called from locks_copy_conflock, so if someone else
>>> happens to be doing a LOCKT or F_GETLK call at the same time that
>>> RELEASE_LOCKOWNER gets called, then this may end up returning an error
>>> inappropriately.
>> 
>> IMO releasing the lockowner while it's being used for _anything_
>> seems risky and surprising. If RELEASE_LOCKOWNER succeeds while
>> the client is still using the lockowner for any reason, a
>> subsequent error will occur if the client tries to use it again.
>> Heck, I can see the server failing in mid-COMPOUND with this kind
>> of race. Better I think to just leave the lockowner in place if
>> there's any ambiguity.
>> 
> 
> The problem here is not the client itself calling RELEASE_LOCKOWNER
> while it's still in use, but rather a different client altogether
> calling LOCKT (or a local process does a F_GETLK) on an inode where a
> lock is held by a client. The LOCKT gets a reference to it (for the
> conflock), while the client that has the lockowner releases the lock and
> then the lockowner while the refcount is still high.
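
Just to be sure we're describing the same window, here's a toy
userspace model of what I think you mean. The names (so_count,
conflock_get/put, release_lockowner) are stand-ins for the nfsd and
fs/locks.c machinery, not the real functions:

/*
 * Toy model: an elevated owner refcount does not imply that any
 * locks are actually held by that owner.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

struct lockowner {
        atomic_int so_count;    /* 1 == only the owner table holds a ref */
        int nr_locks;           /* locks actually held by this owner */
};

/* LOCKT / F_GETLK side: locks_copy_conflock() -> lm_get_owner() */
static void conflock_get(struct lockowner *lo)
{
        atomic_fetch_add(&lo->so_count, 1);
}

/* lm_put_owner(), when the conflock is released */
static void conflock_put(struct lockowner *lo)
{
        atomic_fetch_sub(&lo->so_count, 1);
}

/* RELEASE_LOCKOWNER side: the check proposed in the patch above */
static bool release_lockowner(struct lockowner *lo)
{
        if (atomic_load(&lo->so_count) > 1)
                return false;   /* report NFS4ERR_LOCKS_HELD */
        /* ... unhash and free the lockowner here ... */
        return true;
}

int main(void)
{
        struct lockowner lo = { .so_count = 1, .nr_locks = 0 };

        conflock_get(&lo);      /* another client's LOCKT pins the owner */
        printf("release allowed: %s, locks actually held: %d\n",
               release_lockowner(&lo) ? "yes" : "no", lo.nr_locks);
        conflock_put(&lo);
        return 0;
}

In that model the release is refused even though nr_locks is zero,
which I take to be the false positive you're describing.
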
> 
> The race window for this is probably quite small, but I think it's
> theoretically possible. The point is that an elevated refcount on the
> lockowner doesn't necessarily mean that locks are actually being held by
> it.

Sure, I get that the lockowner's reference count is not 100%
reliable. The question is whether it's good enough.

We are looking for a mechanism that can simply count the number
of locks held by a lockowner. It sounds like you believe that
lm_get_owner / put_owner might not be a reliable way to do that.


>> The spec language does not say RELEASE_LOCKOWNER must not return
>> LOCKS_HELD for other reasons, and it does say that there is no
>> choice of using another NFSERR value (RFC 7530 Section 13.2).
>> 
> 
> What recourse does the client have if this happens? It released all of
> its locks and tried to release the lockowner, but the server says "locks
> held". Should it just give up at that point? RELEASE_LOCKOWNER is a sort
> of a courtesy by the client, I suppose...

RELEASE_LOCKOWNER is a courtesy for the server. Most clients
ignore the return code IIUC.

So the hazard caused by this race would be a small resource
leak on the server that would go away once the client's lease
was purged.


>>> My guess is that that would be pretty hard to hit the
>>> timing right, but not impossible.
>>> 
>>> What we may want to do is have the kernel do this check and only if it
>>> comes back >1 do the actual check for locks. That won't fix the original
>>> problem though.
>>> 
>>> In other places in nfsd, we've plumbed in a dispose_list head and
>>> deferred the sleeping functions until the spinlock can be dropped. I
>>> haven't looked closely at whether that's possible here, but it may be a
>>> more reliable approach.
>> 
>> That was proposed by Dai last week.
>> 
>> https://lore.kernel.org/linux-nfs/1653079929-18283-1-git-send-email-dai.ngo@xxxxxxxxxx/T/#u
>> 
>> Trond pointed out that if two separate clients were releasing a
>> lockowner on the same inode, there is nothing that protects the
>> dispose_list, and it would get corrupted.
>> 
>> https://lore.kernel.org/linux-nfs/31E87CEF-C83D-4FA8-A774-F2C389011FCE@xxxxxxxxxx/T/#mf1fc1ae0503815c0a36ae75a95086c3eff892614
>> 
> 
> Yeah, that doesn't look like what's needed.
> 
> What I was going to suggest is a nfsd_file_put variant that takes a
> list_head. If the refcount goes to zero and the thing ends up being
> unhashed, then you put it on the dispose list rather than doing the
> blocking operations, and then clean it up later.

Trond doesn't like that approach; see the e-mail thread.


> That said, nfsd_file_put has grown significantly in complexity over the
> years, so maybe that's not simple to do now.
> -- 
> Jeff Layton <jlayton@xxxxxxxxxx>

--
Chuck Lever
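
P.S. For the archive, here is roughly how I read the "nfsd_file_put
variant that takes a list_head" idea: a toy userspace sketch with
made-up names (nf_put_deferred / nf_dispose_list), not actual nfsd
code.

#include <stdlib.h>

struct nf {
        int refcount;
        struct nf *dispose_next;        /* stand-in for a list_head linkage */
};

/*
 * Put that never blocks: when the last reference drops, park the
 * object on a caller-owned dispose list instead of doing the
 * blocking cleanup while a spinlock is held.
 */
static void nf_put_deferred(struct nf *f, struct nf **dispose)
{
        if (--f->refcount > 0)
                return;
        f->dispose_next = *dispose;
        *dispose = f;
}

/* Called after the lock is dropped; sleeping is OK here. */
static void nf_dispose_list(struct nf **dispose)
{
        while (*dispose) {
                struct nf *f = *dispose;

                *dispose = f->dispose_next;
                /* blocking cleanup (the real close/fput work) goes here */
                free(f);
        }
}

int main(void)
{
        struct nf *f = calloc(1, sizeof(*f));
        struct nf *dispose = NULL;

        f->refcount = 1;
        nf_put_deferred(f, &dispose);   /* while "holding the spinlock" */
        nf_dispose_list(&dispose);      /* after dropping it */
        return 0;
}

As you say, how cleanly that maps onto today's nfsd_file_put is
another question, and per Trond's earlier point the dispose list
itself would still need protection if more than one path can feed it.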