> On Jan 22, 2024, at 7:18 PM, NeilBrown <neilb@xxxxxxx> wrote:
>
> On Tue, 23 Jan 2024, Chuck Lever III wrote:
>>
>>> On Jan 22, 2024, at 6:31 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>>
>>> On Tue, 23 Jan 2024, Chuck Lever III wrote:
>>>>
>>>>> On Jan 22, 2024, at 4:57 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>>>>
>>>>> On Tue, 23 Jan 2024, Chuck Lever wrote:
>>>>>> On Mon, Jan 22, 2024 at 02:58:16PM +1100, NeilBrown wrote:
>>>>>>>
>>>>>>> The test on so_count in nfsd4_release_lockowner() is nonsense and
>>>>>>> harmful. Revert to using check_for_locks(), changing that to not sleep.
>>>>>>>
>>>>>>> First: harmful.
>>>>>>> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
>>>>>>> test on so_count can transiently return a false positive resulting in a
>>>>>>> return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is
>>>>>>> clearly a protocol violation and with the Linux NFS client it can cause
>>>>>>> incorrect behaviour.
>>>>>>>
>>>>>>> If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
>>>>>>> processing a LOCK request which failed because, at the time that request
>>>>>>> was received, the given owner held a conflicting lock, then the nfsd
>>>>>>> thread processing that LOCK request can hold a reference (conflock) to
>>>>>>> the lock owner that causes nfsd4_release_lockowner() to return an
>>>>>>> incorrect error.
>>>>>>>
>>>>>>> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
>>>>>>> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
>>>>>>> it knows that the error is impossible. It assumes the lock owner was in
>>>>>>> fact released, so it feels free to use the same lock owner identifier in
>>>>>>> some later locking request.
>>>>>>>
>>>>>>> When it does reuse a lock owner identifier for which a previous RELEASE
>>>>>>> failed, it will naturally use a lock_seqid of zero. However the server,
>>>>>>> which didn't release the lock owner, will expect a larger lock_seqid and
>>>>>>> so will respond with NFS4ERR_BAD_SEQID.
>>>>>>>
>>>>>>> So clearly it is harmful to allow a false positive, which testing
>>>>>>> so_count allows.
>>>>>>>
>>>>>>> The test is nonsense because ... well... it doesn't mean anything.
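
[ For readers without a tree handy: the so_count test under discussion
  is the one that commit ce3c4ad7f4ce added to nfsd4_release_lockowner().
  Quoting approximately, from memory rather than verbatim:

	/* The '2' assumes one transient reference from
	 * find_lockowner_str_locked() plus exactly one
	 * lock state on so_stateids. */
	if (atomic_read(&sop->so_count) != 2) {
		spin_unlock(&clp->cl_lock);
		return nfserr_locks_held;
	}
]
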
>>>>>>>
>>>>>>> so_count is the sum of three different counts.
>>>>>>> 1/ the set of states listed on so_stateids
>>>>>>> 2/ the set of active vfs locks owned by any of those states
>>>>>>> 3/ various transient counts such as for conflicting locks.
>>>>>>>
>>>>>>> When it is tested against '2' it is clear that one of these is the
>>>>>>> transient reference obtained by find_lockowner_str_locked(). It is not
>>>>>>> clear what the other one is expected to be.
>>>>>>>
>>>>>>> In practice, the count is often 2 because there is precisely one state
>>>>>>> on so_stateids. If there were more, this would fail.
>>>>>>>
>>>>>>> In my testing I see two circumstances in which RELEASE_LOCKOWNER is
>>>>>>> called. In one case, CLOSE is called before RELEASE_LOCKOWNER. That
>>>>>>> results in all the lock states being removed, and so the lockowner
>>>>>>> being discarded (it is removed when there are no more references,
>>>>>>> which usually happens when the lock state is discarded). When
>>>>>>> nfsd4_release_lockowner() finds that the lock owner doesn't exist, it
>>>>>>> returns success.
>>>>>>>
>>>>>>> The other case shows an so_count of '2' and precisely one state listed
>>>>>>> in so_stateids. It appears that the Linux client uses a separate lock
>>>>>>> owner for each file, resulting in one lock state per lock owner, so
>>>>>>> this test on '2' is safe. For another client it might not be safe.
>>>>>>>
>>>>>>> So this patch changes check_for_locks() to use the (newish)
>>>>>>> find_any_file_locked() so that it doesn't take a reference on the
>>>>>>> nfs4_file and so never calls nfsd_file_put(), and so never sleeps.
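
[ To make that concrete: on a tree where find_any_file_locked() and
  locks_inode_context() both exist (v6.2 and later), the reworked
  check_for_locks() takes roughly the following shape. This is a
  sketch, not the committed hunk:

static bool
check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
{
	struct file_lock_context *flctx;
	struct file_lock *fl;
	struct nfsd_file *nf;
	bool status = false;

	/* fi_lock is held across the whole walk. find_any_file_locked()
	 * returns a borrowed pointer and takes no reference, so there is
	 * no nfsd_file_put() on the way out, and so nothing here can
	 * sleep. */
	spin_lock(&fp->fi_lock);
	nf = find_any_file_locked(fp);
	if (!nf)
		goto out;

	flctx = locks_inode_context(file_inode(nf->nf_file));
	if (flctx && !list_empty_careful(&flctx->flc_posix)) {
		spin_lock(&flctx->flc_lock);
		list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
			if (fl->fl_owner == (fl_owner_t)lowner) {
				status = true;
				break;
			}
		}
		spin_unlock(&flctx->flc_lock);
	}
out:
	spin_unlock(&fp->fi_lock);
	return status;
}

  For the pre-6.2 backports discussed below, find_any_file_locked()
  itself is small enough to carry along: under fi_lock it simply
  returns whichever of fi_fds[] is populated, without bumping the
  nfsd_file refcount. ]
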
>>>>>>
>>>>>> More to the point, find_any_file_locked() was added by commit
>>>>>> e0aa651068bf ("nfsd: don't call nfsd_file_put from client states
>>>>>> seqfile display"), which was merged several months /after/ commit
>>>>>> ce3c4ad7f4ce ("NFSD: Fix possible sleep during
>>>>>> nfsd4_release_lockowner()").
>>>>>
>>>>> Yes. To flesh out the history:
>>>>> nfsd_file_put() was added in v5.4. In earlier kernels check_for_locks()
>>>>> would never sleep. However the problem patch was backported to 4.9,
>>>>> 4.14, and 4.19 and should be reverted.
>>>>
>>>> I don't see "NFSD: Fix possible sleep during nfsd4_release_lockowner()"
>>>> in any of those kernels. All but 4.19 are now EOL.
>>>
>>> I hadn't checked which were EOL. Thanks for finding the 4.19 patch and
>>> requesting a revert.
>>>
>>>>> find_any_file_locked() was added in v6.2, so when this patch is
>>>>> backported to 5.4, 5.10, 5.15, or 5.17 - 6.1 it needs to include
>>>>> find_any_file_locked().
>>>>
>>>> I think I'd rather leave those unperturbed until someone hits a real
>>>> problem. Unless you have a distribution kernel that needs to see
>>>> this fix in one of the LTS kernels? The supported stable/LTS kernels
>>>> are 5.4, 5.10, 5.15, and 6.1.
>>>
>>> Why not fix the bug? It's a real bug that a real customer really hit.
>>
>> That's what I'm asking. Was there a real customer issue? Because
>> ce3c4ad7f4ce was the result of a might_sleep splat, not an issue
>> in the field.
>>
>> Since this hit a real user, is there a BugLink: or Closes: tag
>> I can add?
>
> Unfortunately not. Our bugs for paying customers are private - so they
> can feel comfortable providing all the details we need.
> The link is https://bugzilla.suse.com/show_bug.cgi?id=1218968, but
> that won't help anyone outside SUSE.

Well, that link does show that there's a real bug report behind
the fix. That part wasn't clear from your lengthy patch description,
which I otherwise appreciated.

> But definitely a real bug.

Understood. That makes it a priority to backport.

>>> I've fixed it in all SLE kernels - we don't depend on LTS, though we
>>> do make use of the stable trees in various ways.
>>>
>>> But it's your call.
>>
>> Well, it's not my call whether it's backported. Anyone can ask.
>> It /is/ my call if I do the work.
>
> That's fair. I'm happy to create the manual backport patch. I'm not
> going to test those kernels.

Testing is the resource-bound part, fwiw.

> I generally assume that someone is testing
> stable kernels - so obvious problems will be found by someone else, and
> non-obvious problems I wouldn't find even if I did some testing :-(

Stable kernels don't get any specific testing of subsystem code,
unless we do it ourselves. I will try to get to it soon.

> Thanks,
> NeilBrown
>
>> Mostly the issue is having the time to manage 5-6 stable
>> kernels as well as upstream development. I'd like to ensure
>> that any patches for which we manually request a backport have
>> been applied and tested by someone with a real NFS rig, and
>> that there is a real problem getting addressed.
>>
>> It's getting better with kdevops... I can cherry pick any
>> recent kernel and run it through a number of tests without
>> needing to hand-hold. Haven't tried that with the older stable
>> kernels, though; those require older compilers and what-not.
>>
>> --
>> Chuck Lever

--
Chuck Lever