> On Jan 22, 2024, at 7:18 PM, NeilBrown <neilb@xxxxxxx> wrote:
>
> On Tue, 23 Jan 2024, Chuck Lever III wrote:
>>
>>> On Jan 22, 2024, at 6:31 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>>
>>> On Tue, 23 Jan 2024, Chuck Lever III wrote:
>>>>
>>>>> On Jan 22, 2024, at 4:57 PM, NeilBrown <neilb@xxxxxxx> wrote:
>>>>>
>>>>> On Tue, 23 Jan 2024, Chuck Lever wrote:
>>>>>> On Mon, Jan 22, 2024 at 02:58:16PM +1100, NeilBrown wrote:
>>>>>>>
>>>>>>> The test on so_count in nfsd4_release_lockowner() is nonsense and
>>>>>>> harmful. Revert to using check_for_locks(), changing that to not sleep.
>>>>>>>
>>>>>>> First: harmful.
>>>>>>> As is documented in the kdoc comment for nfsd4_release_lockowner(), the
>>>>>>> test on so_count can transiently return a false positive resulting in a
>>>>>>> return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is
>>>>>>> clearly a protocol violation and with the Linux NFS client it can cause
>>>>>>> incorrect behaviour.
>>>>>>>
>>>>>>> If NFS4_RELEASE_LOCKOWNER is sent while some other thread is still
>>>>>>> processing a LOCK request which failed because, at the time that request
>>>>>>> was received, the given owner held a conflicting lock, then the nfsd
>>>>>>> thread processing that LOCK request can hold a reference (conflock) to
>>>>>>> the lock owner that causes nfsd4_release_lockowner() to return an
>>>>>>> incorrect error.
>>>>>>>
>>>>>>> The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it
>>>>>>> never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so
>>>>>>> it knows that the error is impossible. It assumes the lock owner was in
>>>>>>> fact released, so it feels free to use the same lock owner identifier in
>>>>>>> some later locking request.
>>>>>>>
>>>>>>> When it does reuse a lock owner identifier for which a previous RELEASE
>>>>>>> failed, it will naturally use a lock_seqid of zero. However the server,
>>>>>>> which didn't release the lock owner, will expect a larger lock_seqid and
>>>>>>> so will respond with NFS4ERR_BAD_SEQID.
>>>>>>>
>>>>>>> So clearly it is harmful to allow a false positive, which testing
>>>>>>> so_count allows.
>>>>>>>
>>>>>>> The test is nonsense because ... well... it doesn't mean anything.
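
[ For readers without a tree handy: the so_count test under discussion
  is the one that commit ce3c4ad7f4ce added to nfsd4_release_lockowner().
  Quoting approximately, from memory rather than verbatim:

	/* The '2' assumes one transient reference from
	 * find_lockowner_str_locked() plus exactly one
	 * lock state on so_stateids. */
	if (atomic_read(&sop->so_count) != 2) {
		spin_unlock(&clp->cl_lock);
		return nfserr_locks_held;
	}
]
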
>>>>>>>
>>>>>>> so_count is the sum of three different counts.
>>>>>>> 1/ the set of states listed on so_stateids
>>>>>>> 2/ the set of active vfs locks owned by any of those states
>>>>>>> 3/ various transient counts such as for conflicting locks.
>>>>>>>
>>>>>>> When it is tested against '2' it is clear that one of these is the
>>>>>>> transient reference obtained by find_lockowner_str_locked(). It is not
>>>>>>> clear what the other one is expected to be.
>>>>>>>
>>>>>>> In practice, the count is often 2 because there is precisely one state
>>>>>>> on so_stateids. If there were more, this would fail.
>>>>>>>
>>>>>>> In my testing I see two circumstances in which RELEASE_LOCKOWNER is
>>>>>>> called. In one case, CLOSE is called before RELEASE_LOCKOWNER. That
>>>>>>> results in all the lock states being removed, and so the lockowner
>>>>>>> being discarded (it is removed when there are no more references,
>>>>>>> which usually happens when the lock state is discarded). When
>>>>>>> nfsd4_release_lockowner() finds that the lock owner doesn't exist, it
>>>>>>> returns success.
>>>>>>>
>>>>>>> The other case shows an so_count of '2' and precisely one state listed
>>>>>>> in so_stateids. It appears that the Linux client uses a separate lock
>>>>>>> owner for each file, resulting in one lock state per lock owner, so
>>>>>>> this test on '2' is safe. For another client it might not be safe.
>>>>>>>
>>>>>>> So this patch changes check_for_locks() to use the (newish)
>>>>>>> find_any_file_locked() so that it doesn't take a reference on the
>>>>>>> nfs4_file and so never calls nfsd_file_put(), and so never sleeps.
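
[ To make that concrete: on a tree where find_any_file_locked() and
  locks_inode_context() both exist (v6.2 and later), the reworked
  check_for_locks() takes roughly the following shape. This is a
  sketch, not the committed hunk:

static bool
check_for_locks(struct nfs4_file *fp, struct nfs4_lockowner *lowner)
{
	struct file_lock_context *flctx;
	struct file_lock *fl;
	struct nfsd_file *nf;
	bool status = false;

	/* fi_lock is held across the whole walk. find_any_file_locked()
	 * returns a borrowed pointer and takes no reference, so there is
	 * no nfsd_file_put() on the way out, and so nothing here can
	 * sleep. */
	spin_lock(&fp->fi_lock);
	nf = find_any_file_locked(fp);
	if (!nf)
		goto out;

	flctx = locks_inode_context(file_inode(nf->nf_file));
	if (flctx && !list_empty_careful(&flctx->flc_posix)) {
		spin_lock(&flctx->flc_lock);
		list_for_each_entry(fl, &flctx->flc_posix, fl_list) {
			if (fl->fl_owner == (fl_owner_t)lowner) {
				status = true;
				break;
			}
		}
		spin_unlock(&flctx->flc_lock);
	}
out:
	spin_unlock(&fp->fi_lock);
	return status;
}

  For the pre-6.2 backports discussed below, find_any_file_locked()
  itself is small enough to carry along: under fi_lock it simply
  returns whichever of fi_fds[] is populated, without bumping the
  nfsd_file refcount. ]
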
>>>>>>
>>>>>> More to the point, find_any_file_locked() was added by commit
>>>>>> e0aa651068bf ("nfsd: don't call nfsd_file_put from client states
>>>>>> seqfile display"), which was merged several months /after/ commit
>>>>>> ce3c4ad7f4ce ("NFSD: Fix possible sleep during
>>>>>> nfsd4_release_lockowner()").
>>>>>
>>>>> Yes. To flesh out the history:
>>>>> nfsd_file_put() was added in v5.4. In earlier kernels check_for_locks()
>>>>> would never sleep. However the problem patch was backported to 4.9,
>>>>> 4.14, and 4.19 and should be reverted.
>>>>
>>>> I don't see "NFSD: Fix possible sleep during nfsd4_release_lockowner()"
>>>> in any of those kernels. All but 4.19 are now EOL.
>>>
>>> I hadn't checked which were EOL. Thanks for finding the 4.19 patch and
>>> requesting a revert.
>>>
>>>>> find_any_file_locked() was added in v6.2, so when this patch is
>>>>> backported to 5.4, 5.10, 5.15, or 5.17 - 6.1 it needs to include
>>>>> find_any_file_locked().
>>>>
>>>> I think I'd rather leave those unperturbed until someone hits a real
>>>> problem. Unless you have a distribution kernel that needs to see
>>>> this fix in one of the LTS kernels? The supported stable/LTS kernels
>>>> are 5.4, 5.10, 5.15, and 6.1.
>>>
>>> Why not fix the bug? It's a real bug that a real customer really hit.
>>
>> That's what I'm asking. Was there a real customer issue? Because
>> ce3c4ad7f4ce was the result of a might_sleep splat, not an issue
>> in the field.
>>
>> Since this hit a real user, is there a BugLink: or Closes: tag
>> I can add?
>
> Unfortunately not. Our bugs for paying customers are private - so they
> can feel comfortable providing all the details we need.
> The link is https://bugzilla.suse.com/show_bug.cgi?id=1218968, but
> that won't help anyone outside SUSE.

Well, that link does show that there's a real bug report behind
the fix. That part wasn't clear from your lengthy patch description,
which I otherwise appreciated.

> But definitely a real bug.

Understood. That makes it a priority to backport.

>>> I've fixed it in all SLE kernels - we don't depend on LTS, though we
>>> do make use of the stable trees in various ways.
>>>
>>> But it's your call.
>>
>> Well, it's not my call whether it's backported. Anyone can ask.
>> It /is/ my call if I do the work.
>
> That's fair. I'm happy to create the manual backport patch. I'm not
> going to test those kernels.

Testing is the resource-bound part, fwiw.

> I generally assume that someone is testing
> stable kernels - so obvious problems will be found by someone else, and
> non-obvious problems I wouldn't find even if I did some testing :-(

Stable kernels don't get any specific testing of subsystem code,
unless we do it ourselves. I will try to get to it soon.

> Thanks,
> NeilBrown
>
>> Mostly the issue is having the time to manage 5-6 stable
>> kernels as well as upstream development. I'd like to ensure
>> that any patches for which we manually request a backport have
>> been applied and tested by someone with a real NFS rig, and
>> that there is a real problem getting addressed.
>>
>> It's getting better with kdevops... I can cherry pick any
>> recent kernel and run it through a number of tests without
>> needing to hand-hold. Haven't tried that with the older stable
>> kernels, though; those require older compilers and what-not.
>>
>> --
>> Chuck Lever

--
Chuck Lever