Re: [PATCH 1/1] NFSD: fix WARN_ON_ONCE in __queue_delayed_work

> On Jan 10, 2023, at 1:46 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
> 
> 
> On 1/10/23 10:17 AM, Chuck Lever III wrote:
>> 
>>> On Jan 10, 2023, at 12:33 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
>>> 
>>> 
>>> On 1/10/23 2:30 AM, Jeff Layton wrote:
>>>> On Mon, 2023-01-09 at 22:48 -0800, Dai Ngo wrote:
>>>>> Currently nfsd4_state_shrinker_worker can be scheduled multiple times
>>>>> from nfsd4_state_shrinker_count when memory is low. This causes
>>>>> the WARN_ON_ONCE in __queue_delayed_work to trigger.
>>>>> 
>>>>> This patch allows only one instance of nfsd4_state_shrinker_worker
>>>>> at a time using the nfsd_shrinker_active flag, protected by the
>>>>> client_lock.
>>>>> 
>>>>> Replace mod_delayed_work with queue_delayed_work since we
>>>>> don't expect to modify the delay of any pending work.
>>>>> 
>>>>> Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
>>>>> Reported-by: Mike Galbraith <efault@xxxxxx>
>>>>> Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
>>>>> ---
>>>>>  fs/nfsd/netns.h     |  1 +
>>>>>  fs/nfsd/nfs4state.c | 16 ++++++++++++++--
>>>>>  2 files changed, 15 insertions(+), 2 deletions(-)
>>>>> 
>>>>> diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
>>>>> index 8c854ba3285b..801d70926442 100644
>>>>> --- a/fs/nfsd/netns.h
>>>>> +++ b/fs/nfsd/netns.h
>>>>> @@ -196,6 +196,7 @@ struct nfsd_net {
>>>>>  	atomic_t		nfsd_courtesy_clients;
>>>>>  	struct shrinker		nfsd_client_shrinker;
>>>>>  	struct delayed_work	nfsd_shrinker_work;
>>>>> +	bool			nfsd_shrinker_active;
>>>>>  };
>>>>> 
>>>>>  /* Simple check to find out if a given net was properly initialized */
>>>>> diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
>>>>> index ee56c9466304..e00551af6a11 100644
>>>>> --- a/fs/nfsd/nfs4state.c
>>>>> +++ b/fs/nfsd/nfs4state.c
>>>>> @@ -4407,11 +4407,20 @@ nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc)
>>>>>  	struct nfsd_net *nn = container_of(shrink,
>>>>>  			struct nfsd_net, nfsd_client_shrinker);
>>>>> 
>>>>> +	spin_lock(&nn->client_lock);
>>>>> +	if (nn->nfsd_shrinker_active) {
>>>>> +		spin_unlock(&nn->client_lock);
>>>>> +		return 0;
>>>>> +	}
>>>> Is this extra machinery really necessary? The bool and spinlock don't
>>>> seem to be needed. Typically there is no issue with calling
>>>> queue_delayed_work when the work is already queued. It just returns
>>>> false in that case without doing anything.
>>> When there are multiple calls to mod_delayed_work/queue_delayed_work
>>> we hit the WARN_ON_ONCE's in __queue_delayed_work and __queue_work if
>>> the work is queued but not executed yet.
>> The delay argument of zero is interesting. If it's set to a value
>> greater than zero, do you still see a problem?
> 
> I tried and tried but could not reproduce the problem that Mike
> reported. I guess my VMs don't have fast enough CPUs to make it
> happen.

I'd prefer not to guess... it sounds like we don't have a clear
root cause on this one yet.

I think I agree with Jeff: a spinlock shouldn't be required to
make queuing work safe via this API.
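
Just to make sure we're comparing the same thing, here's roughly
what the count callback would look like with the flag and the lock
dropped, relying only on queue_delayed_work()'s return value
(untested sketch, for discussion only, not against any particular
tree):

static unsigned long
nfsd4_state_shrinker_count(struct shrinker *shrink,
			   struct shrink_control *sc)
{
	struct nfsd_net *nn = container_of(shrink,
			struct nfsd_net, nfsd_client_shrinker);
	long count;

	count = atomic_read(&nn->nfsd_courtesy_clients);
	if (!count)
		count = atomic_long_read(&num_delegations);
	if (count)
		/* a no-op (returns false) if the work is already pending */
		queue_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0);
	return (unsigned long)count;
}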


> As Jeff mentioned, delay 0 should be safe and we want to run
> the shrinker as soon as possible when memory is low.

I suggested that because the !delay code paths seem to lead
directly to the WARN_ONs in queue_work(). <shrug>
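
For reference, the path I had in mind looks roughly like this
(paraphrased from memory of kernel/workqueue.c, so the details may
be off):

static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
				 struct delayed_work *dwork, unsigned long delay)
{
	struct timer_list *timer = &dwork->timer;
	struct work_struct *work = &dwork->work;

	WARN_ON_ONCE(!wq);
	WARN_ON_ONCE(timer->function != delayed_work_timer_fn);
	WARN_ON_ONCE(timer_pending(timer));
	WARN_ON_ONCE(!list_empty(&work->entry));

	/* a zero delay skips the timer entirely and goes straight
	 * to __queue_work() */
	if (!delay) {
		__queue_work(cpu, wq, &dwork->work);
		return;
	}
	...
}

So with a non-zero delay we'd at least take the timer path instead
of queueing directly, which is why I was curious whether the
warning still fires in that case.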


> -Dai
> 
>> 
>> 
>>> This problem was reported by Mike. I initially tried with only the
>>> bool, but that was not enough, which is why the spinlock was added.
>>> Mike verified that the patch fixed the problem.
>>> 
>>> -Dai
>>> 
>>>>>  	count = atomic_read(&nn->nfsd_courtesy_clients);
>>>>>  	if (!count)
>>>>>  		count = atomic_long_read(&num_delegations);
>>>>> -	if (count)
>>>>> -		mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0);
>>>>> +	if (count) {
>>>>> +		nn->nfsd_shrinker_active = true;
>>>>> +		spin_unlock(&nn->client_lock);
>>>>> +		queue_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0);
>>>>> +	} else
>>>>> +		spin_unlock(&nn->client_lock);
>>>>>  	return (unsigned long)count;
>>>>>  }
>>>>> 
>>>>> @@ -6239,6 +6248,9 @@ nfsd4_state_shrinker_worker(struct work_struct *work)
>>>>> 
>>>>>  	courtesy_client_reaper(nn);
>>>>>  	deleg_reaper(nn);
>>>>> +	spin_lock(&nn->client_lock);
>>>>> +	nn->nfsd_shrinker_active = 0;
>>>>> +	spin_unlock(&nn->client_lock);
>>>>>  }
>>>>> 
>>>>>  static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)
>> --
>> Chuck Lever

--
Chuck Lever
