On Sat, Feb 17, 2024 at 01:27:15PM -0800, dai.ngo@xxxxxxxxxx wrote:
> 
> On 2/17/24 1:09 PM, Chuck Lever wrote:
> > On Sat, Feb 17, 2024 at 10:00:59AM -0800, Dai Ngo wrote:
> > > When the number of granted delegations becomes large, there are some
> > > undesired effects on the NFS server. Besides the consumption of
> > > system resources, the number of entries on the linked lists of the
> > > file cache can grow significantly.
> > > 
> > > When this condition happens, NFS performance grinds to a halt, as
> > > reported here [1].
> > 
> > That was a v5.15.131 kernel. The LRU problems were addressed in
> > v6.2. This doesn't seem like a clean rationale for adding this
> > reaper behavior in, say, v6.9.
> 
> Do you know what commits in v6.9 fix this problem?

The filecache LRU issues were addressed in v6.2 and before.


> Regardless, I think when the number of delegations is getting large,
> it's good to ask the clients to release unused delegations. The
> Linux client keeps unused delegations for about ~90 secs before
> returning them.

That seems reasonable. No need for the server to ask for them back
unless it is under some stress.


> > > This patch attempts to alleviate this problem by asking the clients
> > > to voluntarily return any unused delegations when the number of
> > > delegations reaches 3/4 of max_delegations, by sending
> > > OP_CB_RECALL_ANY to all clients that hold delegations.
> > 
> > I don't have a strong sense of how big max_delegations can get. Is
> > there evidence that CB_RECALL_ANY has enough impact, reliably, to
> > reduce the size of the filecache?
> 
> I don't have any numbers for this. But in my testing, the Linux client
> returns unused delegations immediately when receiving the CB_RECALL_ANY
> instead of keeping them ~90 secs.

The issue is whether returning delegations in bulk has any impact on
filecache behavior. I'd like to see some evidence of that, because
after v6.2 I think NFSD's filecache shouldn't be perturbed by lots of
open files.


> > More below.
> > 
> > 
> > > [1] https://lore.kernel.org/all/PH0PR14MB5493F59229B802B871407F9CAA442@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > > 
> > > Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
> > > ---
> > >  fs/nfsd/nfs4state.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index fdc95bfbfbb6..5fb83853533f 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -130,6 +130,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
> > >  static const struct nfsd4_callback_ops nfsd4_cb_getattr_ops;
> > >  static struct workqueue_struct *laundry_wq;
> > > +static void deleg_reaper(struct nfsd_net *nn);
> > >  int nfsd4_create_laundry_wq(void)
> > >  {
> > > @@ -696,6 +697,9 @@ static struct nfsd_file *find_any_file_locked(struct nfs4_file *f)
> > >  static atomic_long_t num_delegations;
> > >  unsigned long max_delegations;
> > > +/* threshold to trigger deleg_reaper */
> > > +static unsigned long delegations_soft_limit;
> > > +
> > >  /*
> > >   * Open owner state (share locks)
> > >   */
> > > @@ -6466,6 +6470,7 @@ nfs4_laundromat(struct nfsd_net *nn)
> > >  	struct nfs4_cpntf_state *cps;
> > >  	copy_stateid_t *cps_t;
> > >  	int i;
> > > +	long n;
> > >  	if (clients_still_reclaiming(nn)) {
> > >  		lt.new_timeo = 0;
> > > @@ -6550,6 +6555,9 @@ nfs4_laundromat(struct nfsd_net *nn)
> > >  	/* service the server-to-server copy delayed unmount list */
> > >  	nfsd4_ssc_expire_umount(nn);
> > >  #endif
> > > +	n = atomic_long_inc_return(&num_delegations);
> > 
> > I don't think you want to modify the number of delegations here.
> > atomic_long_read() instead?
> 
> Right, it's a typo, it should be a read.
> 
> 
> > > +	if (n > delegations_soft_limit)
> > 
> > This introduces a mixed-sign comparison: n is a long, but
> > delegations_soft_limit is an unsigned long. I'm always suspicious
> > about whether an atomic counter can underflow, and then we have
> > real problems when there are mixed-sign comparisons.
> 
> Yes, n should be unsigned long.
> 
> > But I'm also wondering if, instead, this logic should look directly
> > at the length of the filecache LRU.
> 
> Is there a quick check; a counter? Otherwise, if we have to walk the
> list then it just compounds the problem.

Yep: list_lru_count(&nfsd_file_lru)

That doesn't really tell us whether those LRU items are pinned by a
delegation, though, hm. Just thinking out loud.


> > > +		deleg_reaper(nn);
> > >  out:
> > >  	return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT);
> > >  }
> > > @@ -8482,6 +8490,7 @@ set_max_delegations(void)
> > >  	 * giving a worst case usage of about 6% of memory.
> > >  	 */
> > >  	max_delegations = nr_free_buffer_pages() >> (20 - 2 - PAGE_SHIFT);
> > > +	delegations_soft_limit = (max_delegations / 4) * 3;
> > 
> > I don't see a strong reason to keep delegations_soft_limit as a
> > separate variable.
> 
> Just to save some CPU cycles by not recomputing the soft limit every
> time. However, since this is done in the laundromat, it's probably not
> that critical. Also, recomputing the soft limit will work correctly if
> max_delegations is changed dynamically.
> 
> -Dai
> 
> > >  }
> > >  static int nfs4_state_create_net(struct net *net)
> > > -- 
> > > 2.39.3
> > > 

-- 
Chuck Lever
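
[Editorial note: to make the review points above concrete, here is a
minimal, untested sketch of how the nfs4_laundromat() hunk might look
with the fixes discussed in this thread folded in — atomic_long_read()
instead of atomic_long_inc_return(), an unsigned counter, and the 3/4
soft limit recomputed in place so a dynamically changed max_delegations
is honored. The names (num_delegations, max_delegations, deleg_reaper,
nn) follow the posted patch; the exact placement and final form are an
assumption, not the merged code.]

	/* Sketch only: read the delegation counter without bumping it,
	 * and recompute the soft limit (3/4 of max_delegations) on each
	 * laundromat pass instead of caching it in a separate variable.
	 */
	unsigned long n = atomic_long_read(&num_delegations);

	if (n > (max_delegations / 4) * 3)
		deleg_reaper(nn);

As noted in the thread, list_lru_count(&nfsd_file_lru) was floated as an
alternative metric, but it does not distinguish LRU entries pinned by a
delegation from other open files.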