On Sat, Feb 17, 2024 at 01:27:15PM -0800, dai.ngo@xxxxxxxxxx wrote:
> 
> On 2/17/24 1:09 PM, Chuck Lever wrote:
> > On Sat, Feb 17, 2024 at 10:00:59AM -0800, Dai Ngo wrote:
> > > When the number of granted delegations becomes large, there are some
> > > undesired effects on the NFS server. Besides the consumption of
> > > system resources, the number of entries on the linked lists of the
> > > file cache can grow significantly.
> > > 
> > > When this condition happens, NFS performance grinds to a halt, as
> > > reported here [1].
> > 
> > That was a v5.15.131 kernel. The LRU problems were addressed in
> > v6.2. This doesn't seem like a clean rationale for adding this
> > reaper behavior in, say, v6.9.
> 
> Do you know what commits in v6.9 fix this problem?

The filecache LRU issues were addressed in v6.2 and before.


> Regardless, I think when the number of delegations is getting large,
> it's good to ask the clients to release unused delegations. The
> Linux client keeps unused delegations for about ~90 secs before
> returning them.

That seems reasonable. No need for the server to ask for them back
unless it is under some stress.


> > > This patch attempts to alleviate this problem by asking the clients
> > > to voluntarily return any unused delegations when the number of
> > > delegations reaches 3/4 of max_delegations, by sending
> > > OP_CB_RECALL_ANY to all clients that hold delegations.
> > 
> > I don't have a strong sense of how big max_delegations can get. Is
> > there evidence that CB_RECALL_ANY has enough impact, reliably, to
> > reduce the size of the filecache?
> 
> I don't have any numbers for this. But in my testing, the Linux client
> returns unused delegations immediately when receiving the CB_RECALL_ANY
> instead of keeping them ~90 secs.

The issue is whether returning delegations in bulk has any impact on
filecache behavior. I'd like to see some evidence of that, because
after v6.2 I think NFSD's filecache shouldn't be perturbed by lots of
open files.


> > More below.
> > 
> > 
> > > [1] https://lore.kernel.org/all/PH0PR14MB5493F59229B802B871407F9CAA442@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
> > > 
> > > Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
> > > ---
> > >  fs/nfsd/nfs4state.c | 9 +++++++++
> > >  1 file changed, 9 insertions(+)
> > > 
> > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > index fdc95bfbfbb6..5fb83853533f 100644
> > > --- a/fs/nfsd/nfs4state.c
> > > +++ b/fs/nfsd/nfs4state.c
> > > @@ -130,6 +130,7 @@ static const struct nfsd4_callback_ops nfsd4_cb_notify_lock_ops;
> > >  static const struct nfsd4_callback_ops nfsd4_cb_getattr_ops;
> > >  static struct workqueue_struct *laundry_wq;
> > > +static void deleg_reaper(struct nfsd_net *nn);
> > >  int nfsd4_create_laundry_wq(void)
> > >  {
> > > @@ -696,6 +697,9 @@ static struct nfsd_file *find_any_file_locked(struct nfs4_file *f)
> > >  static atomic_long_t num_delegations;
> > >  unsigned long max_delegations;
> > > +/* threshold to trigger deleg_reaper */
> > > +static unsigned long delegations_soft_limit;
> > > +
> > >  /*
> > >   * Open owner state (share locks)
> > >   */
> > > @@ -6466,6 +6470,7 @@ nfs4_laundromat(struct nfsd_net *nn)
> > >  	struct nfs4_cpntf_state *cps;
> > >  	copy_stateid_t *cps_t;
> > >  	int i;
> > > +	long n;
> > >  	if (clients_still_reclaiming(nn)) {
> > >  		lt.new_timeo = 0;
> > > @@ -6550,6 +6555,9 @@ nfs4_laundromat(struct nfsd_net *nn)
> > >  	/* service the server-to-server copy delayed unmount list */
> > >  	nfsd4_ssc_expire_umount(nn);
> > >  #endif
> > > +	n = atomic_long_inc_return(&num_delegations);
> > 
> > I don't think you want to modify the number of delegations here.
> > atomic_long_read() instead?
> 
> Right, it's a typo, it should be a read.
> 
> 
> > > +	if (n > delegations_soft_limit)
> > 
> > This introduces a mixed-sign comparison: n is a long, but
> > delegations_soft_limit is an unsigned long. I'm always suspicious
> > about whether an atomic counter can underflow, and then we have
> > real problems when there are mixed-sign comparisons.
> 
> Yes, n should be unsigned long.
> 
> > But I'm also wondering if, instead, this logic should look directly
> > at the length of the filecache LRU.
> 
> Is there a quick check; a counter? Otherwise, if we have to walk the
> list then it just compounds the problem.

Yep: list_lru_count(&nfsd_file_lru)

That doesn't really tell us whether those LRU items are pinned by a
delegation, though, hm. Just thinking out loud.


> > > +		deleg_reaper(nn);
> > >  out:
> > >  	return max_t(time64_t, lt.new_timeo, NFSD_LAUNDROMAT_MINTIMEOUT);
> > >  }
> > > @@ -8482,6 +8490,7 @@ set_max_delegations(void)
> > >  	 * giving a worst case usage of about 6% of memory.
> > >  	 */
> > >  	max_delegations = nr_free_buffer_pages() >> (20 - 2 - PAGE_SHIFT);
> > > +	delegations_soft_limit = (max_delegations / 4) * 3;
> > 
> > I don't see a strong reason to keep delegations_soft_limit as a
> > separate variable.
> 
> Just to save some CPU cycles by not recomputing the soft limit every
> time. However, since this is done in the laundromat, it's probably not
> that critical. Also, recomputing the soft limit will work correctly if
> max_delegations is changed dynamically.
> 
> -Dai
> 
> > >  }
> > >  static int nfs4_state_create_net(struct net *net)
> > > -- 
> > > 2.39.3
> > > 

-- 
Chuck Lever
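
[Editorial note: to make the review points above concrete, here is a
minimal, untested sketch of how the nfs4_laundromat() hunk might look
with the fixes discussed in this thread folded in — atomic_long_read()
instead of atomic_long_inc_return(), an unsigned counter, and the 3/4
soft limit recomputed in place so a dynamically changed max_delegations
is honored. The names (num_delegations, max_delegations, deleg_reaper,
nn) follow the posted patch; the exact placement and final form are an
assumption, not the merged code.]

	/* Sketch only: read the delegation counter without bumping it,
	 * and recompute the soft limit (3/4 of max_delegations) on each
	 * laundromat pass instead of caching it in a separate variable.
	 */
	unsigned long n = atomic_long_read(&num_delegations);

	if (n > (max_delegations / 4) * 3)
		deleg_reaper(nn);

As noted in the thread, list_lru_count(&nfsd_file_lru) was floated as an
alternative metric, but it does not distinguish LRU entries pinned by a
delegation from other open files.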