On Tue, 2023-01-10 at 18:53 +0000, Chuck Lever III wrote:
> 
> > On Jan 10, 2023, at 1:46 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
> > 
> > On 1/10/23 10:17 AM, Chuck Lever III wrote:
> > > 
> > > > On Jan 10, 2023, at 12:33 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:
> > > > 
> > > > On 1/10/23 2:30 AM, Jeff Layton wrote:
> > > > > On Mon, 2023-01-09 at 22:48 -0800, Dai Ngo wrote:
> > > > > > Currently nfsd4_state_shrinker_worker can be scheduled multiple times
> > > > > > from nfsd4_state_shrinker_count when memory is low. This causes
> > > > > > the WARN_ON_ONCE in __queue_delayed_work to trigger.
> > > > > > 
> > > > > > This patch allows only one instance of nfsd4_state_shrinker_worker
> > > > > > at a time, using the nfsd_shrinker_active flag, protected by the
> > > > > > client_lock.
> > > > > > 
> > > > > > Replace mod_delayed_work with queue_delayed_work since we
> > > > > > don't expect to modify the delay of any pending work.
> > > > > > 
> > > > > > Fixes: 44df6f439a17 ("NFSD: add delegation reaper to react to low memory condition")
> > > > > > Reported-by: Mike Galbraith <efault@xxxxxx>
> > > > > > Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
> > > > > > ---
> > > > > >  fs/nfsd/netns.h     |  1 +
> > > > > >  fs/nfsd/nfs4state.c | 16 ++++++++++++++--
> > > > > >  2 files changed, 15 insertions(+), 2 deletions(-)
> > > > > > 
> > > > > > diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
> > > > > > index 8c854ba3285b..801d70926442 100644
> > > > > > --- a/fs/nfsd/netns.h
> > > > > > +++ b/fs/nfsd/netns.h
> > > > > > @@ -196,6 +196,7 @@ struct nfsd_net {
> > > > > >  	atomic_t		nfsd_courtesy_clients;
> > > > > >  	struct shrinker		nfsd_client_shrinker;
> > > > > >  	struct delayed_work	nfsd_shrinker_work;
> > > > > > +	bool			nfsd_shrinker_active;
> > > > > >  };
> > > > > >  
> > > > > >  /* Simple check to find out if a given net was properly initialized */
> > > > > > diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
> > > > > > index ee56c9466304..e00551af6a11 100644
> > > > > > --- a/fs/nfsd/nfs4state.c
> > > > > > +++ b/fs/nfsd/nfs4state.c
> > > > > > @@ -4407,11 +4407,20 @@ nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc)
> > > > > >  	struct nfsd_net *nn = container_of(shrink,
> > > > > >  			struct nfsd_net, nfsd_client_shrinker);
> > > > > >  
> > > > > > +	spin_lock(&nn->client_lock);
> > > > > > +	if (nn->nfsd_shrinker_active) {
> > > > > > +		spin_unlock(&nn->client_lock);
> > > > > > +		return 0;
> > > > > > +	}
> > > > > 
> > > > > Is this extra machinery really necessary? The bool and spinlock don't
> > > > > seem to be needed. Typically there is no issue with calling
> > > > > queue_delayed_work when the work is already queued. It just returns
> > > > > false in that case without doing anything.
> > > > 
> > > > When there are multiple calls to mod_delayed_work/queue_delayed_work
> > > > we hit the WARN_ON_ONCEs in __queue_delayed_work and __queue_work if
> > > > the work is queued but not yet executed.
> > > 
> > > The delay argument of zero is interesting. If it's set to a value
> > > greater than zero, do you still see a problem?
> > 
> > I tried and tried but could not reproduce the problem that Mike
> > reported. I guess my VMs don't have fast enough CPUs to make it
> > happen.
> 
> I'd prefer not to guess... it sounds like we don't have a clear
> root cause on this one yet.
> 
> I think I agree with Jeff: a spinlock shouldn't be required to
> make queuing work safe via this API.
> 
> > As Jeff mentioned, delay 0 should be safe and we want to run
> > the shrinker as soon as possible when memory is low.
> 
> I suggested that because the !delay code paths seem to lead
> directly to the WARN_ONs in queue_work(). <shrug>
> 

One of the WARNs that Mike hit was this:

	WARN_ON_ONCE(timer->function != delayed_work_timer_fn);

nfsd isn't doing anything exotic with that function pointer, so that
really looks like something got corrupted. Given that this is happening
under low-memory conditions, I have to wonder if we're just ending up
with a workqueue job that remained on the queue after the nfsd_net got
freed and recycled.

I'd start with a patch like this (note, untested):

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 2f4a2449b314..86da6663806e 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -8158,6 +8158,7 @@ nfs4_state_shutdown_net(struct net *net)
 	struct nfsd_net *nn = net_generic(net, nfsd_net_id);
 
 	unregister_shrinker(&nn->nfsd_client_shrinker);
+	cancel_delayed_work_sync(&nn->nfsd_shrinker_work);
 	cancel_delayed_work_sync(&nn->laundromat_work);
 	locks_end_grace(&nn->nfsd4_manager);

Either way, I think that, longer term, nfsd_shrinker_work ought to be
converted to a normal work_struct, since you don't ever use the delay.

> > -Dai
> > 
> > > 
> > > > This problem was reported by Mike. I initially tried with only the
> > > > bool, but that was not enough, which was why the spinlock was added.
> > > > Mike verified that the patch fixed the problem.
> > > > 
> > > > -Dai
> > > > 
> > > > > >  	count = atomic_read(&nn->nfsd_courtesy_clients);
> > > > > >  	if (!count)
> > > > > >  		count = atomic_long_read(&num_delegations);
> > > > > > -	if (count)
> > > > > > -		mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0);
> > > > > > +	if (count) {
> > > > > > +		nn->nfsd_shrinker_active = true;
> > > > > > +		spin_unlock(&nn->client_lock);
> > > > > > +		queue_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0);
> > > > > > +	} else
> > > > > > +		spin_unlock(&nn->client_lock);
> > > > > >  	return (unsigned long)count;
> > > > > >  }
> > > > > >  
> > > > > > @@ -6239,6 +6248,9 @@ nfsd4_state_shrinker_worker(struct work_struct *work)
> > > > > >  	courtesy_client_reaper(nn);
> > > > > >  	deleg_reaper(nn);
> > > > > > +	spin_lock(&nn->client_lock);
> > > > > > +	nn->nfsd_shrinker_active = 0;
> > > > > > +	spin_unlock(&nn->client_lock);
> > > > > >  }
> > > > > >  
> > > > > >  static inline __be32 nfs4_check_fh(struct svc_fh *fhp, struct nfs4_stid *stp)
> > > 
> > > --
> > > Chuck Lever
> 
> --
> Chuck Lever
> 
> 

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
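
A rough sketch of the work_struct conversion suggested above, for
illustration only. It is untested; the hunk offsets are omitted, the
context lines and the location of the INIT_DELAYED_WORK() call are
assumptions, and it presumes the cancel_delayed_work_sync() call added
above would become cancel_work_sync() after the conversion:

diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h
--- a/fs/nfsd/netns.h
+++ b/fs/nfsd/netns.h
@@ ... @@ struct nfsd_net {
 	atomic_t		nfsd_courtesy_clients;
 	struct shrinker		nfsd_client_shrinker;
-	struct delayed_work	nfsd_shrinker_work;
+	struct work_struct	nfsd_shrinker_work;
 };
diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ ... @@ nfsd4_state_shrinker_count(struct shrinker *shrink, struct shrink_control *sc)
 	if (count)
-		mod_delayed_work(laundry_wq, &nn->nfsd_shrinker_work, 0);
+		/* returns false, leaving the item alone, if already queued */
+		queue_work(laundry_wq, &nn->nfsd_shrinker_work);
 	return (unsigned long)count;
@@ ... @@ nfsd4_state_shrinker_worker(struct work_struct *work)
-	struct delayed_work *dwork = to_delayed_work(work);
-	struct nfsd_net *nn = container_of(dwork, struct nfsd_net,
+	/* no timer indirection: container_of() the work_struct directly */
+	struct nfsd_net *nn = container_of(work, struct nfsd_net,
 				nfsd_shrinker_work);
@@ ... @@ (wherever the work item is initialized)
-	INIT_DELAYED_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker);
+	INIT_WORK(&nn->nfsd_shrinker_work, nfsd4_state_shrinker_worker);

With a plain work_struct there is no timer to race with: queue_work()
just test-and-sets WORK_STRUCT_PENDING_BIT and returns false when the
item is already queued, so the nfsd_shrinker_active flag and the
client_lock round trips in the proposed patch would no longer be
needed.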