On Mon, Feb 10, 2025 at 09:01:41AM -0500, Jeff Layton wrote:
> On Fri, 2025-02-07 at 16:15 +1100, NeilBrown wrote:
> > The filecache lru is walked in 2 circumstances for 2 different reasons.
> >
> > 1/ When called from the shrinker we want to discard the first few
> > entries on the list, ignoring any with NFSD_FILE_REFERENCED set
> > because they should really be at the end of the LRU as they have been
> > referenced recently. So those ones are ROTATED.
> >
> > 2/ When called from the nfsd_file_gc() timer function we want to discard
> > anything that hasn't been used since before the previous call, and
> > mark everything else as unused at this point in time.
> >
> > Using the same flag for both of these can result in some unexpected
> > outcomes. If the shrinker callback clears NFSD_FILE_REFERENCED then the
> > nfsd_file_gc() will think the file hasn't been used in a while, while
> > really it has.
> >
> > I think it is easier to reason about the behaviour if we instead have
> > two flags.
> >
> > NFSD_FILE_REFERENCED means "this should be at the end of the LRU, please
> > put it there when convenient"
> > NFSD_FILE_RECENT means "this has been used recently - since the last
> > run of nfsd_file_gc()
> >
> > When either caller finds an NFSD_FILE_REFERENCED entry, that entry
> > should be moved to the end of the LRU and the flag cleared. This can
> > safely happen at any time. The actual order on the lru might not be
> > strictly least-recently-used, but that is normal for linux lrus.
> >
> > The shrinker callback can ignore the "recent" flag. If it ends up
> > freeing something that is "recent" that simply means that memory
> > pressure is sufficient to limit the acceptable cache age to less than
> > the nfsd_file_gc frequency.
> >
> > The gc caller should primarily focus on NFSD_FILE_RECENT. It should
> > free everything that doesn't have this flag set, and should clear the
> > flag on everything else.
> > When it clears the flag it is convenient to
> > clear the "REFERENCED" flag and move to the end of the LRU too.
> >
> > With this, calls from the shrinker do not prematurely age files. It
> > will focus only on freeing those that are least recently used.
> >
> > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > ---
> >  fs/nfsd/filecache.c | 21 +++++++++++++++++++--
> >  fs/nfsd/filecache.h |  1 +
> >  fs/nfsd/trace.h     |  3 +++
> >  3 files changed, 23 insertions(+), 2 deletions(-)
> >
> > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > index 04588c03bdfe..9faf469354a5 100644
> > --- a/fs/nfsd/filecache.c
> > +++ b/fs/nfsd/filecache.c
> > @@ -318,10 +318,10 @@ nfsd_file_check_writeback(struct nfsd_file *nf)
> >  		mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK);
> >  }
> >
> > -
> >  static bool nfsd_file_lru_add(struct nfsd_file *nf)
> >  {
> >  	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> > +	set_bit(NFSD_FILE_RECENT, &nf->nf_flags);
>
> Technically, I don't think you need the REFERENCED bit at all. This is
> the only place it's set, and below this is calling list_lru_add_obj().
> That returns false if the object was already on a per-node LRU.
>
> Instead of that, you could add a list_lru helper that will rotate the
> object to the end of its nodelist if it's already on one. OTOH, that
> might mean more cross NUMA-node accesses to the spinlocks than we get
> by using a flag and doing this at GC time.

No, please don't.

Per-object reference bits are required to enable lazy LRU rotation.
The LRU lists are -hot- objects; touching them every time we touch
an object on the LRU is prohibitively expensive because of exclusive
lock/cacheline contention. Hence we defer operations like rotation
to a context where we already have the list locked and cached
exclusively for some other reason (i.e. memory reclaim).

This is the same reason we use lazy removal from LRUs - it avoids
LRU list manipulations every time a hot cached object is accessed
and/or dropped.
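
As a rough illustration of the lazy rotation described above, here is
a minimal userspace sketch in C. Every name in it (struct obj,
struct lru, obj_touch(), lru_walk()) is hypothetical; it is not the
list_lru or nfsd code. It is single-threaded, so the lock a real LRU
would hold during the walk appears only as a comment. The hot path
writes a per-object bit and never touches the list, and rotation
happens only in the walk that already has the list in hand.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types and names; not the list_lru or nfsd interfaces. */
struct obj {
	struct obj *prev, *next;	/* LRU linkage, owned by the LRU walker */
	atomic_bool referenced;		/* set on access, never touches the list */
};

struct lru {
	struct obj head;		/* sentinel; head.next is the oldest entry */
	size_t nr_items;
	/* a real LRU would carry a spinlock here; this sketch has none */
};

static void lru_init(struct lru *l)
{
	l->head.prev = l->head.next = &l->head;
	l->nr_items = 0;
}

static void list_unlink(struct obj *o)
{
	o->prev->next = o->next;
	o->next->prev = o->prev;
	o->prev = o->next = NULL;
}

static void list_add_tail(struct lru *l, struct obj *o)
{
	o->prev = l->head.prev;
	o->next = &l->head;
	l->head.prev->next = o;
	l->head.prev = o;
}

/* Insert a new object at the tail (the most recently used end). */
static void lru_add(struct lru *l, struct obj *o)
{
	atomic_init(&o->referenced, false);
	list_add_tail(l, o);
	l->nr_items++;
}

/* Hot path: record the access in the object only. No list access. */
static void obj_touch(struct obj *o)
{
	atomic_store(&o->referenced, true);
}

/*
 * Reclaim path: scan up to nr_to_scan entries from the old end. An entry
 * whose bit is set has the bit cleared and is rotated to the tail (the
 * deferred rotation); any other entry is evicted. Returns the number of
 * evicted entries.
 */
static size_t lru_walk(struct lru *l, size_t nr_to_scan)
{
	size_t evicted = 0;

	/* Bound the scan so a rotated entry is not revisited in this walk. */
	if (nr_to_scan > l->nr_items)
		nr_to_scan = l->nr_items;

	while (nr_to_scan-- > 0) {
		struct obj *o = l->head.next;

		assert(o != &l->head);
		list_unlink(o);
		if (atomic_exchange(&o->referenced, false)) {
			list_add_tail(l, o);	/* hot: rotate lazily */
		} else {
			l->nr_items--;		/* cold: evict */
			evicted++;
		}
	}
	return evicted;
}
```

With three objects added in the order a, b, c and only a touched, a
walk of two entries rotates a to the tail and evicts b, without
obj_touch() ever having modified the list.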

IOWs, removing the per-object NFSD_FILE_REFERENCED bit will undo one
of the necessary optimisations that allow hot cache LRU management to
work efficiently with minimal overhead.

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx