On Tue, 2025-02-11 at 10:57 +1100, Dave Chinner wrote:
> On Mon, Feb 10, 2025 at 09:01:41AM -0500, Jeff Layton wrote:
> > On Fri, 2025-02-07 at 16:15 +1100, NeilBrown wrote:
> > > The filecache lru is walked in 2 circumstances for 2 different reasons.
> > >
> > > 1/ When called from the shrinker we want to discard the first few
> > >    entries on the list, ignoring any with NFSD_FILE_REFERENCED set
> > >    because they should really be at the end of the LRU as they have been
> > >    referenced recently.  So those ones are ROTATED.
> > >
> > > 2/ When called from the nfsd_file_gc() timer function we want to discard
> > >    anything that hasn't been used since before the previous call, and
> > >    mark everything else as unused at this point in time.
> > >
> > > Using the same flag for both of these can result in some unexpected
> > > outcomes.  If the shrinker callback clears NFSD_FILE_REFERENCED then the
> > > nfsd_file_gc() will think the file hasn't been used in a while, while
> > > really it has.
> > >
> > > I think it is easier to reason about the behaviour if we instead have
> > > two flags.
> > >
> > >  NFSD_FILE_REFERENCED means "this should be at the end of the LRU, please
> > >     put it there when convenient"
> > >  NFSD_FILE_RECENT means "this has been used recently - since the last
> > >     run of nfsd_file_gc()"
> > >
> > > When either caller finds an NFSD_FILE_REFERENCED entry, that entry
> > > should be moved to the end of the LRU and the flag cleared.  This can
> > > safely happen at any time.  The actual order on the lru might not be
> > > strictly least-recently-used, but that is normal for linux lrus.
> > >
> > > The shrinker callback can ignore the "recent" flag.  If it ends up
> > > freeing something that is "recent" that simply means that memory
> > > pressure is sufficient to limit the acceptable cache age to less than
> > > the nfsd_file_gc frequency.
> > >
> > > The gc caller should primarily focus on NFSD_FILE_RECENT.  It should
> > > free everything that doesn't have this flag set, and should clear the
> > > flag on everything else.  When it clears the flag it is convenient to
> > > clear the "REFERENCED" flag and move to the end of the LRU too.
> > >
> > > With this, calls from the shrinker do not prematurely age files.  It
> > > will focus only on freeing those that are least recently used.
> > >
> > > Signed-off-by: NeilBrown <neilb@xxxxxxx>
> > > ---
> > >  fs/nfsd/filecache.c | 21 +++++++++++++++++++--
> > >  fs/nfsd/filecache.h |  1 +
> > >  fs/nfsd/trace.h     |  3 +++
> > >  3 files changed, 23 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > index 04588c03bdfe..9faf469354a5 100644
> > > --- a/fs/nfsd/filecache.c
> > > +++ b/fs/nfsd/filecache.c
> > > @@ -318,10 +318,10 @@ nfsd_file_check_writeback(struct nfsd_file *nf)
> > >  		mapping_tagged(mapping, PAGECACHE_TAG_WRITEBACK);
> > >  }
> > >
> > > -
> > >  static bool nfsd_file_lru_add(struct nfsd_file *nf)
> > >  {
> > >  	set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
> > > +	set_bit(NFSD_FILE_RECENT, &nf->nf_flags);
> >
> > Technically, I don't think you need the REFERENCED bit at all. This is
> > the only place it's set, and below this is calling list_lru_add_obj().
> > That returns false if the object was already on a per-node LRU.
> >
> > Instead of that, you could add a list_lru helper that will rotate the
> > object to the end of its nodelist if it's already on one. OTOH, that
> > might mean more cross NUMA-node accesses to the spinlocks than we get
> > by using a flag and doing this at GC time.
>
> No, please don't.
>
> Per-object reference bits are required to enable lazy LRU rotation.
> The LRU lists are -hot- objects; touching them every time we touch
> an object on the LRU is prohibitively expensive because of exclusive
> lock/cacheline contention. Hence we defer operations like rotation
> to a context where we already have the list locked and cached
> exclusively for some other reason (i.e. memory reclaim).
>
> This is the same reason we use lazy removal from LRUs - it avoids
> LRU list manipulations every time a hot cached object is accessed
> and/or dropped.
>
> IOWs, removing the per-object NFSD_FILE_REFERENCED bit will undo one
> of the necessary optimisations that allow hot caches LRU
> management to work efficiently with minimal overhead.
>

Yep, that was the point of my "OTOH" comment. Keeping the REFERENCED
flag is better from a "let's minimize cacheline invalidations"
standpoint.

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
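
For anyone following along, here is a rough userspace model of the two-flag scheme discussed above. It is only a sketch and not the kernel code: struct entry, struct lru, entry_touch(), lru_shrink() and lru_gc() are hypothetical names made up for illustration, the list is a plain single-threaded doubly-linked list rather than list_lru, and "freeing" just means dropping the entry from the list. The point it tries to show is that the access path only sets bits, and that a shrinker pass clearing REFERENCED does not make the next gc pass think the entry is stale.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for NFSD_FILE_REFERENCED / NFSD_FILE_RECENT. */
enum { F_REFERENCED = 1u << 0, F_RECENT = 1u << 1 };

struct entry {
	unsigned flags;
	bool on_lru;
	struct entry *prev, *next;
};

/* Sentinel-headed list: head.next is the oldest entry, head.prev the newest. */
struct lru { struct entry head; };

static void lru_init(struct lru *l)
{
	l->head.prev = l->head.next = &l->head;
}

static void lru_unlink(struct entry *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
	e->prev = e->next = NULL;
	e->on_lru = false;
}

static void lru_add_tail(struct lru *l, struct entry *e)
{
	e->prev = l->head.prev;
	e->next = &l->head;
	l->head.prev->next = e;
	l->head.prev = e;
	e->on_lru = true;
}

/* Access path: only set bits; touch the list only on first insertion. */
static void entry_touch(struct lru *l, struct entry *e)
{
	e->flags |= F_REFERENCED | F_RECENT;
	if (!e->on_lru)
		lru_add_tail(l, e);
}

/*
 * Shrinker-style pass: look at up to nr_to_scan entries from the old end,
 * at most once each.  REFERENCED entries are rotated and lose only that
 * flag; everything else is dropped.  RECENT is deliberately ignored.
 */
static size_t lru_shrink(struct lru *l, size_t nr_to_scan)
{
	size_t freed = 0;
	struct entry *last = l->head.prev;	/* tail before we start rotating */
	struct entry *e = l->head.next;

	while (nr_to_scan-- > 0 && e != &l->head) {
		struct entry *next = e->next;
		bool was_last = (e == last);

		lru_unlink(e);
		if (e->flags & F_REFERENCED) {
			e->flags &= ~F_REFERENCED;
			lru_add_tail(l, e);
		} else {
			freed++;
		}
		if (was_last)
			break;
		e = next;
	}
	return freed;
}

/*
 * gc-style pass: drop everything not used since the previous pass, and
 * reset both flags on the survivors while rotating them to the tail.
 */
static size_t lru_gc(struct lru *l)
{
	size_t freed = 0;
	struct entry *last = l->head.prev;
	struct entry *e = l->head.next;

	while (e != &l->head) {
		struct entry *next = e->next;
		bool was_last = (e == last);

		/* In this model REFERENCED is never set without RECENT. */
		assert(!(e->flags & F_REFERENCED) || (e->flags & F_RECENT));

		lru_unlink(e);
		if (e->flags & F_RECENT) {
			e->flags &= ~(F_RECENT | F_REFERENCED);
			lru_add_tail(l, e);
		} else {
			freed++;
		}
		if (was_last)
			break;
		e = next;
	}
	return freed;
}
```

With three freshly touched entries, a full lru_shrink() pass followed by lru_gc() frees nothing in this model, since the shrinker left RECENT alone. With a single shared flag the same sequence would have had gc discard all three.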