> > Walking a linked list just to set a bit in every entry is a lot more
> > work than a few list manipulations in 5 cache-lines.
>
> Maybe so, but the cache gc isn't a performance critical path.

And yet it was the "gc", which was really "aging", which highlighted the
problem for us.

Normal list_lru never walks the entire list (unless it is short) except
when freeing everything.  With the "reference" bit approach, nfsd needs
to walk the entire lru frequently.  That can make the aging more of a
performance issue.

dcache uses DCACHE_REFERENCED effectively, but in a different way.  It
is only cleared by the shrinker for those entries that are on the lru
before the ones that can be freed - it doesn't want to clear the
DCACHE_REFERENCED bits on everything in the lru (which has it set).
nfsd DOES want to clear the referenced bit on everything for the aging.

> > > > This patch removes the list_lru and instead uses 2 simple linked
> > > > lists.  When a file is accessed it is removed from whichever list
> > > > it is on, then added to the tail of the first list.  Every 2
> > > > seconds the second list is moved to the "freeme" list and the
> > > > first list is moved to the second list.  This avoids any need to
> > > > walk a list to find old entries.
> > >
> > > Yup, that's exactly what the current code does via the laundrette
> > > work that schedules nfsd_file_gc() to run every two seconds.
> > >
> > > > These lists are per-netns rather than global as the freeme list
> > > > is per-netns as the actual freeing is done in nfsd threads which
> > > > are per-netns.
> > >
> > > The list_lru is actually multiple lists - it is a per-numa node list
> > > and so moving to global scope linked lists per netns is going to
> > > reduce scalability and increase lock contention on large machines.
> >
> > Possibly we could duplicate the filecache_disposal structure across
> > svc_pools instead of net namespaces.  That would fix the scalability
> > issue.  Probably we should avoid removing and re-adding the file in
> > the lru for every access as that probably causes more spinlock
> > contention.  We would need to adjust the ageing mechanism but I
> > suspect it could be made to work.
>
> Typical usage of list-lru is lazy removal. i.e. we only add it to
> the LRU list if it's not already there, and only reclaim/gc removes
> it from the list.  This is how the inode and dentry caches have
> worked since before list_lru even existed, and it's a pattern that
> is replicated across many subsystems that use LRUs for gc
> purposes...
>
> IOWs, the "object in cache" lookup fast path should never touch
> the LRU at all.

Yes, I agree.  Whichever way we go, this is a change we need to make to
the nfsd filecache - not to remove on each access.

> > > i.e. It's kinda hard to make any real comment on "I do not believe
> > > list_lru is a good tool for this" when there are no actual
> > > measurements provided to back the statement one way or the other...
> >
> > It's not about measurements, it's about behavioural correctness.
> > Yes, I should have spelled that out better.  Thanks for encouraging
> > me to do so.
>
> So you are saying that you don't care that the existing code can
> easily be fixed, that your one-off solution won't scale as well and
> is less functional from memory reclaim POV, and that the new
> implementation is less maintainable than using generic
> infrastructure to do the same work.

No, I'm saying that it isn't clear to me that the existing code can be
easily fixed.  Certainly there are easy improvements.
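One such easy improvement is the lazy pattern described above: only set
a reference bit on access, and only add an entry to the LRU if it isn't
already there.  Very roughly (a sketch only - the field and flag names
follow fs/nfsd/filecache.h, but the helpers here are illustrative, not
the actual code):

static void nfsd_file_accessed(struct nfsd_file *nf)
{
        /* fast path: no list manipulation, no LRU lock taken */
        set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags);
}

static void nfsd_file_lru_insert(struct nfsd_file *nf)
{
        /*
         * Lazy add: list_lru_add_obj() (list_lru_add() on older
         * kernels) only links the entry if it is not already on the
         * LRU, so no remove-and-re-add is needed here.
         */
        list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru);
}

That removes the list_del/list_add from the access fast path, but the
aging still has to walk the whole lru to test and clear the reference
bits.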
But list_lru is designed for an important access pattern which is
different from what the nfsd filecache needs.

> If that's the approach you are taking, then I don't know why you
> asked me to point out all the stuff about list_lru that you didn't
> seem to know about in the first place...

It is always useful to have intelligent review, even when one doesn't
agree with all of it.

Thanks!
NeilBrown
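P.S. For anyone skimming the thread: the two-list aging scheme quoted at
the top of this mail boils down to roughly the following.  The names and
locking here are hypothetical and heavily simplified (the real patch also
hands the freeme list to the per-netns nfsd threads that do the actual
freeing):

/* Two "generations" plus a freeme list, all under one lock. */
static LIST_HEAD(gen_new);      /* accessed within the last interval */
static LIST_HEAD(gen_old);      /* idle for at least one interval    */
static LIST_HEAD(freeme);       /* ready to be disposed of           */
static DEFINE_SPINLOCK(age_lock);

static void nfsd_file_mark_accessed(struct nfsd_file *nf)
{
        spin_lock(&age_lock);
        /* remove from whichever list it is on, add to tail of the first */
        list_move_tail(&nf->nf_lru, &gen_new);
        spin_unlock(&age_lock);
}

/* Called every 2 seconds by the laundrette work. */
static void nfsd_file_age(void)
{
        spin_lock(&age_lock);
        /* anything idle for a full interval becomes freeable ... */
        list_splice_tail_init(&gen_old, &freeme);
        /* ... and the current generation becomes the old generation */
        list_splice_init(&gen_new, &gen_old);
        spin_unlock(&age_lock);
}

The point is that the aging never walks the entries at all - it is two
list-head splices under one lock.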