On Sat, 29 Oct 2022, Chuck Lever III wrote: > > > On Oct 28, 2022, at 2:57 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: > > > > The list_lru_add and list_lru_del functions use list_empty checks to see > > whether the object is already on the LRU. That's fine in most cases, but > > we occasionally repurpose nf_lru after unhashing. It's possible for an > > LRU removal to remove it from a different list altogether if we lose a > > race. > > I've never seen that happen. lru field re-use is actually used in other > places in the kernel. Shouldn't we try to find and fix such races? > > Wasn't the idea to reduce the complexity of nfsd_file_put ? > I think the nfsd filecache is different from those "other places" because of nfsd_file_close_inode() and related code. If I understand correctly, nfsd can remove a file from the filecache while it is still in use. In this case it needs to be unhashed and removed from the lru - and then added to a dispose list - while it might still be active for some IO request. Prior to Commit 8d0d254b15cc ("nfsd: fix nfsd_file_unhash_and_dispose") unhash_and_dispose() wouldn't add to the dispose list unless the refcount was one. I'm not sure how racy this test was, but it would mean that it is unlikely for an nfsd_file to be added to the dispose list if it was still in use. After that commit it seems more likely that a nfsd_file will be added to a dispose list while it is in use. As we add/remove things to a dispose list without a lock, we need to be careful that no other thread will add the nfsd_file to an lru at the same time. refcounts will no longer provide that protection. Maybe we should restore the refcount protection, or maybe we need a bit as Jeff has added here. NeilBrown