> On Oct 30, 2022, at 5:45 PM, NeilBrown <neilb@xxxxxxx> wrote: > > On Sat, 29 Oct 2022, Chuck Lever III wrote: >> >>> On Oct 28, 2022, at 2:57 PM, Jeff Layton <jlayton@xxxxxxxxxx> wrote: >>> >>> The list_lru_add and list_lru_del functions use list_empty checks to see >>> whether the object is already on the LRU. That's fine in most cases, but >>> we occasionally repurpose nf_lru after unhashing. It's possible for an >>> LRU removal to remove it from a different list altogether if we lose a >>> race. Can that issue be resolved by simply adding a "struct list_head nf_dispose" field? That might be more straightforward than adding conditional logic. >> I've never seen that happen. lru field re-use is actually used in other >> places in the kernel. Shouldn't we try to find and fix such races? >> >> Wasn't the idea to reduce the complexity of nfsd_file_put ? >> > > I think the nfsd filecache is different from those "other places" > because of nfsd_file_close_inode() and related code. If I understand > correctly, nfsd can remove a file from the filecache while it is still > in use. Not sure about that; I think nfsd_file_close_inode() is invoked only when a file is deleted. I could be remembering incorrectly, but that seems like a really difficult race to hit. > In this case it needs to be unhashed and removed from the lru - > and then added to a dispose list - while it might still be active for > some IO request. > > Prior to Commit 8d0d254b15cc ("nfsd: fix nfsd_file_unhash_and_dispose") > unhash_and_dispose() wouldn't add to the dispose list unless the > refcount was one. I'm not sure how racy this test was, but it would > mean that it is unlikely for an nfsd_file to be added to the dispose list > if it was still in use. > > After that commit it seems more likely that a nfsd_file will be added to > a dispose list while it is in use. After it's linked to a dispose list via nf_lru, list_lru_add won't put it on the LRU -- it becomes a no-op because nf_lru is now !empty. I think we would have seen LRU corruption pretty quickly. Re-reading Jeff's patch description, that might not be the problem he's trying to address here. But, it would be easy to do some reality testing. I think you could add a WARN_ON or tracepoint in nfsd_file_free() or somewhere in the dispose-list path to catch an in-use nfsd_file? > As we add/remove things to a dispose list without a lock, we need to be > careful that no other thread will add the nfsd_file to an lru at the > same time. refcounts will no longer provide that protection. Maybe we > should restore the refcount protection, or maybe we need a bit as Jeff > has added here. I'm not opposed to defensive changes, in general. This one seems to be adding significant complexity without a clear hazard. I'd like to have a better understanding of exactly what misbehavior is being addressed. -- Chuck Lever