On Wed, 2025-01-22 at 14:48 +1100, NeilBrown wrote: > On Wed, 22 Jan 2025, NeilBrown wrote: > > On Wed, 22 Jan 2025, Jeff Layton wrote: > > > To be clear, I think we need to drop e57420be100ab from your nfsd- > > > testing branch. The race I identified above is quite likely to occur > > > and could lead to leaks. > > > > > > If Li Lingfeng doesn't propose a patch, I'll spin one up tomorrow. I > > > think the RCU approach is safe. > > > > I'm not convinced this is the right approach. > > I cannot see how nfsd_file_put() can race with unhashing. If it cannot > > then we can simply unconditionally call nfsd_file_schedule_laundrette(). > > > > Can describe how the race can happen - if indeed it can. > > I thought I should explore this more and explain what I think actually > happens ... > > Certainly nfsd_file_unhash() might race with nfsd_file_put(). At this > point in nfsd_file_put() we have the only reference but a hash lookup > could gain another reference and the immediately unhash it. > nfsd_file_queue_for_close() can do this. There might be other paths. > > But why does this mean we need to remove it from the lru and free it > immediately? If we leave it on the lru it will be freed in a couple of > seconds. > > The reason might be nfsd_file_close_inode_sync(). This needs to close > files before returning. > But if nfsd_file_close_inode_sync() is called while some other thread > holds a reference to the file and might want to call nfsd_file_put(), > then it isn't going to succeed anyway so any race here doesn't make any > difference. > > So I think the following might be the best fix > > ??? > > Thanks, > NeilBrown > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c > index fcd751cb7c76..773788a50e56 100644 > --- a/fs/nfsd/filecache.c > +++ b/fs/nfsd/filecache.c > @@ -322,10 +322,13 @@ nfsd_file_check_writeback(struct nfsd_file *nf) > static bool nfsd_file_lru_add(struct nfsd_file *nf) > { > set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); > + rcu_read_lock(); > if (list_lru_add_obj(&nfsd_file_lru, &nf->nf_lru)) { > trace_nfsd_file_lru_add(nf); > + rcu_read_unlock(); > return true; > } > + rcu_read_unlock(); > return false; > } > I think that I'm now convinced that it's OK to remove the code in the if block below. But if we do that, then I don't think you need the rcu_read_lock() in nfsd_file_lru_add(). It's just handing off the reference to the LRU at that point and once that's done, it doesn't need to look at it again. That makes the rcu_read_lock() unnecessary. Given that, Li Lingfeng's original patch is OK after all. Am I missing something? > @@ -371,19 +374,8 @@ nfsd_file_put(struct nfsd_file *nf) > > /* Try to add it to the LRU. If that fails, decrement. */ > if (nfsd_file_lru_add(nf)) { > - /* If it's still hashed, we're done */ > - if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags)) { > - nfsd_file_schedule_laundrette(); > - return; > - } > - > - /* > - * We're racing with unhashing, so try to remove it from > - * the LRU. If removal fails, then someone else already > - * has our reference. > - */ > - if (!nfsd_file_lru_remove(nf)) > - return; > + nfsd_file_schedule_laundrette(); > + return; > } > } > if (refcount_dec_and_test(&nf->nf_ref)) > -- Jeff Layton <jlayton@xxxxxxxxxx>