On Thu, 2022-10-27 at 09:32 +1100, NeilBrown wrote: > On Wed, 26 Oct 2022, Jeff Layton wrote: > > The filecache refcounting is a bit non-standard for something searchable > > by RCU, in that we maintain a sentinel reference while it's hashed. > > This in turn requires that we have to do things differently in the "put" > > depending on whether its hashed, etc. > > > > Another issue: nfsd_file_close_inode_sync can end up freeing an > > nfsd_file while there are still outstanding references to it. > > > > Rework the code so that the refcount is what drives the lifecycle. When > > the refcount goes to zero, then unhash and rcu free the object. > > I think the lru reference needs to be counted. Otherwise the nfsd_file > will be freed (and removed from the LRU) as soon as not active request > is referencing it (in the NFSv3 case). This makes the LRU ineffective. > > Looking more closely at the patch, it seems to sometimes treat the LRU > reference as counted, but sometimes not. > > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx> > > --- > > fs/nfsd/filecache.c | 202 ++++++++++++++++++++------------------------ > > 1 file changed, 92 insertions(+), 110 deletions(-) > > > > This passes some basic smoke testing and I think closes a number of > > races in this code. I also think the result is a bit simpler and easier > > to follow now. > > > > I looked for some ways to break this up into multiple patches, but I > > didn't find any. This changes the underlying rules of how the > > refcounting works, and I didn't see a way to split that up and still > > have it remain bisectable. > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c > > index 918d67cec1ad..6c2f4f2c56a6 100644 > > --- a/fs/nfsd/filecache.c > > +++ b/fs/nfsd/filecache.c > > @@ -1,7 +1,6 @@ > > +// SPDX-License-Identifier: GPL-2.0 > > /* > > - * Open file cache. > > - * > > - * (c) 2015 - Jeff Layton <jeff.layton@xxxxxxxxxxxxxxx> > > + * The NFSD open file cache. > > */ > > > > #include <linux/hash.h> > > @@ -303,8 +302,7 @@ nfsd_file_alloc(struct nfsd_file_lookup_key *key, unsigned int may) > > if (key->gc) > > __set_bit(NFSD_FILE_GC, &nf->nf_flags); > > nf->nf_inode = key->inode; > > - /* nf_ref is pre-incremented for hash table */ > > - refcount_set(&nf->nf_ref, 2); > > + refcount_set(&nf->nf_ref, 1); > > nf->nf_may = key->need; > > nf->nf_mark = NULL; > > } > > @@ -376,11 +374,15 @@ nfsd_file_flush(struct nfsd_file *nf) > > nfsd_reset_write_verifier(net_generic(nf->nf_net, nfsd_net_id)); > > } > > > > -static void nfsd_file_lru_add(struct nfsd_file *nf) > > +static bool nfsd_file_lru_add(struct nfsd_file *nf) > > { > > set_bit(NFSD_FILE_REFERENCED, &nf->nf_flags); > > - if (list_lru_add(&nfsd_file_lru, &nf->nf_lru)) > > + if (!test_bit(NFSD_FILE_HASHED, &nf->nf_flags) && > > Yes, I think this is just wrong. We do need to be able to put hashed entries on the LRU. > > > + list_lru_add(&nfsd_file_lru, &nf->nf_lru)) { > > trace_nfsd_file_lru_add(nf); > > + return true; > > + } > > + return false; > > } > > > > static void nfsd_file_lru_remove(struct nfsd_file *nf) > > @@ -410,7 +412,7 @@ nfsd_file_unhash(struct nfsd_file *nf) > > return false; > > } > > > > -static void > > +static bool > > nfsd_file_unhash_and_dispose(struct nfsd_file *nf, struct list_head *dispose) > > { > > trace_nfsd_file_unhash_and_dispose(nf); > > @@ -418,46 +420,48 @@ nfsd_file_unhash_and_dispose(struct nfsd_file *nf, struct list_head *dispose) > > /* caller must call nfsd_file_dispose_list() later */ > > nfsd_file_lru_remove(nf); > > list_add(&nf->nf_lru, dispose); > > + return true; > > } > > + return false; > > } > > > > -static void > > -nfsd_file_put_noref(struct nfsd_file *nf) > > +static bool > > +__nfsd_file_put(struct nfsd_file *nf) > > { > > - trace_nfsd_file_put(nf); > > - > > + /* v4 case: don't wait for GC */ > > This comment suggests this code is only for the v4 case .... > > > if (refcount_dec_and_test(&nf->nf_ref)) { > > - WARN_ON(test_bit(NFSD_FILE_HASHED, &nf->nf_flags)); > > + nfsd_file_unhash(nf); > > nfsd_file_lru_remove(nf); > > but v4 doesn't use the lru, so this line is pointless. > And if the LRU does hold a counted reference, nf cannot possibly be on > the LRU at this point. > > In fact, this function can be called for the v3 case, so the comment is > just misleading .... or maybe the code is poorly structured. > Ok. I'll clean up the comments. We have the "normal" refcount put codepath, and the v2/3 one where we add it to the LRU if the reference is the last one. > > nfsd_file_free(nf); > > + return true; > > } > > + return false; > > } > > > > -static void > > -nfsd_file_unhash_and_put(struct nfsd_file *nf) > > -{ > > - if (nfsd_file_unhash(nf)) > > - nfsd_file_put_noref(nf); > > -} > > - > > +/** > > + * nfsd_file_put - put the reference to a nfsd_file > > + * @nf: nfsd_file of which to put the reference > > + * > > + * Put a reference to a nfsd_file. In the v4 case, we just put the > > + * reference immediately. In the v2/3 case, if the reference would be > > + * the last one, the put it on the LRU instead to be cleaned up later. > > + */ > > void > > nfsd_file_put(struct nfsd_file *nf) > > { > > - might_sleep(); > > - > > - if (test_bit(NFSD_FILE_GC, &nf->nf_flags) == 1) > > - nfsd_file_lru_add(nf); > > - else if (refcount_read(&nf->nf_ref) == 2) > > - nfsd_file_unhash_and_put(nf); > > + trace_nfsd_file_put(nf); > > > > - if (test_bit(NFSD_FILE_HASHED, &nf->nf_flags) == 0) { > > - nfsd_file_flush(nf); > > - nfsd_file_put_noref(nf); > > - } else if (nf->nf_file && test_bit(NFSD_FILE_GC, &nf->nf_flags) == 1) { > > - nfsd_file_put_noref(nf); > > - nfsd_file_schedule_laundrette(); > > - } else > > - nfsd_file_put_noref(nf); > > + /* NFSv2/3 case */ > > + if (test_bit(NFSD_FILE_GC, &nf->nf_flags)) { > > + /* > > + * If this is the last reference (nf_ref == 1), then transfer > > + * it to the LRU. If the add to the LRU fails, just put it as > > + * usual. > > + */ > > + if (refcount_dec_not_one(&nf->nf_ref) || nfsd_file_lru_add(nf)) > > + return; > > This is one place that looks like you are refcounting entries on the lru. > If this is the last reference and you can transfer it to use LRU, then > you don't need to drop the reference. > This is the only call to nfsd_file_lru_add. We only put an entry on the LRU if it was a "gc" entry and the reference was the last one. > > + } > > + __nfsd_file_put(nf); > > } > > > > NeilBrown -- Jeff Layton <jlayton@xxxxxxxxxx>