On Thu, 2024-10-31 at 09:40 -0400, cel@xxxxxxxxxx wrote: > From: Chuck Lever <chuck.lever@xxxxxxxxxx> > > nfsd4_shutdown_copy() is just this: > > while ((copy = nfsd4_get_copy(clp)) != NULL) > nfsd4_stop_copy(copy); > > nfsd4_get_copy() bumps @copy's reference count, preventing > nfsd4_stop_copy() from releasing @copy. > > A while loop like this usually works by removing the first element > of the list, but neither nfsd4_get_copy() nor nfsd4_stop_copy() > alters the async_copies list. > > Best I can tell, then, is that nfsd4_shutdown_copy() continues to > loop until other threads manage to remove all the items from this > list. The spinning loop blocks shutdown until these items are gone. > > Possibly the reason we haven't seen this issue in the field is > because client_has_state() prevents __destroy_client() from calling > nfsd4_shutdown_copy() if there are any items on this list. In a > subsequent patch I plan to remove that restriction. > > Fixes: e0639dc5805a ("NFSD introduce async copy feature") > Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> > --- > fs/nfsd/nfs4proc.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/fs/nfsd/nfs4proc.c b/fs/nfsd/nfs4proc.c > index 9f6617fa5412..8229bbfdd735 100644 > --- a/fs/nfsd/nfs4proc.c > +++ b/fs/nfsd/nfs4proc.c > @@ -1302,6 +1302,9 @@ static struct nfsd4_copy *nfsd4_get_copy(struct nfs4_client *clp) > copy = list_first_entry(&clp->async_copies, struct nfsd4_copy, > copies); > refcount_inc(©->refcount); > + copy->cp_clp = NULL; > + if (!list_empty(©->copies)) > + list_del_init(©->copies); > } > spin_unlock(&clp->async_lock); > return copy; My initial thought was: The problem sounds real, but is nfsd4_get_copy() the place for the above logic? But then I noticed that nfsd4_get_copy() is only called from nfsd4_shutdown_copy(). Maybe we should be rename nfsd4_get_copy() to nfsd4_find_and_unhash_copy() ? -- Jeff Layton <jlayton@xxxxxxxxxx>