On Mon, 2025-01-27 at 08:22 -0500, Chuck Lever wrote:
> On 1/27/25 8:07 AM, Jeff Layton wrote:
> > On Mon, 2025-01-27 at 11:15 +1100, NeilBrown wrote:
> > > On Mon, 27 Jan 2025, Jeff Layton wrote:
> > > > On Mon, 2025-01-27 at 08:53 +1100, NeilBrown wrote:
> > > > > On Sun, 26 Jan 2025, Jeff Layton wrote:
> > > > > > On Sun, 2025-01-26 at 13:39 +1100, NeilBrown wrote:
> > > > > > > On Sun, 26 Jan 2025, Jeff Layton wrote:
> > > > > > > > nfsd_file_dispose_list_delayed can be called from the filecache
> > > > > > > > laundrette, which is shut down after the nfsd threads are shut down and
> > > > > > > > the nfsd_serv pointer is cleared. If nn->nfsd_serv is NULL then there
> > > > > > > > are no threads to wake.
> > > > > > > >
> > > > > > > > Ensure that the nn->nfsd_serv pointer is non-NULL before calling
> > > > > > > > svc_wake_up in nfsd_file_dispose_list_delayed. This is safe since the
> > > > > > > > svc_serv is not freed until after the filecache laundrette is cancelled.
> > > > > > > >
> > > > > > > > Fixes: ffb402596147 ("nfsd: Don't leave work of closing files to a work queue")
> > > > > > > > Reported-by: Salvatore Bonaccorso <carnil@xxxxxxxxxx>
> > > > > > > > Closes: https://lore.kernel.org/linux-nfs/7d9f2a8aede4f7ca9935a47e1d405643220d7946.camel@xxxxxxxxxx/
> > > > > > > > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
> > > > > > > > ---
> > > > > > > > This is only lightly tested, but I think it will fix the bug that
> > > > > > > > Salvatore reported.
> > > > > > > > ---
> > > > > > > >  fs/nfsd/filecache.c | 11 ++++++++++-
> > > > > > > >  1 file changed, 10 insertions(+), 1 deletion(-)
> > > > > > > >
> > > > > > > > diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
> > > > > > > > index e91c164b5ea21507659904690533a19ca43b1b64..fb2a4469b7a3c077de2dd750f43239b4af6d37b0 100644
> > > > > > > > --- a/fs/nfsd/filecache.c
> > > > > > > > +++ b/fs/nfsd/filecache.c
> > > > > > > > @@ -445,11 +445,20 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
> > > > > > > >  						struct nfsd_file, nf_gc);
> > > > > > > >  		struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
> > > > > > > >  		struct nfsd_fcache_disposal *l = nn->fcache_disposal;
> > > > > > > > +		struct svc_serv *serv;
> > > > > > > >  
> > > > > > > >  		spin_lock(&l->lock);
> > > > > > > >  		list_move_tail(&nf->nf_gc, &l->freeme);
> > > > > > > >  		spin_unlock(&l->lock);
> > > > > > > > -		svc_wake_up(nn->nfsd_serv);
> > > > > > > > +
> > > > > > > > +		/*
> > > > > > > > +		 * The filecache laundrette is shut down after the
> > > > > > > > +		 * nn->nfsd_serv pointer is cleared, but before the
> > > > > > > > +		 * svc_serv is freed.
> > > > > > > > +		 */
> > > > > > > > +		serv = nn->nfsd_serv;
> > > > > > >
> > > > > > > I wonder if this should be READ_ONCE() to tell the compiler that we
> > > > > > > could race with clearing nn->nfsd_serv. Would the comment still be
> > > > > > > needed?
> > > > > > >
> > > > > >
> > > > > > I think we need a comment at least. The linkage between the laundrette
> > > > > > and the nfsd_serv being set to NULL is very subtle. A READ_ONCE()
> > > > > > doesn't convey that well, and is unnecessary here.
> > > > > >
> > > > >
> > > > > Why do you say "is unnecessary here"?
> > > > > If the code were
> > > > >    if (nn->nfsd_serv)
> > > > >        svc_wake_up(nn->nfsd_serv);
> > > > > that would be wrong as nn->nfsd_serv could be set to NULL between the
> > > > > two.
> > > > > And the C compiler is allowed to load the value twice because the C memory
> > > > > model declares that would have the same effect.
> > > > > While I doubt it would actually change how the code is compiled, I think
> > > > > we should have READ_ONCE() here (and I've been wrong before about what
> > > > > the compiler will actually do).
> > > > >
> > > >
> > > > It's unnecessary because the outcome of either case is acceptable.
> > > >
> > > > When racing with shutdown, either it's NULL and the laundrette won't
> > > > call svc_wake_up(), or it's non-NULL and it will. In the non-NULL case,
> > > > the call to svc_wake_up() will be a no-op because the threads are shut
> > > > down.
> > > >
> > > > The vastly common case in this code is that this pointer will be non-
> > > > NULL, because the server is running (i.e. not racing with shutdown). I
> > > > don't see the need in making all of those accesses volatile.
> > >
> > > One of us is confused. I hope it isn't me.
> > >
> >
> > It's probably me. I think you have a much better understanding of
> > compiler design than I do. Still...
> >
> > > The hypothetical problem I see is that the C compiler could generate
> > > code to load the value "nn->nfsd_serv" twice. The first time it is not
> > > NULL, the second time it is NULL.
> > > The first is used for the test, the second is passed to svc_wake_up().
> > >
> > > Unlikely though this is, it is possible and READ_ONCE() is designed
> > > precisely to prevent this.
> > > To quote from include/asm-generic/rwonce.h it will
> > > "Prevent the compiler from merging or refetching reads"
> > >
> > > A "volatile" access does not add any cost (in this case). What it does
> > > is break any aliasing that the compiler might have deduced.
> > > Even if the compiler thinks it has "nn->nfsd_serv" in a register, it
> > > won't think it has the result of READ_ONCE(nn->nfsd_serv) in that register.
> > > And if it needs the result of a previous READ_ONCE(nn->nfsd_serv) it
> > > won't decide that it can just read nn->nfsd_serv again. It MUST keep
> > > the result of READ_ONCE(nn->nfsd_serv) somewhere until it is not needed
> > > any more.
> >
> > I'm mainly just considering the resulting pointer. There are two
> > possible outcomes to the fetch of nn->nfsd_serv. Either it's a valid
> > pointer that points to the svc_serv, or it's NULL. The resulting code
> > can handle either case, so it doesn't seem like adding READ_ONCE() will
> > create any material difference here.
> >
> > Maybe I should ask it this way: What bad outcome could result if we
> > don't add READ_ONCE() here?
>
> Neil just described it. The compiler would generate two load operations,
> one for the test and one for the function call argument. The first load
> can retrieve a non-NULL address, and the second a NULL address.
>
> I agree a READ_ONCE() is necessary.
>
>

Now I'm confused:

	struct svc_serv *serv;

	[...]

	/*
	 * The filecache laundrette is shut down after the
	 * nn->nfsd_serv pointer is cleared, but before the
	 * svc_serv is freed.
	 */
	serv = nn->nfsd_serv;
	if (serv)
		svc_wake_up(serv);

This code is explicitly asking to fetch nn->nfsd_serv into the serv
variable, and then is testing that copy of the pointer and passing it
into svc_wake_up().

How is the compiler allowed to suddenly refetch a NULL pointer into serv
after testing that serv is non-NULL?

-- 
Jeff Layton <jlayton@xxxxxxxxxx>
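The point under dispute can be shown in a small user-space sketch. Under the C memory model, a plain read that races with a concurrent write is a data race. The compiler may therefore assume that nn->nfsd_serv does not change between the assignment and the later uses of serv. With that assumption it is free to drop the local copy and load nn->nfsd_serv again, which is the "refetching" Neil quotes from rwonce.h. READ_ONCE() rules this out by forcing exactly one volatile load.

Everything below is a simplified, hypothetical stand-in and not the real nfsd code: the struct layouts, the svc_wake_up() stub, the helper names wake_plain(), wake_plain_as_compiled() and wake_once(), and the reduced READ_ONCE() macro. The kernel macro adds type checks this version omits. The "as compiled" function is written by hand to show one transformation the compiler is permitted to make. It is not the output of any particular compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct svc_serv; only counts wakeups. */
struct svc_serv {
	int wakeups;
};

/* Hypothetical stand-in for struct nfsd_net; another thread may clear
 * nfsd_serv during shutdown. */
struct nfsd_net {
	struct svc_serv *nfsd_serv;
};

/* Simplified user-space READ_ONCE(): one volatile load of x
 * (GCC/Clang __typeof__ extension). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

/* Stub for svc_wake_up(): dereferences serv, so a NULL argument would crash. */
void svc_wake_up(struct svc_serv *serv)
{
	serv->wakeups++;
}

/* Plain load, as in the patch: the source reads nn->nfsd_serv once. */
void wake_plain(struct nfsd_net *nn)
{
	struct svc_serv *serv = nn->nfsd_serv;

	if (serv)
		svc_wake_up(serv);
}

/* One legal compilation of wake_plain(): the compiler assumes no
 * concurrent writer, so it may reload the field instead of keeping serv
 * in a register. A racing NULL store between the two loads would then
 * pass NULL to svc_wake_up(). */
void wake_plain_as_compiled(struct nfsd_net *nn)
{
	if (nn->nfsd_serv)
		svc_wake_up(nn->nfsd_serv);	/* second, refetched load */
}

/* READ_ONCE() version: the volatile access must be performed exactly
 * once, so the value tested is the value passed. */
void wake_once(struct nfsd_net *nn)
{
	struct svc_serv *serv = READ_ONCE(nn->nfsd_serv);

	if (serv)
		svc_wake_up(serv);
}
```

A single-threaded check cannot trigger the race. It can only confirm that the three forms behave the same when nothing clears the pointer concurrently. For example, with `struct svc_serv s = { 0 }; struct nfsd_net nn = { &s };`, calling each helper once leaves `s.wakeups == 3`. After `nn.nfsd_serv = NULL`, further calls leave the count unchanged. The difference between them only matters when another CPU stores NULL between the two loads in wake_plain_as_compiled().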