On 1/27/25 8:07 AM, Jeff Layton wrote:
On Mon, 2025-01-27 at 11:15 +1100, NeilBrown wrote:
On Mon, 27 Jan 2025, Jeff Layton wrote:
On Mon, 2025-01-27 at 08:53 +1100, NeilBrown wrote:
On Sun, 26 Jan 2025, Jeff Layton wrote:
On Sun, 2025-01-26 at 13:39 +1100, NeilBrown wrote:
On Sun, 26 Jan 2025, Jeff Layton wrote:
nfsd_file_dispose_list_delayed can be called from the filecache
laundrette, which is shut down after the nfsd threads are shut down and
the nfsd_serv pointer is cleared. If nn->nfsd_serv is NULL then there
are no threads to wake.
Ensure that the nn->nfsd_serv pointer is non-NULL before calling
svc_wake_up in nfsd_file_dispose_list_delayed. This is safe since the
svc_serv is not freed until after the filecache laundrette is cancelled.
Fixes: ffb402596147 ("nfsd: Don't leave work of closing files to a work queue")
Reported-by: Salvatore Bonaccorso <carnil@xxxxxxxxxx>
Closes: https://lore.kernel.org/linux-nfs/7d9f2a8aede4f7ca9935a47e1d405643220d7946.camel@xxxxxxxxxx/
Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
This is only lightly tested, but I think it will fix the bug that
Salvatore reported.
---
fs/nfsd/filecache.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/fs/nfsd/filecache.c b/fs/nfsd/filecache.c
index e91c164b5ea21507659904690533a19ca43b1b64..fb2a4469b7a3c077de2dd750f43239b4af6d37b0 100644
--- a/fs/nfsd/filecache.c
+++ b/fs/nfsd/filecache.c
@@ -445,11 +445,20 @@ nfsd_file_dispose_list_delayed(struct list_head *dispose)
struct nfsd_file, nf_gc);
struct nfsd_net *nn = net_generic(nf->nf_net, nfsd_net_id);
struct nfsd_fcache_disposal *l = nn->fcache_disposal;
+ struct svc_serv *serv;
spin_lock(&l->lock);
list_move_tail(&nf->nf_gc, &l->freeme);
spin_unlock(&l->lock);
- svc_wake_up(nn->nfsd_serv);
+
+ /*
+ * The filecache laundrette is shut down after the
+ * nn->nfsd_serv pointer is cleared, but before the
+ * svc_serv is freed.
+ */
+ serv = nn->nfsd_serv;
I wonder if this should be READ_ONCE() to tell the compiler that we
could race with clearing nn->nfsd_serv. Would the comment still be
needed?
I think we need a comment at least. The linkage between the laundrette
and the nfsd_serv being set to NULL is very subtle. A READ_ONCE()
doesn't convey that well, and is unnecessary here.
Why do you say "is unnecessary here" ?
If the code were
if (nn->nfsd_serv)
svc_wake_up(nn->nfsd_serv);
that would be wrong as nn->nfds_serv could be set to NULL between the
two.
And the C compile is allowed to load the value twice because the C memory
model declares that would have the same effect.
While I doubt it would actually change how the code is compiled, I think
we should have READ_ONCE() here (and I've been wrong before about what
the compiler will actually do).
It's unnecessary because the outcome of either case is acceptable.
When racing with shutdown, either it's NULL and the laundrette won't
call svc_wake_up(), or it's non-NULL and it will. In the non-NULL case,
the call to svc_wake_up() will be a no-op because the threads are shut
down.
The vastly common case in this code is that this pointer will be non-
NULL, because the server is running (i.e. not racing with shutdown). I
don't see the need in making all of those accesses volatile.
One of us is confused. I hope it isn't me.
It's probably me. I think you have a much better understanding of
compiler design than I do. Still...
The hypothetical problem I see is that the C compiler could generate
code to load the value "nn->nfsd_serv" twice. The first time it is not
NULL, the second time it is NULL.
The first is used for the test, the second is passed to svc_wake_up().
Unlikely though this is, it is possible and READ_ONCE() is designed
precisely to prevent this.
To quote from include/asm-generic/rwonce.h it will
"Prevent the compiler from merging or refetching reads"
A "volatile" access does not add any cost (in this case). What it does
is break any aliasing that the compile might have deduced.
Even if the compiler thinks it has "nn->nfsd_serv" in a register, it
won't think it has the result of READ_ONCE(nn->nfsd_serv) in that register.
And if it needs the result of a previous READ_ONCE(nn->nfsd_serv) it
won't decide that it can just read nn->nfsd_serv again. It MUST keep
the result of READ_ONCE(nn->nfsd_serv) somewhere until it is not needed
any more.
I'm mainly just considering the resulting pointer. There are two
possible outcomes to the fetch of nn->nfsd_serv. Either it's a valid
pointer that points to the svc_serv, or it's NULL. The resulting code
can handle either case, so it doesn't seem like adding READ_ONCE() will
create any material difference here.
Maybe I should ask it this way: What bad outcome could result if we
don't add READ_ONCE() here?
Neil just described it. The compiler would generate two load operations,
one for the test and one for the function call argument. The first load
can retrieve a non-NULL address, and the second a NULL address.
I agree a READ_ONCE() is necessary.
--
Chuck Lever