Re: [PATCH v2 0/7] nfsd: CB_SEQUENCE error handling fixes and cleanups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/29/25 10:01 AM, Jeff Layton wrote:
On Wed, 2025-01-29 at 09:40 -0500, J. Bruce Fields wrote:
On Wed, Jan 29, 2025 at 09:27:02AM -0500, Jeff Layton wrote:
On Wed, 2025-01-29 at 09:21 -0500, J. Bruce Fields wrote:
On Wed, Jan 29, 2025 at 08:39:53AM -0500, Jeff Layton wrote:
While looking over the CB_SEQUENCE error handling, I discovered that
callbacks don't hold a reference to a session, and the
clp->cl_cb_session could easily change between request and response.
If that happens at an inopportune time, there could be UAFs or weird
slot/sequence handling problems.

Nobody should place too much faith in my understanding of how any of
this works at this point, but....  My vague memory is that a lot of
things are serialized simply by being run only on the cl_callback_wq.
Modifying clp->cl_cb_session is such a thing.

It is, but that doesn't save us here. The workqueue is just there to
submit jobs to the RPC client. Once that happens they are run via
rpciod's workqueue (and in parallel with one another since they're
async RPC calls).

So, it's possible that while we're waiting for a response from one
callback, another is submitted, and that workqueue job changes the
clp->cl_cb_session.

I think it calls rpc_shutdown_client() before changing
clp->cl_cb_session.


It does, but the cl_cb_session doesn't carry a reference. My worry was
that the client could call a DESTROY_SESSION at any time.

Now that I look though, you may be right that that's enough to ensure
it because nfsd4_destroy_session() calls nfsd4_probe_callback_sync()
before putting the session reference.

Still, that is a lot of reliance on these things happening in a
particular order.

(Though I'm not sure whether rpc_shutdown_client guarantees that all rpc
processing for the client is completed before returning?)


FWIW, it does wait for them to be killed:

         while (!list_empty(&clnt->cl_tasks)) {
                 rpc_killall_tasks(clnt);
                 wait_event_timeout(destroy_wait,
                         list_empty(&clnt->cl_tasks), 1*HZ);
         }

I'm not crazy about the fact that it does that synchronously in the
workqueue job, but I guess not much else can be happening with
callbacks until this completes.

Bruce, note rpc_shutdown_client() can block indefinitely due to
bugs in NFSD's callback completion handlers. That's one of the
main reasons this is getting attention right not.


This series changes the nfsd4_session to be RCU-freed, and then adds a
new method of session refcounting that is compatible with the old.
nfsd4_callback RPCs will now hold a lightweight reference to the session
in addition to the slot. Then, all of the callback handling is switched
to use that session instead of dereferencing clp->cb_cb_session.
I've also reworked the error handling in nfsd4_cb_sequence_done()
based on review comments, and lifted the v4.0 handing out of that
function.

This passes pynfs, nfstests, and fstests for me, but I'm not sure how
much any of that stresses the backchannel's error handling.

These should probably go in via Chuck's tree, but the last patch touches
some NFS cnd sunrpc client code, so it'd be good to have R-b's or A-b's
from Trond and/or Anna on that one.

Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
---
Changes in v2:
- make nfsd4_session be RCU-freed
- change code to keep reference to session over callback RPCs
- rework error handling in nfsd4_cb_sequence_done()
- move NFSv4.0 handling out of nfsd4_cb_sequence_done()
- Link to v1: https://lore.kernel.org/r/20250123-nfsd-6-14-v1-0-c1137a4fa2ae@xxxxxxxxxx

---
Jeff Layton (7):
       nfsd: add routines to get/put session references for callbacks
       nfsd: make clp->cl_cb_session be an RCU managed pointer
       nfsd: add a cb_ses pointer to nfsd4_callback and use it instead of clp->cb_cb_session
       nfsd: overhaul CB_SEQUENCE error handling
       nfsd: remove unneeded forward declaration of nfsd4_mark_cb_fault()
       nfsd: lift NFSv4.0 handling out of nfsd4_cb_sequence_done()
       sunrpc: make rpc_restart_call() and rpc_restart_call_prepare() void return

  fs/nfs/nfs4proc.c           |  12 ++-
  fs/nfsd/nfs4callback.c      | 212 ++++++++++++++++++++++++++++++++------------
  fs/nfsd/nfs4state.c         |  43 ++++++++-
  fs/nfsd/state.h             |   6 +-
  fs/nfsd/trace.h             |   6 +-
  include/linux/sunrpc/clnt.h |   4 +-
  net/sunrpc/clnt.c           |   7 +-
  7 files changed, 210 insertions(+), 80 deletions(-)
---
base-commit: a05af3c6103b703d1d38d8180b3ebbe0a03c2f07
change-id: 20250123-nfsd-6-14-b0797e385dc0

Best regards,
--
Jeff Layton <jlayton@xxxxxxxxxx>

--
Jeff Layton <jlayton@xxxxxxxxxx>



--
Chuck Lever




[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux