Infinite loop in pnfs_update_layout()

"NeilBrown" <neilb@xxxxxxx> · Mon, 12 Feb 2024 12:12:57 +1100

Hi,
 I have evidence from a customer of an infinite loop in
 pnfs_update_layout().  This has only happened once and I suspect it is
 unlikely to recur often.  We don't have a lot of tracing data, but I
 think we have enough...
 The evidence I do have is repeated "BUG: workqueue lockup" errors
 with sufficiently many samples that I can determine the code path of
 the loop (see below), and a message:

  NFSv4: state recovery failed for open file SVC_rapid7_dc33/.bash_history, error = -116

 The loop involves the "lookup_again" label and the "goto" on line 2112.
 This is the code where NFS_LAYOUT_INVALID_STID was found to be true and
 nfs4_select_rw_stateid() returned non-zero.

 I deduce that ctx->state is not a valid open stateid.  This leads to
 nfs4_select_rw_stateid() returning -EIO and
 nfs4_schedule_stateid_recovery() doing nothing.  This "doing nothing"
 is the only explanation I can find for the
 nfs4_client_recover_expired_lease() call at the top of the loop not
 waiting at all (if it did wait, we wouldn't get a workqueue lockup).

 The state being invalid also perfectly matches the "state recovery
 failed" error.
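
 To make the suspected path concrete, here is a minimal user-space sketch
 of the retry loop as I understand it.  It is not kernel code: every
 model_* name and struct model_state are hypothetical stand-ins, the lease
 recovery wait is assumed to return immediately, and max_passes exists
 only so that the model terminates where the real loop would not.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the open state; not the kernel's struct nfs4_state. */
struct model_state {
	bool valid;	/* models the result of nfs4_valid_open_stateid() */
};

/* Models nfs4_select_rw_stateid(): assumed to fail with -EIO for an invalid state. */
static int model_select_rw_stateid(const struct model_state *st)
{
	return st->valid ? 0 : -EIO;
}

/* Models nfs4_schedule_stateid_recovery(): assumed to do nothing for an invalid state. */
static void model_schedule_recovery(struct model_state *st)
{
	(void)st;	/* nothing is scheduled, so nothing will ever change */
}

/*
 * Returns the number of passes through the loop, or -1 if max_passes was
 * reached (standing in for the unbounded loop).  *err receives the error
 * that would be returned to the caller, or 0.
 */
static int model_update_layout(struct model_state *st, bool check_first,
			       int max_passes, int *err)
{
	int passes = 0;

	assert(max_passes > 0);
	*err = 0;

lookup_again:
	if (passes >= max_passes)
		return -1;
	passes++;

	/* The lease recovery wait would go here; in this model it never blocks. */

	/* Proposed early exit: give up when the open state is invalid. */
	if (check_first && !st->valid) {
		*err = -EIO;
		return passes;
	}

	if (model_select_rw_stateid(st) != 0) {
		model_schedule_recovery(st);
		goto lookup_again;	/* nothing changed, so this repeats */
	}
	return passes;
}
```

 With an invalid state and check_first false the model only stops at the
 max_passes cap; with check_first true it fails with -EIO on the first
 pass.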

 So it seems likely that we should test
    nfs4_valid_open_stateid(ctx->state)
 somewhere in the loop, and return either NULL or an error.  I'm not
 certain what is best.
 My inclination is

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0c0fed1ecd0b..e702ac518205 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2002,6 +2002,12 @@ pnfs_update_layout(struct inode *ino,
 	lseg = ERR_PTR(nfs4_client_recover_expired_lease(clp));
 	if (IS_ERR(lseg))
 		goto out;
+	if (!nfs4_valid_open_stateid(ctx->state)) {
+		lseg = ERR_PTR(-EIO);
+		trace_pnfs_update_layout(ino, pos, count, iomode, lo, lseg,
+					 PNFS_UPDATE_LAYOUT_INVALID_OPEN);
+		goto out;
+	}
 	first = false;
 	spin_lock(&ino->i_lock);
 	lo = pnfs_find_alloc_layout(ino, ctx, gfp_flags);


Does that seem reasonable?
Another possibility would be to check the status from
nfs4_select_rw_stateid() and "goto out_unlock" if it is -EIO.

diff --git a/fs/nfs/pnfs.c b/fs/nfs/pnfs.c
index 0c0fed1ecd0b..7cc90ee86882 100644
--- a/fs/nfs/pnfs.c
+++ b/fs/nfs/pnfs.c
@@ -2106,6 +2106,8 @@ pnfs_update_layout(struct inode *ino,
 			trace_pnfs_update_layout(ino, pos, count,
 					iomode, lo, lseg,
 					PNFS_UPDATE_LAYOUT_INVALID_OPEN);
+			if (status == -EIO)
+				goto out_unlock;
 			nfs4_schedule_stateid_recovery(server, ctx->state);
 			pnfs_clear_first_layoutget(lo);
 			pnfs_put_layout_hdr(lo);


Thoughts?

Thanks,
NeilBrown