Re: [PATCH v3 3/3] nfsd: back channel stuck in SEQ4_STATUS_CB_PATH_DOWN

On 9/16/21 1:15 PM, dai.ngo@xxxxxxxxxx wrote:

On 9/16/21 12:55 PM, Bruce Fields wrote:
On Thu, Sep 16, 2021 at 07:00:20PM +0000, Chuck Lever III wrote:
Bruce, Dai -

On Sep 16, 2021, at 2:22 PM, Dai Ngo <dai.ngo@xxxxxxxxxx> wrote:

When the back channel enters SEQ4_STATUS_CB_PATH_DOWN state, the client
recovers by sending BIND_CONN_TO_SESSION, but the server fails to recover
the back channel and leaves it as NFSD4_CB_DOWN.

Fix by enhancing nfsd4_bind_conn_to_session to probe the back channel
by calling nfsd4_probe_callback.

Signed-off-by: Dai Ngo <dai.ngo@xxxxxxxxxx>
I'm wondering if this one is appropriate to pull into v5.15-rc.
I think so.

Dai, do you have a pynfs test for this case?

I don't, but I can create a pynfs test to reproduce the problem.

Here are the steps to reproduce the stuck SEQ4_STATUS_CB_PATH_DOWN
problem using 'tcpkill':

Client: 5.13.0-rc2
Server: 5.15.0-rc1

1. [root@nfsvmd07 ~]# mount -o vers=4.1 nfsvme14:/root/xfs /tmp/mnt
2. [root@nfsvmd07 ~]# tcpkill host nfsvme14 and port 2049
3. [root@nfsvmd07 ~]# ls /tmp/mnt
4. CTRL-C to stop tcpkill
5. [root@nfsvmd07 ~]# ls /tmp/mnt

The problem can be observed in the wire trace, where the back channel
is stuck in SEQ4_STATUS_CB_PATH_DOWN, causing the client to keep sending
BCTS.

Note: this problem can only be reproduced with a client running 5.13 or
older.  A client running 5.14 or newer does not have this problem. The
reason is that in 5.13, when the client re-establishes the TCP connection,
it re-uses the port number of the previous connection, which was destroyed
by tcpkill (client sends RST to server). This causes the server to set the
state of the back channel to SEQ4_STATUS_CB_PATH_DOWN.  In 5.14, the client
uses a new port number when it re-establishes the connection. This results
in the server returning NFS4ERR_CONN_NOT_BOUND_TO_SESSION in the reply to
the stand-alone SEQUENCE, which then causes the client to send BCTS once
and re-establish the back channel successfully.
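
To make the two client behaviours and the effect of the patch easier to
follow, here is a small user-space C model. It is only a sketch under
stated assumptions: every toy_* name is hypothetical, none of it is kernel
code, and toy_probe_callback() completes synchronously, whereas my
understanding is that the real nfsd4_probe_callback() only kicks off a
probe whose result arrives later.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these are NOT the kernel's types or constants. */
enum toy_cb_state { TOY_CB_UP, TOY_CB_DOWN };
enum toy_seq_result { TOY_SEQ_CB_PATH_DOWN, TOY_SEQ_CONN_NOT_BOUND };

#define TOY_CONN_FORE 0x1u
#define TOY_CONN_BACK 0x2u

struct toy_conn {
    int port;           /* client source port of the bound connection */
    unsigned int flags; /* TOY_CONN_FORE and/or TOY_CONN_BACK */
};

struct toy_client {
    enum toy_cb_state cb_state;
    struct toy_conn conn;
    bool has_conn;
};

/*
 * Stand-alone SEQUENCE after the client reconnects.
 * Same port (5.13-style client): the server still matches the old
 * connection and reports the back channel as down.
 * New port (5.14-style client): the connection is unknown to the session.
 */
static enum toy_seq_result
toy_sequence_after_reconnect(struct toy_client *clp, int new_port)
{
    if (clp->has_conn && new_port == clp->conn.port) {
        clp->cb_state = TOY_CB_DOWN;
        return TOY_SEQ_CB_PATH_DOWN;
    }
    return TOY_SEQ_CONN_NOT_BOUND;
}

/* Simplified stand-in for a callback probe: succeeds at once if a
 * back-channel-capable connection is present. */
static void toy_probe_callback(struct toy_client *clp)
{
    bool usable = clp->has_conn && (clp->conn.flags & TOY_CONN_BACK);

    clp->cb_state = usable ? TOY_CB_UP : TOY_CB_DOWN;
}

/*
 * BCTS on a connection the server already knows about.
 * Unpatched: success is returned and the callback state is left alone.
 * Patched: mark the connection usable for the back channel if the
 * client asked for it, then probe.
 */
static void toy_bcts_existing(struct toy_client *clp, bool wants_back,
                              bool patched)
{
    assert(clp->has_conn);
    if (!patched)
        return;
    if (wants_back)
        clp->conn.flags |= TOY_CONN_BACK;
    toy_probe_callback(clp);
}
```

In this model, a client that reconnects from the same port and then sends
BCTS stays at TOY_CB_DOWN when patched is false, and moves to TOY_CB_UP
when patched is true.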

I can provide the pcap files of a good and bad run of the test if
interested.

I don't have pynfs test for this case.

-Dai


-Dai


--b.

---
fs/nfsd/nfs4state.c | 16 +++++++++++++---
1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/fs/nfsd/nfs4state.c b/fs/nfsd/nfs4state.c
index 54e5317f00f1..63b4d0e6fc29 100644
--- a/fs/nfsd/nfs4state.c
+++ b/fs/nfsd/nfs4state.c
@@ -3580,7 +3580,7 @@ static struct nfsd4_conn *__nfsd4_find_conn(struct svc_xprt *xpt, struct nfsd4_s
}

static __be32 nfsd4_match_existing_connection(struct svc_rqst *rqst,
-                struct nfsd4_session *session, u32 req)
+        struct nfsd4_session *session, u32 req, struct nfsd4_conn **conn)
{
    struct nfs4_client *clp = session->se_client;
    struct svc_xprt *xpt = rqst->rq_xprt;
@@ -3603,6 +3603,8 @@ static __be32 nfsd4_match_existing_connection(struct svc_rqst *rqst,
    else
        status = nfserr_inval;
    spin_unlock(&clp->cl_lock);
+    if (status == nfs_ok && conn)
+        *conn = c;
    return status;
}

@@ -3627,8 +3629,16 @@ __be32 nfsd4_bind_conn_to_session(struct svc_rqst *rqstp,
    status = nfserr_wrong_cred;
    if (!nfsd4_mach_creds_match(session->se_client, rqstp))
        goto out;
-    status = nfsd4_match_existing_connection(rqstp, session, bcts->dir);
-    if (status == nfs_ok || status == nfserr_inval)
+    status = nfsd4_match_existing_connection(rqstp, session,
+            bcts->dir, &conn);
+    if (status == nfs_ok) {
+        if (bcts->dir == NFS4_CDFC4_FORE_OR_BOTH ||
+                bcts->dir == NFS4_CDFC4_BACK)
+            conn->cn_flags |= NFS4_CDFC4_BACK;
+        nfsd4_probe_callback(session->se_client);
+        goto out;
+    }
+    if (status == nfserr_inval)
        goto out;
    status = nfsd4_map_bcts_dir(&bcts->dir);
    if (status)
--
2.9.5

--
Chuck Lever




