On 10.08.2021 19:17, Chuck Lever III wrote:
What I see in this data is that the server is reporting SEQ4_STATUS_CB_PATH_DOWN and the client is attempting to recover (repeatedly) using BIND_CONN_TO_SESSION. But apparently the recovery didn't actually work, because the server continues to report a callback path problem. [1712389.125641] nfs41_handle_sequence_flag_errors: "10.110.10.200" (client ID 6765f8600a675814) flags=0x00000001 [1712389.129264] nfs4_bind_conn_to_session: bind_conn_to_session was successful for server 10.110.10.200! [1712389.171953] nfs41_handle_sequence_flag_errors: "10.110.10.200" (client ID 6765f8600a675814) flags=0x00000001 [1712389.178361] nfs4_bind_conn_to_session: bind_conn_to_session was successful for server 10.110.10.200! [1712389.195606] nfs41_handle_sequence_flag_errors: "10.110.10.200" (client ID 6765f8600a675814) flags=0x00000001 [1712389.203891] nfs4_bind_conn_to_session: bind_conn_to_session was successful for server 10.110.10.200! I guess it's time to switch to tracing on the server side to see if you can nail down why the server's callback requests are failing. On your NFS server, run: # trace-cmd record -e nfsd -e sunrpc -e rpcgss -e rpcrdma at roughly the same point during your test that you captured the previous client-side trace.
I wonder if reverting 6820bf77864d on the server, to have an easier way to reproduce this state, would be worth it.
Cause it seems like the actual underlying issue is the inability of the NFS server (and/or client) to reestablish the backchannel if it gets disconnected for whatever reason?
Right now I already rebooted the client, and everything is working again, so I'll have to wait a potentially long time for this to happen again otherwise.
Attachment:
smime.p7s
Description: S/MIME Cryptographic Signature