On 20.08.2021 17:12, Chuck Lever III wrote:
> OK, I think the issue with this reproducer was resolved completely
> with 6820bf77864d.
>
> I went back and reviewed the traces from when the client got stuck
> after a long uptime. This looks very different from what we're seeing
> with 6820bf77864d. It involves CB_PATH_DOWN and BIND_CONN_TO_SESSION,
> which is a different scenario. Long story short, I don't think we're
> getting any more value by leaving 6820bf77864d reverted.
>
> Can you re-apply that commit on your server, and then when the client
> hangs again, please capture with:
>
>   # trace-cmd record -e nfsd -e sunrpc -e rpcrdma
>
> I'd like to see why the client's BIND_CONN_TO_SESSION fails to repair
> the backchannel session.
>
> --
> Chuck Lever

The main system is still running a regular 5.12.19, and so far the issue has not come up again. Given how long it takes to appear, I'll report back the next time it happens. Sadly, I have not yet found a way to provoke the hang.