> On Aug 11, 2021, at 12:20 PM, Timo Rothenpieler <timo@xxxxxxxxxxxxxxxx> wrote:
>
> resulting dmesg and trace logs of both client and server are attached.
>
> Test procedure:
>
> - start tracing on client and server
> - mount NFS on client
> - immediately run 'xfs_io -fc "copy_range testfile" testfile.copy' (which succeeds)
> - wait 10~15 minutes for the backchannel to time out (still running 5.12.19 with the fix for that reverted)
> - run xfs_io command again, getting stuck now
> - let it sit there stuck for a minute, then cancel it
> - run the command again
> - while it's still stuck, finished recording the logs and traces

The server tries to send CB_OFFLOAD when the offloaded copy
completes, but finds the backchannel transport is not connected.

The server can't report the problem until the client sends a
SEQUENCE operation, but there's really no other traffic going
on, so it just waits.

The client eventually sends a singleton SEQUENCE to renew its
lease. The server replies with the SEQ4_STATUS_BACKCHANNEL_FAULT
flag set at that point. Client's recovery is to destroy that
session and create a new one. That appears to be successful.

But the server doesn't send another CB_OFFLOAD to let the client
know the copy is complete, so the client hangs.

This seems to be peculiar to COPY_OFFLOAD, but I wonder if the
other CB operations suffer from the same "failed to retransmit
after the CB path is restored" issue. It might not matter for
some of them, but for others like CB_RECALL, that could be
important.

--
Chuck Lever
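
For illustration only, the sequence described above can be modeled as a toy state machine. This is a rough sketch, not nfsd code: the class `ToyServer`, its methods, and the `requeue_on_new_session` switch are all hypothetical names made up for this example. The switch stands in for one possible behavior (retransmitting undelivered callbacks once the client has created a new session); it is an assumption about what a fix might look like, not something taken from the traces.

```python
# Toy model of the hang described above. All names here are hypothetical;
# only the flag name and value come from the NFSv4.1 protocol (RFC 5661).
SEQ4_STATUS_BACKCHANNEL_FAULT = 0x00000400


class ToyServer:
    def __init__(self, requeue_on_new_session):
        self.backchannel_up = True
        self.requeue_on_new_session = requeue_on_new_session
        self.pending = []    # callbacks that could not be sent
        self.delivered = []  # callbacks the client actually received

    def send_callback(self, op):
        # A callback only reaches the client if the backchannel is connected.
        if self.backchannel_up:
            self.delivered.append(op)
        else:
            self.pending.append(op)

    def sequence(self):
        # Reply to a SEQUENCE from the client: report a backchannel fault
        # through the status flags, since there is no other way to signal it.
        return 0 if self.backchannel_up else SEQ4_STATUS_BACKCHANNEL_FAULT

    def create_session(self):
        # The client destroyed the old session and created a new one,
        # which restores the backchannel.
        self.backchannel_up = True
        if self.requeue_on_new_session:
            # Assumed fix: retransmit whatever could not be delivered.
            ops, self.pending = self.pending, []
            for op in ops:
                self.send_callback(op)
        # Otherwise the pending callbacks are never sent (observed behavior).


def run_scenario(requeue_on_new_session):
    server = ToyServer(requeue_on_new_session)
    server.backchannel_up = False        # backchannel timed out while idle
    server.send_callback("CB_OFFLOAD")   # offloaded copy completes
    flags = server.sequence()            # client renews its lease
    if flags & SEQ4_STATUS_BACKCHANNEL_FAULT:
        server.create_session()          # client recovers the session
    return server.delivered
```

With `run_scenario(False)` the client never sees CB_OFFLOAD, which matches the hang; with `run_scenario(True)` the callback is delivered after the new session is set up.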