> On Feb 20, 2019, at 10:37 AM, Trond Myklebust <trondmy@xxxxxxxxxxxxxxx> wrote: > > On Wed, 2019-02-20 at 09:25 -0600, Jason L Tibbitts III wrote: >> Sadly, after a bit more testing I find that 4.20.10 with that patch >> applied still exhibits the problem. Just found a machine where NFS >> hung >> up just six minutes after the user logged in. As always: nothing is >> logged to the client or the server. I turned on the tracing as >> requested previously and see what I think it just the same sequence >> as >> before. Please let me know if there is any other information I can >> provide. Right now I'm going to build 4.20.10 with my previously >> posted >> revert patch applied and roll that out to the same 50 hosts and see >> how >> it fares. >> >> - J< >> >> kworker/u8:0-12608 [001] .... 47884.661073: rpc_request: task: >> 20469@1 nfsv4 SEQUENCE (async) >> kworker/u8:0-12608 [001] .... 47884.661073: rpc_task_run_action: >> task:20469@1 flags=5281 state=0005 status=0 action=call_reserve >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661074: rpc_task_run_action: >> task:20469@1 flags=5281 state=0005 status=0 action=call_reserveresult >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661074: rpc_task_run_action: >> task:20469@1 flags=5281 state=0005 status=0 action=call_refresh >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661075: rpc_task_run_action: >> task:20469@1 flags=5281 state=0005 status=0 action=call_refreshresult >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661076: rpc_task_run_action: >> task:20469@1 flags=5281 state=0005 status=0 action=call_allocate >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661076: rpc_task_run_action: >> task:20469@1 flags=5281 state=0005 status=0 action=call_encode >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661091: rpc_task_run_action: >> task:20469@1 flags=5281 state=001d status=0 action=call_bind [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661091: rpc_task_run_action: >> task:20469@1 flags=5281 state=001d status=0 action=call_connect >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661092: rpc_task_run_action: >> task:20469@1 flags=5281 state=001d status=0 action=call_transmit >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661106: xprt_transmit: >> peer=[172.21.86.74]:2049 xid=0x0d7ca068 status=0 >> kworker/u8:0-12608 [001] .... 47884.661107: rpc_task_run_action: >> task:20469@1 flags=5a81 state=0015 status=0 >> action=call_transmit_status [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661107: rpc_task_sleep: task: >> 20469@1 flags=5a81 state=0015 status=0 timeout=60000 >> queue=xprt_pending >> kworker/u9:0-12884 [002] .... 47884.661371: xprt_lookup_rqst: >> peer=[172.21.86.74]:2049 xid=0x0d7ca068 status=0 >> kworker/u9:0-12884 [002] .... 47884.661372: xprt_complete_rqst: >> peer=[172.21.86.74]:2049 xid=0x0d7ca068 status=112 >> kworker/u9:0-12884 [002] .... 47884.661373: rpc_task_wakeup: >> task:20469@1 flags=5a81 state=0006 status=0 timeout=60000 >> queue=xprt_pending >> kworker/u9:0-12884 [002] .... 47884.661375: >> xs_stream_read_request: peer=[172.21.86.74]:2049 xid=0x0d7ca068 >> copied=112 reclen=112 offset=112 >> kworker/u9:0-12884 [002] .... 47884.661376: xs_stream_read_data: >> peer=[172.21.86.74]:2049 err=-11 total=116 >> kworker/u8:0-12608 [001] .... 47884.661388: rpc_task_run_action: >> task:20469@1 flags=5a81 state=0005 status=0 action=xprt_timer >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661389: rpc_task_run_action: >> task:20469@1 flags=5a81 state=0005 status=0 action=call_status >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661389: rpc_task_run_action: >> task:20469@1 flags=5a81 state=0005 status=0 action=call_decode >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661408: rpc_task_run_action: >> task:20469@1 flags=5a81 state=0005 status=-10063 action=rpc_exit_task >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661409: nfs4_sequence_done: >> error=-10063 (SEQ_MISORDERED) session=0x4480e725 slot_nr=0 seq_nr=1 >> highest_slotid=0 target_highest_slotid=0 status_flags=0 () >> kworker/u8:0-12608 [001] .... 47884.661410: rpc_stats_latency: >> task:20469@1 xid=0x0d7ca068 nfsv4 SEQUENCE backlog=21 rtt=278 >> execute=339 >> kworker/u8:0-12608 [001] .... 47884.661412: rpc_task_run_action: >> task:20469@1 flags=5281 state=0005 status=0 action=rpc_prepare_task >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661413: nfs4_setup_sequence: >> session=0x4480e725 slot_nr=0 seq_nr=1 highest_used_slotid=0 >> kworker/u8:0-12608 [001] .... 47884.661413: rpc_task_run_action: >> task:20469@1 flags=5281 state=0005 status=0 action=call_start >> [sunrpc] >> kworker/u8:0-12608 [001] .... 47884.661414: rpc_request: task: >> 20469@1 nfsv4 SEQUENCE (async) > > This is not an RPC layer issue. It is a SEQ_MISORDERED error on slot 0. > If the client can't recover then that will hang your NFSv4.1 session > against that server. > > Are you sure this isn't what was happening previously? This is exactly what was happening earlier. See my e-mail to the list on Feb 8. -- Chuck Lever