On Wed, 2022-03-16 at 21:55 +0100, Enrico Scholz wrote: > [You don't often get email from enrico.scholz@xxxxxxxxxxxxxxxxx. > Learn why this is important at > http://aka.ms/LearnAboutSenderIdentification.] > > Enrico Scholz <enrico.scholz@xxxxxxxxxxxxxxxxx> writes: > > > I am experiencing random NFS client lockups after 1-2 days. > > ... > > The full log is available at https://pastebin.pl/view/7d0b345b > > Filtering only RPC messages[1] shows > > ----- > Mar 16 04:48:58.345357: RPC: worker connecting xprt > 0000000022aecad1 via tcp to XXXXX:2001:1022:: (port 2049) > Mar 16 04:48:58.345487: RPC: 0000000022aecad1 connect status > 115 connected 0 sock state 2 > Mar 16 04:48:58.345705: RPC: state 1 conn 0 dead 0 zapped 1 > sk_shutdown 0 > Mar 16 05:02:40.037783: RPC: state 8 conn 1 dead 0 zapped 1 > sk_shutdown 1 > Mar 16 05:02:40.037921: RPC: xs_close xprt 0000000022aecad1 > Mar 16 05:02:40.038076: RPC: state 9 conn 0 dead 0 zapped 1 > sk_shutdown 3 > Mar 16 05:02:40.038211: RPC: state 7 conn 0 dead 0 zapped 1 > sk_shutdown 3 > Mar 16 05:02:40.038293: RPC: xs_error_report client > 0000000022aecad1, error=104... > Mar 16 05:02:40.038449: RPC: state 7 conn 0 dead 0 zapped 1 > sk_shutdown 3 > Mar 16 05:02:40.039134: RPC: xs_connect scheduled xprt > 0000000022aecad1 > Mar 16 05:02:40.042991: RPC: xs_bind > 0000:0000:0000:0000:0000:0000:0000:0000:883: ok (0) > Mar 16 05:02:40.043072: RPC: worker connecting xprt > 0000000022aecad1 via tcp to XXXXX:2001:1022:: (port 2049) > Mar 16 05:02:40.043159: RPC: 0000000022aecad1 connect status > 115 connected 0 sock state 2 > Mar 16 05:02:40.043308: RPC: state 1 conn 0 dead 0 zapped 1 > sk_shutdown 0 > Mar 16 05:02:40.045883: RPC: state 8 conn 1 dead 0 zapped 1 > sk_shutdown 1 > Mar 16 05:02:40.046163: RPC: xs_close xprt 0000000022aecad1 > Mar 16 05:02:40.046410: RPC: state 9 conn 0 dead 0 zapped 1 > sk_shutdown 3 > Mar 16 05:02:40.046625: RPC: state 9 conn 0 dead 0 zapped 1 > sk_shutdown 3 > Mar 16 05:02:40.047428: RPC: xs_connect scheduled xprt > 0000000022aecad1 > Mar 16 05:02:40.050387: RPC: xs_bind > 0000:0000:0000:0000:0000:0000:0000:0000:822: ok (0) > Mar 16 05:02:40.050614: RPC: worker connecting xprt > 0000000022aecad1 via tcp to XXXXX:2001:1022:: (port 2049) > Mar 16 05:02:40.050662: RPC: 0000000022aecad1 connect status > 115 connected 0 sock state 2 > Mar 16 05:02:40.050788: RPC: state 1 conn 0 dead 0 zapped 1 > sk_shutdown 0 > Mar 16 05:02:40.051969: RPC: state 8 conn 1 dead 0 zapped 1 > sk_shutdown 1 > Mar 16 05:02:40.052067: RPC: xs_close xprt 0000000022aecad1 > Mar 16 05:02:40.052189: RPC: state 7 conn 0 dead 0 zapped 1 > sk_shutdown 3 > Mar 16 05:02:40.052243: RPC: xs_error_report client > 0000000022aecad1, error=32... > Mar 16 05:02:40.052367: RPC: state 7 conn 0 dead 0 zapped 1 > sk_shutdown 3 > Mar 16 05:02:40.052503: RPC: state 7 conn 0 dead 0 zapped 1 > sk_shutdown 3 > Mar 16 05:02:40.053201: RPC: xs_connect scheduled xprt > 0000000022aecad1 > Mar 16 05:02:40.055886: RPC: xs_bind > 0000:0000:0000:0000:0000:0000:0000:0000:875: ok (0) > __A__ 05:02:40.055947: RPC: worker connecting xprt > 0000000022aecad1 via tcp to XXXXX:2001:1022:: (port 2049) > Mar 16 05:02:40.055995: RPC: 0000000022aecad1 connect status > 115 connected 0 sock state 2 Socket is in TCP state 2 == SYN_SENT So the client requested to connect to the server > Mar 16 05:02:40.056125: RPC: state 1 conn 0 dead 0 zapped 1 > sk_shutdown 0 Socket is in TCP state 1 == TCP_ESTABLISHED The connection to the server was accepted, and the connection is established. > Mar 16 05:07:28.326605: RPC: state 8 conn 1 dead 0 zapped 1 > sk_shutdown 1 Socket is now in TCP state 8 == TCP_CLOSE_WAIT... That means the server sent a FIN to the client to request that the connection be closed. (see https://users.cs.northwestern.edu/~agupta/cs340/project2/TCPIP_State_Transition_Diagram.pdf for more information on the TCP state diagram) > Mar 16 05:07:28.326679: RPC: xs_connect scheduled xprt > 0000000022aecad1 > __B__ 05:07:28.326978: RPC: worker connecting xprt > 0000000022aecad1 via tcp to XXXXX:2001:1022:: (port 2049) > Mar 16 05:07:28.327050: RPC: 0000000022aecad1 connect status 0 > connected 0 sock state 8 > __C__ 05:07:28.327113: RPC: xs_close xprt 0000000022aecad1 Client closes the socket, which is still in TCP_CLOSE_WAIT > Mar 16 05:07:28.327229: RPC: state 9 conn 0 dead 0 zapped 1 > sk_shutdown 3 ...and then transitions to TCP state 9 == TCP_LAST_ACK > Mar 16 05:07:28.327446: RPC: state 9 conn 0 dead 0 zapped 1 > sk_shutdown 3 > ----- > For more information on TCP states, see https://users.cs.northwestern.edu/~agupta/cs340/project2/TCPIP_State_Transition_Diagram.pdf Basically, what the above means is that your server is initiating the close of the connection, not the client. -- Trond Myklebust Linux NFS client maintainer, Hammerspace trond.myklebust@xxxxxxxxxxxxxxx