Re: Random NFS client lockups

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Enrico Scholz <enrico.scholz@xxxxxxxxxxxxxxxxx> writes:

> I am experiencing random NFS client lockups after 1-2 days.
> ...
> The full log is available at https://pastebin.pl/view/7d0b345b

Filtering only RPC messages[1] shows

-----
Mar 16 04:48:58.345357: RPC:       worker connecting xprt 0000000022aecad1 via tcp to XXXXX:2001:1022:: (port 2049)
Mar 16 04:48:58.345487: RPC:       0000000022aecad1 connect status 115 connected 0 sock state 2
Mar 16 04:48:58.345705: RPC:       state 1 conn 0 dead 0 zapped 1 sk_shutdown 0
Mar 16 05:02:40.037783: RPC:       state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
Mar 16 05:02:40.037921: RPC:       xs_close xprt 0000000022aecad1
Mar 16 05:02:40.038076: RPC:       state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
Mar 16 05:02:40.038211: RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
Mar 16 05:02:40.038293: RPC:       xs_error_report client 0000000022aecad1, error=104...
Mar 16 05:02:40.038449: RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
Mar 16 05:02:40.039134: RPC:       xs_connect scheduled xprt 0000000022aecad1
Mar 16 05:02:40.042991: RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:883: ok (0)
Mar 16 05:02:40.043072: RPC:       worker connecting xprt 0000000022aecad1 via tcp to XXXXX:2001:1022:: (port 2049)
Mar 16 05:02:40.043159: RPC:       0000000022aecad1 connect status 115 connected 0 sock state 2
Mar 16 05:02:40.043308: RPC:       state 1 conn 0 dead 0 zapped 1 sk_shutdown 0
Mar 16 05:02:40.045883: RPC:       state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
Mar 16 05:02:40.046163: RPC:       xs_close xprt 0000000022aecad1
Mar 16 05:02:40.046410: RPC:       state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
Mar 16 05:02:40.046625: RPC:       state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
Mar 16 05:02:40.047428: RPC:       xs_connect scheduled xprt 0000000022aecad1
Mar 16 05:02:40.050387: RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:822: ok (0)
Mar 16 05:02:40.050614: RPC:       worker connecting xprt 0000000022aecad1 via tcp to XXXXX:2001:1022:: (port 2049)
Mar 16 05:02:40.050662: RPC:       0000000022aecad1 connect status 115 connected 0 sock state 2
Mar 16 05:02:40.050788: RPC:       state 1 conn 0 dead 0 zapped 1 sk_shutdown 0
Mar 16 05:02:40.051969: RPC:       state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
Mar 16 05:02:40.052067: RPC:       xs_close xprt 0000000022aecad1
Mar 16 05:02:40.052189: RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
Mar 16 05:02:40.052243: RPC:       xs_error_report client 0000000022aecad1, error=32...
Mar 16 05:02:40.052367: RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
Mar 16 05:02:40.052503: RPC:       state 7 conn 0 dead 0 zapped 1 sk_shutdown 3
Mar 16 05:02:40.053201: RPC:       xs_connect scheduled xprt 0000000022aecad1
Mar 16 05:02:40.055886: RPC:       xs_bind 0000:0000:0000:0000:0000:0000:0000:0000:875: ok (0)
__A__  05:02:40.055947: RPC:       worker connecting xprt 0000000022aecad1 via tcp to XXXXX:2001:1022:: (port 2049)
Mar 16 05:02:40.055995: RPC:       0000000022aecad1 connect status 115 connected 0 sock state 2
Mar 16 05:02:40.056125: RPC:       state 1 conn 0 dead 0 zapped 1 sk_shutdown 0
Mar 16 05:07:28.326605: RPC:       state 8 conn 1 dead 0 zapped 1 sk_shutdown 1
Mar 16 05:07:28.326679: RPC:       xs_connect scheduled xprt 0000000022aecad1
__B__  05:07:28.326978: RPC:       worker connecting xprt 0000000022aecad1 via tcp to XXXXX:2001:1022:: (port 2049)
Mar 16 05:07:28.327050: RPC:       0000000022aecad1 connect status 0 connected 0 sock state 8
__C__  05:07:28.327113: RPC:       xs_close xprt 0000000022aecad1
Mar 16 05:07:28.327229: RPC:       state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
Mar 16 05:07:28.327446: RPC:       state 9 conn 0 dead 0 zapped 1 sk_shutdown 3
-----

For me, it seems that 'xs_close' at position __C__ belongs to the
connection at __A__ but is applied to the newly created connection
__B__.

Because the close did not happened yet, __B__ operated on the old socket
which is in state 8 (CLOSE_WAIT) and connect returned with 0 instead of
-EINPROGRESS hence.  There will not be generated a new connection in
this case.


It seems to be some race where 'xs_connect()' is executed before a
delayed 'xs_error_handle()'.  Both functions are called in reaction of
a 'CLOSE_WAIT'.




Enrico



[Index of Archives]     [Linux Filesystem Development]     [Linux USB Development]     [Linux Media Development]     [Video for Linux]     [Linux NILFS]     [Linux Audio Users]     [Yosemite Info]     [Linux SCSI]

  Powered by Linux