Hi,

Thanks for your reply.

Chuck Lever wrote:
> On 03/08/2010 04:59 AM, Mi Jinlong wrote:
>> Hi chuck,
>>
>> Thanks for your reply.
>>
>> Chuck Lever wrote:
>>> On 03/04/2010 05:12 AM, Mi Jinlong wrote:
>>>> Hi,
>>>>   Step4: [22:42:16] Write data to file
>>>>          [22:42:16] Write data success
>>>>   Step5: [22:42:16] Unlock file
>>>>          [22:46:30] Unlock file success.
>>>>   Step6: [22:46:30] Close file /mnt/nfs/file
>>>>          [22:46:30] Close fiel /mnt/nfs/file success
>>>>
>>>> The problem is at step5, unlock file takes 4 min, it's a long time
>>>> than expected.
>>>> When traceing the kernel, I find SUNRPC call call_connect timeout many
>>>> times,
>>>> one timeout is 1min.
>>>
>>> The kernel's TCP reconnect logic will retry until it succeeds, without
>>> letting the upper level make progress. For some reason, it is having
>>> difficulty reconnecting with your server.
>>>
>>>> I think it's a problem of kernel, but i don't know why, can someone
>>>> help me ?
>>>
>>> # sudo rpcdebug -m rpc -s xprt trans
>>
>> After running this command, I got some important messages that I think.
>>
>> RPC: xs_connect delayed xprt for 3 seconds
>> ...
>> RPC: xs_connect delayed xprt for 6 seconds
>> ...
>> RPC: xs_connect delayed xprt for 12 seconds
>> ...
>> RPC: xs_connect delayed xprt for 24 seconds
>> ...
>> ...
>> RPC: xs_connect delayed xprt for 300 seconds
>>
>> This message is printed at xs_connect, and the delay time is double
>> there.
>> IMO, when some data translate over through a socket, the socket should
>> be released.
>> But, it seems the socket isn't released through those messages above.
>> Is it wrong, or there are some other reasons ?
>
> The code is trying to connect, but the ->connect call isn't working
> somehow. The code backs off by doubling the timeout each time, so that
> the connect attempts don't overload the server.
>
> This tells us that the code is attempting to connect, but not why the
> connect attempt is failing.
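The delays in the quoted debug output (3, 6, 12, 24, ... 300 seconds) are consistent with a doubling backoff that stops growing at a maximum. A minimal user-space sketch of that pattern follows. The names INIT_DELAY_SECS, MAX_DELAY_SECS, next_reconnect_delay and print_backoff_schedule are hypothetical, and the 3-second and 300-second bounds are simply read off the log above, not taken from the kernel source.

```c
#include <stdio.h>

/* Hypothetical bounds: the first and last delays seen in the debug log. */
#define INIT_DELAY_SECS 3u
#define MAX_DELAY_SECS  300u

/*
 * Return the delay to use for the next connect attempt, given the delay
 * used for the previous one (0 means "no attempt has been delayed yet").
 * The delay doubles on every failure and is clamped to MAX_DELAY_SECS.
 */
unsigned int next_reconnect_delay(unsigned int current)
{
	unsigned int next;

	if (current == 0)
		return INIT_DELAY_SECS;

	next = current * 2;
	return next > MAX_DELAY_SECS ? MAX_DELAY_SECS : next;
}

/* Print the delay schedule up to and including the first capped value. */
void print_backoff_schedule(void)
{
	unsigned int delay = 0;

	do {
		delay = next_reconnect_delay(delay);
		printf("delayed for %u seconds\n", delay);
	} while (delay < MAX_DELAY_SECS);
}
```

With these assumed bounds, calling print_backoff_schedule() walks through 3, 6, 12, 24, 48, 96, 192 and 300 seconds, which matches the shape of the log: each failed attempt waits twice as long as the previous one until the cap is reached.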

While reading the kernel code, I found what looks like a problem in the
function xs_tcp_close:

....
static void xs_tcp_close(struct rpc_xprt *xprt)
{
	if (test_and_clear_bit(XPRT_CONNECTION_CLOSE, &xprt->state))
		xs_close(xprt);
	else
		xs_tcp_shutdown(xprt);
}
...

When a task calls xs_tcp_close to close the xprt's socket, in most cases
it only calls xs_tcp_shutdown, which uses the next layer's close function
to close the socket connection. But after the connection has been closed,
the socket itself still exists, so it may be reused.

Is this a problem? I think the socket should be released after
xs_tcp_shutdown.

thanks,
Mi Jinlong
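
The distinction raised above (the connection is shut down, but the socket object remains) also exists in the user-space socket API, which may make the question easier to discuss. The sketch below is only an analogy using POSIX shutdown() and close() on a socketpair(). It says nothing about what xs_tcp_shutdown does internally, and the helper names fd_is_open and shutdown_then_close are hypothetical.

```c
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Return 1 if fd still refers to an open descriptor, 0 otherwise. */
int fd_is_open(int fd)
{
	return fcntl(fd, F_GETFD) != -1;
}

/*
 * Shut down one end of a socket pair, then close it, recording whether
 * the descriptor is still open after each step.
 * Returns 0 on success, -1 if a system call failed.
 */
int shutdown_then_close(int *open_after_shutdown, int *open_after_close)
{
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
		return -1;

	/* Ends communication in both directions; the descriptor stays allocated. */
	if (shutdown(sv[0], SHUT_RDWR) == -1) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}
	*open_after_shutdown = fd_is_open(sv[0]);

	/* Releases the descriptor itself. */
	close(sv[0]);
	*open_after_close = fd_is_open(sv[0]);

	close(sv[1]);
	return 0;
}
```

In this user-space analogy the descriptor is still open after shutdown() and is gone only after close(), so open_after_shutdown is 1 and open_after_close is 0.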