Re: [RFC] SUNRPC connect timeout causes network request delay

Hi,

  Thanks for your reply.

Chuck Lever wrote:
> On 03/08/2010 04:59 AM, Mi Jinlong wrote:
>> Hi Chuck,
>>
>>   Thanks for your reply.
>>
>> Chuck Lever wrote:
>>> On 03/04/2010 05:12 AM, Mi Jinlong wrote:
>>>> Hi,
>>>> Step4: [22:42:16] Write data to file
>>>>          [22:42:16] Write data success
>>>> Step5: [22:42:16] Unlock file
>>>>          [22:46:30] Unlock file success.
>>>> Step6: [22:46:30] Close file /mnt/nfs/file
>>>>          [22:46:30] Close file /mnt/nfs/file success
>>>>
>>>> The problem is at Step 5: unlocking the file takes about 4 minutes
>>>> (22:42:16 to 22:46:30), much longer than expected.
>>>> When tracing the kernel, I found that SUNRPC's call_connect times out
>>>> many times, and each timeout is 1 minute.
>>>
>>> The kernel's TCP reconnect logic will retry until it succeeds, without
>>> letting the upper level make progress.  For some reason, it is having
>>> difficulty reconnecting with your server.
>>>
>>>> I think it's a kernel problem, but I don't know why. Can someone
>>>> help me?
>>>
>>> # sudo rpcdebug -m rpc -s xprt trans
>>
>>   After running this command, I got some messages that I think are important:
>>
>>   RPC:       xs_connect delayed xprt  for 3 seconds
>>   ...
>>   RPC:       xs_connect delayed xprt  for 6 seconds
>>   ...
>>   RPC:       xs_connect delayed xprt  for 12 seconds
>>   ...
>>   RPC:       xs_connect delayed xprt  for 24 seconds
>>   ...
>>   ...
>>   RPC:       xs_connect delayed xprt  for 300 seconds
>>
>> This message is printed by xs_connect, and the delay doubles each time.
>> IMO, once data has been transferred over a socket, the socket should
>> be released.
>> But judging from the messages above, the socket doesn't seem to be
>> released. Am I wrong, or is there some other reason?
> 
> The code is trying to connect, but the ->connect call isn't working
> somehow.  The code backs off by doubling the timeout each time, so that
> the connect attempts don't overload the server.
> 
> This tells us that the code is attempting to connect, but not why the
> connect attempt is failing.
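
The doubling back-off described above can be modelled roughly as follows. This is a simplified userspace sketch, not the SUNRPC code: the 3-second starting delay and the 300-second cap are assumptions read off the rpcdebug log above, and the names REEST_INIT_SECS, REEST_MAX_SECS, next_reconnect_delay and total_wait_secs are hypothetical.

```c
/* Assumed from the rpcdebug log above: first delay 3 s, capped at 300 s. */
#define REEST_INIT_SECS 3u
#define REEST_MAX_SECS  300u

/* Hypothetical helper: double the previous delay, never exceeding the cap.
 * A previous delay of 0 means "no attempt yet", so start at the initial value. */
unsigned int next_reconnect_delay(unsigned int prev)
{
    if (prev == 0)
        return REEST_INIT_SECS;
    if (prev >= REEST_MAX_SECS / 2)   /* doubling would reach or pass the cap */
        return REEST_MAX_SECS;
    return prev * 2;
}

/* Hypothetical helper: total seconds spent waiting over the first n
 * delayed connect attempts. */
unsigned int total_wait_secs(unsigned int n)
{
    unsigned int delay = 0, total = 0;

    for (unsigned int i = 0; i < n; i++) {
        delay = next_reconnect_delay(delay);
        total += delay;
    }
    return total;
}
```

Starting from 0, successive calls give 3, 6, 12, 24, 48, 96, 192, 300, 300, ..., which matches the shape of the log. Under these assumptions, eight delayed attempts add up to 3 + 6 + 12 + 24 + 48 + 96 + 192 + 300 = 681 seconds of waiting. Without the cap, the interval between attempts would keep growing indefinitely.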

  While reading the kernel code, I found a problem in the function xs_tcp_close.
    ...
    static void xs_tcp_close(struct rpc_xprt *xprt)
    {
            /* XPRT_CONNECTION_CLOSE was set: clear it and call xs_close() */
            if (test_and_clear_bit(XPRT_CONNECTION_CLOSE, &xprt->state))
                    xs_close(xprt);
            else    /* otherwise only shut down the connection */
                    xs_tcp_shutdown(xprt);
    }
    ...
  When a task calls xs_tcp_close to close the xprt's socket, it often only calls
  xs_tcp_shutdown, which uses the lower layer's function to close the socket
  connection.
  But after the connection is closed, the socket itself still exists, so it may
  be reused. Is this a problem? I think the socket should be released after
  xs_tcp_shutdown.
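
The difference between shutting down a connection and releasing the socket can be illustrated with a userspace analogy. The sketch below uses only POSIX calls (socketpair, shutdown, fcntl, close) and is not the in-kernel path that xs_tcp_shutdown takes. The function name shutdown_keeps_socket is hypothetical.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical demo. Returns 1 if, after shutdown(SHUT_WR), the peer reads
 * EOF while our descriptor is still open, and close() then releases it.
 * Returns 0 if any step behaves differently, -1 if setup fails.
 */
int shutdown_keeps_socket(void)
{
    int sv[2];
    char buf[1];
    int ok = 1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    /* Shut down our sending direction: the connection is closed for writes... */
    if (shutdown(sv[0], SHUT_WR) != 0)
        ok = 0;

    /* ...so the peer sees end-of-file (read returns 0)... */
    if (read(sv[1], buf, sizeof buf) != 0)
        ok = 0;

    /* ...but the socket itself still exists: the descriptor is still valid. */
    if (fcntl(sv[0], F_GETFD) == -1)
        ok = 0;

    /* Only close() releases the socket; afterwards the descriptor is invalid. */
    close(sv[0]);
    if (fcntl(sv[0], F_GETFD) != -1 || errno != EBADF)
        ok = 0;

    close(sv[1]);
    return ok;
}
```

This only shows that shutting down and releasing are two separate steps. It says nothing about whether the kernel should release the socket left behind after xs_tcp_shutdown or reuse it for the next connect.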
  
thanks,
Mi Jinlong

--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
