Re: [PATCH] sunrpc: cancel delayed connect work when connect succeeds

Hi Trond,

  What do you think about this problem?

thanks,
Mi Jinlong

Mi Jinlong wrote:
> Hi Trond,
> 
> Trond Myklebust wrote:
>> On Wed, 2010-08-18 at 17:49 +0800, Mi Jinlong wrote:
>>> Because of a network partition or some other reason, a delayed connect
>>> work item may still be queued on connect_worker after the client has connected.
>>>
>>> Aug  2 12:51:32 TEST-M kernel: RPC:       xs_connect delayed xprt ccc4c800 for 96 seconds
>>> Aug  2 12:51:32 TEST-M kernel: RPC:       xs_error_report client ccc4c800...
>>> Aug  2 12:51:32 TEST-M kernel: RPC:       error 111
>>>  ... snip ...
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       disconnected transport ccc4c800
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       worker connecting xprt ccc4c800 via tcp to 192.168.0.21 (port 2049)
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       ccc4c800 connect status 115 connected 0 sock state 2
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retrying
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_prepare_transmit
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_transmit(136)
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_send_request(136) = -11
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit incomplete (136 left of 136)
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect xprt ccc4c800 is not connected
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_connect delayed xprt ccc4c800 for 192 seconds
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_state_change client ccc4c800...
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       state 1 conn 0 dead 0 zapped 1
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retrying
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_prepare_transmit
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_transmit(136)
>>> Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_send_request(136) = 136
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit complete
>>> Aug  2 12:53:08 TEST-M kernel: RPC:   229 xprt_prepare_transmit
>>>
>>> As the debug messages show, "xs_connect delayed xprt ccc4c800 for 192 seconds"
>>> means that connect work has been queued, with a delay, on connect_worker.
>>> "state 1 conn 0 dead 0 zapped 1" shows that the connection has succeeded
>>> while that delayed work is still pending on connect_worker.
>>>
>>> Signed-off-by: Mi Jinlong <mijinlong@xxxxxxxxxxxxxx>
>>>
>>> ---
>>>  net/sunrpc/xprtsock.c |    4 ++++
>>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
>>> index 49a62f0..823f1db 100644
>>> --- a/net/sunrpc/xprtsock.c
>>> +++ b/net/sunrpc/xprtsock.c
>>> @@ -1324,6 +1324,10 @@ static void xs_tcp_state_change(struct sock *sk)
>>>  			transport->tcp_flags =
>>>  				TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID;
>>>  
>>> +			if (xprt_connecting(xprt) &&
>>> +			      cancel_delayed_work(&transport->connect_worker))
>>> +				xprt_clear_connecting(xprt);
>>> +
>>>  			xprt_wake_pending_tasks(xprt, -EAGAIN);
>>>  		}
>>>  		spin_unlock_bh(&xprt->transport_lock);
>> Wait... According to the above trace, the connect request is _failing_
>> due to an ECONNREFUSED error. In that case, we _want_ to delay the
>> reconnection in order to give the server time to set itself up.
> 
>   Yes, that's right.
> 
>   But the important part of the trace is
>   "
>    Aug  2 12:53:08 TEST-M kernel: RPC:   228 xmit incomplete (136 left of 136)
>    Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect xprt ccc4c800 is not connected
>    Aug  2 12:53:08 TEST-M kernel: RPC:       xs_connect delayed xprt ccc4c800 for 192 seconds
>    Aug  2 12:53:08 TEST-M kernel: RPC:       xs_tcp_state_change client ccc4c800...
>    Aug  2 12:53:08 TEST-M kernel: RPC:       state 1 conn 0 dead 0 zapped 1
>    Aug  2 12:53:08 TEST-M kernel: RPC:   228 xprt_connect_status: retrying
>   ".
> 
>   SUNRPC's TCP connect is asynchronous: tcp_connect() only sends
>   a SYN and does not wait for the SYN-ACK reply.
> 
>     CLIENT                                   SERVER
> 
>   1. The first connect attempt
> 
>     |-xs_connect()
>      |-kernel_connect(O_NONBLOCK)
>        |-tcp_connect()  -------- SYN --------->
> 
>    xs_connect() returns with EINPROGRESS; the SYN-ACK has not arrived yet.
> 
>   2. A reconnect attempt following 1
>     |-xs_connect()
>         queue_delayed_work(rpciod_workqueue,
>                             &transport->connect_worker,
>                             xprt->reestablish_timeout);
> 
>               <--------------- SYN-ACK --------------
> 
>    After the reconnect attempt has queued the connect work on connect_worker,
>    the SYN-ACK for the first attempt arrives, and the connection is now established.
> 
> At this point, a delayed connect work item still exists on connect_worker
> even though the connect has succeeded; we should cancel this work.
> 
> thanks,
> Mi Jinlong
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
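
The sequence described above can be replayed with a small userspace model.
This is only a sketch under stated assumptions: struct model_xprt and every
model_* function are hypothetical stand-ins invented for illustration, not
kernel APIs. The model keeps one "connecting" flag and one "work pending"
flag, queues the delayed retry, then delivers the state change for the
established connection with and without the check the patch adds.

```c
#include <stdbool.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct model_xprt {
    bool connecting;        /* models the transport's "connecting" state */
    bool connected;         /* models the transport's "connected" state */
    bool work_pending;      /* a delayed connect work item is queued */
    unsigned int delay_sec; /* delay of the queued work item */
};

/* Models the retry path of xs_connect(): mark connecting, queue delayed work. */
static void model_queue_reconnect(struct model_xprt *x, unsigned int delay_sec)
{
    x->connecting = true;
    x->work_pending = true;
    x->delay_sec = delay_sec;
}

/* Models cancel_delayed_work(): true only if a pending item was removed. */
static bool model_cancel_delayed_work(struct model_xprt *x)
{
    bool was_pending = x->work_pending;

    x->work_pending = false;
    return was_pending;
}

/* Models the established-connection branch of xs_tcp_state_change(). */
static void model_established(struct model_xprt *x, bool with_patch)
{
    x->connected = true;

    /* Same shape as the three lines added by the patch. */
    if (with_patch && x->connecting && model_cancel_delayed_work(x))
        x->connecting = false;
}

/*
 * Replays the sequence from the mail. Returns true if a stale delayed
 * connect work item is still queued once the connection is established.
 */
static bool replay_race(bool with_patch)
{
    struct model_xprt x = { false, false, false, 0 };

    /* Step 1: the first connect sent a SYN and returned EINPROGRESS. */
    /* Step 2: the retry queues delayed work (192 seconds in the trace). */
    model_queue_reconnect(&x, 192);
    /* Step 3: the SYN-ACK for the first attempt arrives. */
    model_established(&x, with_patch);

    return x.connected && x.work_pending;
}
```

In this model, replay_race(false) reports a stale work item left behind and
replay_race(true) reports none. The model ignores locking and the real
workqueue, so it says nothing about whether the cancel is safe in the
kernel context.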
