Re: [PATCH v3 20/20] xprtrdma: Faster server reboot recovery

On Mon, May 02, 2016 at 02:43:03PM -0400, Chuck Lever wrote:
> In a cluster failover scenario, it is desirable for the client to
> attempt to reconnect quickly, as an alternate NFS server is already
> waiting to take over for the down server. The client can't see that
> a server IP address has moved to a new server until the existing
> connection is gone.
> 
> For fabrics and devices where it is meaningful, set a definite upper
> bound on the amount of time before it is determined that a
> connection is no longer valid. This allows the RPC client to detect
> connection loss in a timely manner, then perform a fresh resolution
> of the server GUID in case it has changed (cluster failover).
> 
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> Tested-by: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
> ---
>  net/sunrpc/xprtrdma/verbs.c |   12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
> 
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index b7a5bc1..be66f65 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -554,6 +554,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
>  	ep->rep_attr.recv_cq = recvcq;
>  
>  	/* Initialize cma parameters */
> +	memset(&ep->rep_remote_cma, 0, sizeof(ep->rep_remote_cma));
>  
>  	/* RPC/RDMA does not use private data */
>  	ep->rep_remote_cma.private_data = NULL;
> @@ -567,7 +568,16 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
>  		ep->rep_remote_cma.responder_resources =
>  						ia->ri_device->attrs.max_qp_rd_atom;
>  
> -	ep->rep_remote_cma.retry_count = 7;
> +	/* Limit transport retries so client can detect server
> +	 * GID changes quickly. RPC layer handles re-establishing
> +	 * transport connection and retransmission.
> +	 */
> +	ep->rep_remote_cma.retry_count = 6;

Out of curiosity, do you know how long this retry cycle takes?
I understand why lowering the retry count leads to a faster reconnect,
but I wonder whether the difference will really be visible.
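
For a rough sense of scale, here is a back-of-the-envelope sketch. It
assumes the InfiniBand Local ACK Timeout encoding (4.096 usec * 2^n)
and that the sender gives up after the initial attempt plus retry_count
retries. The helper names and the exponent 19 are illustrative
assumptions, not values taken from this patch; the real exponent is
negotiated by the CM, and HCAs may round the timeout up.

```c
#include <stdint.h>

/* Hypothetical helper: Local ACK Timeout in nanoseconds.
 * Assumes the IB encoding 4.096 usec * 2^exp, i.e. 4096 ns << exp.
 * exp == 0 means "timer disabled" in IB and is not modelled here.
 */
static uint64_t ack_timeout_ns(unsigned int exp)
{
	return (uint64_t)4096 << exp;
}

/* Hypothetical helper: approximate time before the transport reports
 * a retry-exceeded error, counting the initial attempt plus
 * retry_count retries. This is an estimate, not a guaranteed bound.
 */
static uint64_t retry_window_ns(unsigned int retry_count, unsigned int exp)
{
	return ((uint64_t)retry_count + 1) * ack_timeout_ns(exp);
}
```

Under those assumptions, retry_window_ns(7, 19) and
retry_window_ns(6, 19) differ by exactly one timeout period, so the
saving from 7 -> 6 is a single ack_timeout_ns(exp), whatever exp turns
out to be on a given fabric.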

> +
> +	/* RPC-over-RDMA handles its own flow control. In addition,
> +	 * make all RNR NAKs visible so we know that RPC-over-RDMA
> +	 * flow control is working correctly (no NAKs should be seen).
> +	 */
>  	ep->rep_remote_cma.flow_control = 0;
>  	ep->rep_remote_cma.rnr_retry_count = 0;
>  
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html


