On Mon, May 02, 2016 at 02:43:03PM -0400, Chuck Lever wrote:
> In a cluster failover scenario, it is desirable for the client to
> attempt to reconnect quickly, as an alternate NFS server is already
> waiting to take over for the down server. The client can't see that
> a server IP address has moved to a new server until the existing
> connection is gone.
>
> For fabrics and devices where it is meaningful, set a definite upper
> bound on the amount of time before it is determined that a
> connection is no longer valid. This allows the RPC client to detect
> connection loss in a timely matter, then perform a fresh resolution
> of the server GUID in case it has changed (cluster failover).
>
> Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx>
> Tested-by: Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>
> ---
>  net/sunrpc/xprtrdma/verbs.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c
> index b7a5bc1..be66f65 100644
> --- a/net/sunrpc/xprtrdma/verbs.c
> +++ b/net/sunrpc/xprtrdma/verbs.c
> @@ -554,6 +554,7 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
>  	ep->rep_attr.recv_cq = recvcq;
>
>  	/* Initialize cma parameters */
> +	memset(&ep->rep_remote_cma, 0, sizeof(ep->rep_remote_cma));
>
>  	/* RPC/RDMA does not use private data */
>  	ep->rep_remote_cma.private_data = NULL;
> @@ -567,7 +568,16 @@ rpcrdma_ep_create(struct rpcrdma_ep *ep, struct rpcrdma_ia *ia,
>  		ep->rep_remote_cma.responder_resources =
>  						ia->ri_device->attrs.max_qp_rd_atom;
>
> -	ep->rep_remote_cma.retry_count = 7;
> +	/* Limit transport retries so client can detect server
> +	 * GID changes quickly. RPC layer handles re-establishing
> +	 * transport connection and retransmission.
> +	 */
> +	ep->rep_remote_cma.retry_count = 6;

Out of curiosity, do you know how long this retry cycle takes?
I understand why lowering the retry count leads to a faster reconnect,
but I wonder whether the difference will really be visible.

> +
> +	/* RPC-over-RDMA handles its own flow control. In addition,
> +	 * make all RNR NAKs visible so we know that RPC-over-RDMA
> +	 * flow control is working correctly (no NAKs should be seen).
> +	 */
>  	ep->rep_remote_cma.flow_control = 0;
>  	ep->rep_remote_cma.rnr_retry_count = 0;
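
For a rough sense of scale, the arithmetic behind the question can be
sketched as below. This is only an estimate under stated assumptions, not
something taken from the patch: it uses the InfiniBand convention that the
nominal Local ACK Timeout is 4.096 usec * 2^exponent, and it assumes the
requester makes retry_count + 1 attempts before reporting an error. The
helper names and the exponent used in the note afterwards are hypothetical;
the exponent actually in effect depends on the fabric and the CM, and
hardware may wait longer than the nominal value.

```c
#include <stdint.h>

/*
 * Nominal IB Local ACK Timeout in nanoseconds: 4.096 usec * 2^exponent.
 * The exponent is a 5-bit field. A value of 0 means the timeout is
 * disabled, so 0 is returned for it (and for out-of-range values).
 * Hypothetical helper for illustration only.
 */
static uint64_t ack_timeout_ns(unsigned int exponent)
{
	if (exponent == 0 || exponent > 31)
		return 0;
	return UINT64_C(4096) << exponent;
}

/*
 * Estimated time until the transport gives up, assuming one initial
 * attempt plus retry_count retries, each waiting the nominal timeout.
 * Real devices may wait longer per attempt.
 */
static uint64_t retry_cycle_ns(unsigned int exponent, unsigned int retry_count)
{
	return ack_timeout_ns(exponent) * ((uint64_t)retry_count + 1);
}
```

With an assumed exponent of 14 (about 67 msec per attempt),
retry_cycle_ns(14, 7) is about 537 msec and retry_cycle_ns(14, 6) is about
470 msec, so under these assumptions one fewer retry saves a single timeout
period.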