In a cluster failover scenario, it is desirable for the client to attempt to reconnect quickly, as an alternate NFS server is already waiting to take over for the down server. The client can't see that a server IP address has moved to a new server until the existing connection is gone. For fabrics and devices where it is meaningful, set an upper bound on the amount of time before it is determined that a connection is no longer valid. This allows the RPC client to detect connection loss in a timely matter, then perform a fresh resolution of the server GUID in case it has changed (cluster failover). Signed-off-by: Chuck Lever <chuck.lever@xxxxxxxxxx> --- net/sunrpc/xprtrdma/verbs.c | 28 ++++++++++++++++++++++++---- 1 file changed, 24 insertions(+), 4 deletions(-) diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index b7a5bc1..5cc57fb 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -211,9 +211,10 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event) struct rpcrdma_ep *ep = &xprt->rx_ep; #if IS_ENABLED(CONFIG_SUNRPC_DEBUG) struct sockaddr *sap = (struct sockaddr *)&ep->rep_remote_addr; -#endif + u64 timeout; struct ib_qp_attr *attr = &ia->ri_qp_attr; struct ib_qp_init_attr *iattr = &ia->ri_qp_init_attr; +#endif int connstate = 0; switch (event->event) { @@ -235,14 +236,23 @@ rpcrdma_conn_upcall(struct rdma_cm_id *id, struct rdma_cm_event *event) complete(&ia->ri_done); break; case RDMA_CM_EVENT_ESTABLISHED: - connstate = 1; +#if IS_ENABLED(CONFIG_SUNRPC_DEBUG) + memset(attr, 0, sizeof(*attr)); ib_query_qp(ia->ri_id->qp, attr, - IB_QP_MAX_QP_RD_ATOMIC | IB_QP_MAX_DEST_RD_ATOMIC, + IB_QP_MAX_QP_RD_ATOMIC | + IB_QP_MAX_DEST_RD_ATOMIC | + IB_QP_TIMEOUT, iattr); dprintk("RPC: %s: %d responder resources" " (%d initiator)\n", __func__, attr->max_dest_rd_atomic, attr->max_rd_atomic); + timeout = 4096 * (1ULL << attr->timeout); + do_div(timeout, NSEC_PER_SEC); + dprintk("RPC: %s: retry timeout: %llu seconds\n", + __func__, timeout); +#endif
Can you put the debug in a separate patch, at first glance I was confused how that helped reboot recovery... -- To unsubscribe from this list: send the line "unsubscribe linux-nfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html