From: Vlad Dumitrescu <vdumitrescu@xxxxxxxxxx>

With the current MAD retry algorithm, the expected total timeout is
roughly (retries + 1) * timeout_ms. This is an approximation because
scheduling and completion delays are not strictly accounted for.

For CM the number of retries is typically CMA_MAX_CM_RETRIES (15),
unless the peer sets REQ:Max CM Retries [1] to a different value. In
theory, the timeout could vary, being based on CMA_CM_RESPONSE_TIMEOUT +
Packet Life Time, as well as the peer's MRA messages. In practice, for
RoCE, the formula above results in 65536ms.

Based on the above, set a constant deadline of a round 70s for all
cases. Note that MRAs will end up calling ib_modify_mad, which will
extend the deadline accordingly.

This allows changes to the MAD layer's internal retry algorithm without
affecting the total timeout experienced by CM.

[1] IBTA v1.7 - Section 12.7.27 - Max CM Retries

Signed-off-by: Vlad Dumitrescu <vdumitrescu@xxxxxxxxxx>
Reviewed-by: Sean Hefty <shefty@xxxxxxxxxx>
Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxx>
---
 drivers/infiniband/core/cm.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/infiniband/core/cm.c b/drivers/infiniband/core/cm.c
index 142170473e75..36649faf9842 100644
--- a/drivers/infiniband/core/cm.c
+++ b/drivers/infiniband/core/cm.c
@@ -36,6 +36,7 @@ MODULE_LICENSE("Dual BSD/GPL");
 
 #define CM_DESTROY_ID_WAIT_TIMEOUT 10000 /* msecs */
 #define CM_DIRECT_RETRY_CTX ((void *) 1UL)
+#define CM_MAD_TOTAL_TIMEOUT 70000 /* msecs */
 
 static const char * const ibcm_rej_reason_strs[] = {
 	[IB_CM_REJ_NO_QP] = "no QP",
@@ -279,6 +280,7 @@ static struct ib_mad_send_buf *cm_alloc_msg(struct cm_id_private *cm_id_priv)
 	struct ib_mad_agent *mad_agent;
 	struct ib_mad_send_buf *m;
 	struct ib_ah *ah;
+	int ret;
 
 	lockdep_assert_held(&cm_id_priv->lock);
 
@@ -309,6 +311,17 @@ static struct ib_mad_send_buf *cm_alloc_msg(struct cm_id_private *cm_id_priv)
 	}
 
 	m->ah = ah;
+	m->retries = cm_id_priv->max_cm_retries;
+	ret = ib_set_mad_deadline(m, CM_MAD_TOTAL_TIMEOUT);
+	if (ret) {
+		ib_free_send_mad(m);
+		rdma_destroy_ah(ah, 0);
+		m = ERR_PTR(ret);
+		goto out;
+	}
+
+	refcount_inc(&cm_id_priv->refcount);
+	m->context[0] = cm_id_priv;
 
 out:
 	spin_unlock(&cm_id_priv->av.port->cm_dev->mad_agent_lock);
-- 
2.47.0
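
For reference, the arithmetic behind the 70s figure in the commit message can be checked with a small standalone sketch. This is not kernel code. The helper names expected_total_timeout_ms() and deadline_covers() are hypothetical. The 4096ms per-send timeout is an assumption inferred from the quoted 65536ms total divided by (15 + 1) sends, and is not stated in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Values quoted in the commit message and the diff. */
#define CMA_MAX_CM_RETRIES   15
#define CM_MAD_TOTAL_TIMEOUT 70000 /* msecs */

/*
 * Hypothetical helper, not a kernel function: approximate total time the
 * current retry algorithm spends on one MAD, (retries + 1) * timeout_ms.
 * Scheduling and completion delays are ignored, as in the commit message.
 */
static inline uint64_t expected_total_timeout_ms(uint32_t retries,
						 uint32_t timeout_ms)
{
	return ((uint64_t)retries + 1) * timeout_ms;
}

/*
 * Hypothetical helper: nonzero if the fixed deadline is at least as long
 * as the approximate retry budget.
 */
static inline int deadline_covers(uint32_t retries, uint32_t timeout_ms)
{
	return expected_total_timeout_ms(retries, timeout_ms) <=
	       CM_MAD_TOTAL_TIMEOUT;
}
```

With the assumed 4096ms per-send timeout, expected_total_timeout_ms(CMA_MAX_CM_RETRIES, 4096) evaluates to 16 * 4096 = 65536, which sits below the 70000ms deadline with roughly 4.5s of headroom. Doubling the per-send timeout to 8192ms would exceed the deadline under the same formula.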