RE: [PATCH rdma-next] IB/cma: Define options to set CM timeouts and retries

> Define new options in 'rdma_set_option' to override default CM retries ("Max
> CM retries") and timeouts ("Local CM Response Timeout" and "Remote CM
> Response Timeout").
> 
> These options can be useful for RoCE networks (no SM) to decrease the overall
> connection timeout with an unreachable node (by default, it can take several
> minutes).

I've been looking into this problem, plus related timeout issues myself:

1. Allow a client to time out more quickly when trying to connect to an unreachable node.
2. Prevent a client from ignoring a CM response for an extended period.
   This forces the server to hold resources.
   Problem also occurs if the client node crashes after trying to connect.
3. Improve connection setup time when packets are lost.

I was thinking of aligning closer with the behavior of the TCP stack, plus a couple other adjustments.

a. Reduce the hard-coded CM retries from 15 down to 6.
b. Reduce the hard-coded CM response timeout from 20 (about 4s) to 18 (about 1s).
c. Switch CM MADs to use exponential backoff timeouts (1s, 2s, 4s, 8s, etc. + random variation)
d. Selectively send MRA responses -- only in response to a REQ
e. Finally, add tunables to the above options for recovery purposes.
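
To make b and c concrete, here is a rough sketch, not the actual ib_cm code. IB timeout fields carry an exponent n that stands for 4.096 us * 2^n, which is where "20 (about 4s)" and "18 (about 1s)" come from. The names cm_timeout_ns and cm_backoff_ms, the cap parameter, and passing the random variation in as jitter_ms are all assumptions made for illustration.

```c
#include <stdint.h>

/* IB timeout exponent n encodes an interval of 4.096 us * 2^n. */
static uint64_t cm_timeout_ns(unsigned int exponent)
{
    return 4096ULL << exponent;   /* 4.096 us == 4096 ns */
}

/*
 * Hypothetical helper: delay before retry number `attempt` (0-based).
 * The delay doubles on each attempt and is clamped to max_ms.
 * jitter_ms stands in for the "random variation"; how it is drawn
 * is left to the caller.
 */
static uint64_t cm_backoff_ms(uint64_t base_ms, unsigned int attempt,
                              uint64_t max_ms, uint64_t jitter_ms)
{
    uint64_t delay = base_ms;
    unsigned int i;

    for (i = 0; i < attempt && delay < max_ms; i++)
        delay *= 2;
    if (delay > max_ms)
        delay = max_ms;

    return delay + jitter_ms;
}
```

cm_timeout_ns(20) is 4294967296 ns and cm_timeout_ns(18) is 1073741824 ns. With a 1000 ms base, attempts 0 through 3 give 1000, 2000, 4000, and 8000 ms before jitter.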

Most of the issues are common to RoCE and IB.  Changes a, b, & c are based on my system's TCP defaults, but my goal was to get timeouts down to about 1 minute.  Change d should help address problem 2.
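
For a feel of the overall budget, the doubling waits can simply be summed. This is only a back-of-the-envelope sketch: cm_total_wait_ms is a made-up name, it uses a nominal 1000 ms base, and it ignores the random variation and any packet lifetime added on top. Whether 6 retries means 6 or 7 waits in total is also an assumption here.

```c
#include <stdint.h>

/*
 * Hypothetical helper: total time spent waiting across `waits`
 * timeouts, where the first wait is base_ms and each later wait
 * is double the previous one.
 */
static uint64_t cm_total_wait_ms(uint64_t base_ms, unsigned int waits)
{
    uint64_t total = 0;
    uint64_t delay = base_ms;
    unsigned int i;

    for (i = 0; i < waits; i++) {
        total += delay;
        delay *= 2;
    }
    return total;
}
```

Six waits from a 1000 ms base sum to 63000 ms, roughly the 1 minute target. Seven waits sum to 127000 ms.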

If the expectation is that most users will want to change the timeout/retry, which I think would be the case, then adjusting the defaults may avoid the overhead of setting them on every cm_id.  The ability to set the values as proposed can substitute for change e, but may require users to update their librdmacm.

- Sean




