> A backoff timer can reduce retries.  I don't know how you decide
> what the initial backoff should be.  I was going with what seems to be the
> behavior with tcp.  Maybe the backoff adjusts based on IB vs RoCE.

Ok, I understand it now. So, with 5 retries and an initial timeout
value of 18 (~1s), this would give:

connect_timeout = 1 + 2 + 4 + 8 + 16 + 32 = 63s
connect_timeout = initial * (2^(retries + 1) - 1)

> > I don't think that most users needs to tune those parameters. But if
> > some use cases require a smaller connection timeout, this should be
> > available.
> >
> > I agree that finding a common ground to adjust the defaults would be
> > better but this can be challenging and break non-common fabrics or use
> > cases.
>
> IMO, if we can improve that out of the box experience, that would be ideal.
> I agree that there will always be situations where the kernel defaults are
> not optimal and either require changing them system wide, or possibly
> per rdma_cm_id.
>
> If we believe that switching to a backoff retry timer is a better direction
> or should be an option, does that change the approach for this patch?
> A retry count still makes sense, but the timeout is more complex.  On that
> note, I would specify a timeout in something straightforward, like milliseconds.

An exponential backoff timer seems to be a good way to reduce
temporary contention (when several nodes reconnect simultaneously).
But it makes the overall connection timeout more complex. That is why
you don't want to expose the initial CM timeout to the user.

So, if I follow you, you suggest exposing only a "connection timeout
in ms" to the user and deriving the retry count from it.

For example, if a user sets a timeout of 50s (with an initial timeout
of 1s), we would configure 4 retries, since 5 retries would exceed 50s.
But this would give an effective timeout of only 31s (1 + 2 + 4 + 8 + 16).
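To make the arithmetic above concrete, here is a minimal Python sketch of it. The function names are made up for illustration, and it assumes a pure doubling backoff with no upper cap, which is only what is discussed in this thread and not what the CM does today. The helper for the timeout exponent uses the usual IB encoding of 4.096us * 2^value.

```python
def ib_timeout_seconds(exponent):
    """IB-style timeout encoding: 4.096 us * 2^exponent (18 gives about 1.07 s)."""
    return 4.096e-6 * 2 ** exponent


def total_timeout(initial_s, retries):
    """Worst-case time before giving up when the wait doubles on every retry.

    One initial attempt plus `retries` retransmissions:
    initial * (1 + 2 + ... + 2^retries) = initial * (2^(retries + 1) - 1)
    """
    return initial_s * (2 ** (retries + 1) - 1)


def retries_for_budget(initial_s, budget_s):
    """Largest retry count whose worst-case total still fits in budget_s."""
    if budget_s < initial_s:
        raise ValueError("budget is shorter than a single attempt")
    retries = 0
    while total_timeout(initial_s, retries + 1) <= budget_s:
        retries += 1
    return retries


if __name__ == "__main__":
    print(total_timeout(1, 5))         # 63
    retries = retries_for_budget(1, 50)
    print(retries)                     # 4
    print(total_timeout(1, retries))   # 31
```

The last two lines show the gap in question: a 50s request maps to 4 retries and an effective 31s, because the next step up (5 retries) already costs 63s.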
I don't like that idea because it hides what is actually done: a user
will set a value in ms and could end up several seconds or minutes
away from what they expect.

So, I would prefer the kernel TCP way. It defines "tcp_retries2" to
configure the maximum number of retransmissions for an active
connection. The initial timeout value is not configurable
(TCP_RTO_MIN), and the retransmission timeout stays between
TCP_RTO_MIN (200ms) and TCP_RTO_MAX (120s).

Etienne
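For reference, the TCP-style scheme described above can be sketched roughly as follows. This is a simplification under stated assumptions: the first wait is taken to be TCP_RTO_MIN and simply doubles up to TCP_RTO_MAX, whereas real TCP derives its RTO from the measured RTT. The names `backoff_schedule`, `RTO_MIN_S` and `RTO_MAX_S` are illustrative, not kernel symbols, and the count of 15 is what I believe the tcp_retries2 default to be.

```python
RTO_MIN_S = 0.2    # TCP_RTO_MIN as quoted above (200 ms)
RTO_MAX_S = 120.0  # TCP_RTO_MAX as quoted above (120 s)


def backoff_schedule(retries, first_s=RTO_MIN_S, cap_s=RTO_MAX_S):
    """Return the wait before each give-up point: doubling, clamped at cap_s.

    The list has retries + 1 entries: the wait after the initial
    transmission plus one wait per retransmission.
    """
    waits = []
    wait = first_s
    for _ in range(retries + 1):
        waits.append(min(wait, cap_s))
        wait *= 2
    return waits


if __name__ == "__main__":
    schedule = backoff_schedule(15)   # assumed tcp_retries2 default
    print(schedule[:4])
    print(round(sum(schedule), 1))    # total worst-case time in seconds
```

The point of the clamp is that the user only picks a retry count, and the total stays predictable: once the cap is reached, each extra retry adds a fixed 120s instead of doubling again.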