Re: [PATCH rdma-next] IB/cma: Define options to set CM timeouts and retries

> A backoff timer can reduce retries.  I don't know how you decide
> what the initial backoff should be.  I was going with what seems to be the
> behavior with tcp.  Maybe the backoff adjusts based on IB vs RoCE.

Ok, I understand it now. So, with a retry count of 5 and an initial CM
timeout of 18 (~1s), this gives:

connect_timeout = 1 + 2 + 4 + 8 + 16 + 32 = 63s
connect_timeout = initial * (2^(retries + 1) - 1)
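A small C sketch of the arithmetic above. It assumes each retry waits twice as long as the previous attempt, and the helper names are hypothetical, not existing rdma_cm functions. Note that the CM timeout value is an exponent (4.096 us * 2^value), so 18 is about 1.07s rather than exactly 1s.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: IB CM timeout exponent (4.096 us * 2^exp) in ms. */
static uint64_t cm_exp_to_ms(unsigned int exp)
{
	assert(exp < 32);	/* the CM timeout field is 5 bits wide */
	/* 4096 ns * 2^exp, truncated to whole milliseconds */
	return ((uint64_t)4096 << exp) / 1000000;
}

/* Hypothetical helper: initial attempt plus `retries` retries, each
 * waiting twice as long as the one before: initial * (2^(retries+1) - 1). */
static uint64_t total_timeout_ms(uint64_t initial_ms, unsigned int retries)
{
	assert(retries < 63);	/* keep the shift inside 64 bits */
	return initial_ms * ((UINT64_C(1) << (retries + 1)) - 1);
}
```

With an initial timeout of 1000ms and 5 retries this gives 63000ms; using the exact value for exponent 18 (1073ms) instead gives roughly 67.6s.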

> 
> > I don't think that most users need to tune those parameters. But if
> > some use cases require a smaller connection timeout, this should be
> > available.
> > 
> > I agree that finding a common ground to adjust the defaults would be
> > better but this can be challenging and break non-common fabrics or use
> > cases.
> 
> IMO, if we can improve that out of the box experience, that would be ideal.
> I agree that there will always be situations where the kernel defaults are
> not optimal and either require changing them system wide, or possibly 
> per rdma_cm_id.
> 
> If we believe that switching to a backoff retry timer is a better direction
> or should be an option, does that change the approach for this patch?
> A retry count still makes sense, but the timeout is more complex.  On that
> note, I would specify a timeout in something straightforward, like milliseconds.

An exponential backoff timer seems to be a good solution to reduce
temporary contention (when several nodes reconnect simultaneously).
But it makes the overall connection timeout more complex. That's why
you don't want to expose the initial CM timeout to the user.

So, if I follow you here, you suggest exposing only a "connection
timeout in ms" to the user and deriving the retry count from it.

For example, if a user defines a timeout of 50s (with an initial
timeout of 1s), we should configure 4 retries. But this would give an
effective timeout of 31s (5 retries would already give 63s).
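A sketch of how such a "timeout in ms" could be turned into a retry count, under the same doubling assumption. retries_for_timeout() is a hypothetical helper; it picks the largest count whose total still fits in the requested budget, which is exactly what produces the gap described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: largest retry count whose doubled-backoff total
 * (initial * (2^(retries+1) - 1)) stays within timeout_ms. */
static unsigned int retries_for_timeout(uint64_t initial_ms, uint64_t timeout_ms)
{
	uint64_t step = initial_ms;	/* wait of the last attempt counted */
	uint64_t total = initial_ms;	/* time used by the first attempt */
	unsigned int retries = 0;

	assert(initial_ms > 0);
	if (initial_ms > timeout_ms)
		return 0;
	/* add one more retry while its doubled wait still fits the budget */
	while (step <= (timeout_ms - total) / 2) {
		step *= 2;
		total += step;
		retries++;
	}
	return retries;
}
```

retries_for_timeout(1000, 50000) returns 4 (31s effective), and 63000 is the smallest budget that allows 5 retries, so any request between 31s and 63s silently rounds down.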

I don't like that idea because it hides what is actually done:
a user will set a value in ms and could get several seconds or
minutes of difference from what they expect.

So, I would prefer the kernel TCP approach. It defines "tcp_retries2" to
configure the maximum number of retransmissions for an active connection.
The initial timeout value is not configurable (TCP_RTO_MIN), and the
retransmission timeout stays between TCP_RTO_MIN (200ms) and TCP_RTO_MAX
(120s).
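For comparison, a sketch of a TCP-style backoff: the interval doubles from a minimum but is clamped at a maximum, and only the retry count is tunable. This is a simplified model, not the kernel's code, and clamped_backoff_total_ms() is a hypothetical name. With 200ms/120s and 15 retries (the tcp_retries2 default) it gives 924.6s, the figure quoted in the kernel's ip-sysctl documentation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: total time for the first transmission plus
 * `retries` retransmissions, each interval double the previous one
 * but never above max_ms. */
static uint64_t clamped_backoff_total_ms(uint64_t min_ms, uint64_t max_ms,
					 unsigned int retries)
{
	uint64_t rto = min_ms;
	uint64_t total = 0;
	unsigned int i;

	assert(min_ms > 0 && min_ms <= max_ms);
	for (i = 0; i <= retries; i++) {
		total += rto;
		/* double the interval, clamping at the maximum */
		rto = (rto > max_ms / 2) ? max_ms : rto * 2;
	}
	return total;
}
```

The same helper with a 1000ms minimum, a large maximum and 5 retries reproduces the 63s CM example, so a single retry-count knob plus fixed min/max bounds keeps the total predictable.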

Etienne



