Re: [PATCH 3/3] IB/sa: route SA pathrecord query through netlink

Jason Gunthorpe <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> · Wed, 20 May 2015 11:34:00 -0600

On Wed, May 20, 2015 at 04:13:59PM +0000, Hefty, Sean wrote:
> > The other issue is that each caller in the kernel specifies a different
> > timeout.  Defining this in 1 central place and allowing user space to
> > control the policy of that timeout is much better than allowing the
> > kernel clients to specify the timeout as they do now.
> 
> Everything has been randomly hard-coded.  IMO, the sa_query module
> should use its own timeout value, which it updates based on how fast
> it actually gets responses.  But that takes too much work, and no
> one is ever going to write the code to do this.

The IB spec actually says how to compute the timeout for the SA, and
if done properly the SM will configure a timeout appropriate for the
network. It looks like everything the kernel does in this area is
basically wrong.
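
As a rough sketch of what computing the timeout from the SM-provided
values could look like: IB timeout fields are 5-bit exponents meaning
4.096 us * 2^value. The helper names below are made up for illustration,
and the round-trip formula (twice the subnet timeout plus the responder's
response time) is an assumption about the spec's rule, not a quote from it.

```c
#include <assert.h>
#include <stdint.h>

/* IB encodes timeouts as a 5-bit exponent: 4.096 us * 2^exp. */
static uint64_t ib_timeout_ns(unsigned int exp)
{
	assert(exp <= 31);
	return UINT64_C(4096) << exp;
}

/*
 * Hypothetical helper. Assumed rule: wait for one round trip through
 * the subnet (2 * SubnetTimeOut) plus the responder's RespTimeValue,
 * both taken from PortInfo as configured by the SM.
 */
static uint64_t sa_query_timeout_ns(unsigned int subnet_timeout,
				    unsigned int resp_time_value)
{
	return 2 * ib_timeout_ns(subnet_timeout) +
	       ib_timeout_ns(resp_time_value);
}
```

With both fields set to 18, each term is about 1.07 s, so the query
timeout comes out at roughly 3.2 s.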

> For the netlink specific problem, I'll propose using a different
> randomly hard-coded value as a timeout.  Then define an 'MRA' type
> of message that user space can send to the kernel in order to ask it
> to wait longer.  The 'set timeout' message could apply to a single
> request or all future requests.  If we only wanted the 'all
> future requests' option, the data value could be written into a
> file.  In any case, this pushes the policy of the timeout value into
> the hands of the responding daemon.

A fixed, well-known timeout (5s?) plus a requirement that user space
send an 'operation in progress' positive liveness indication seems very
reasonable.

The only purpose of the timeout is to detect a locked up daemon so IB
doesn't lock up, so a watchdog-like scheme seems appropriate.
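
A minimal sketch of such a watchdog, with time passed in explicitly so
it is easy to reason about. All names here are hypothetical, the 5000 ms
constant is just the "5s?" floated above, and none of this is taken from
the patch itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WATCHDOG_TIMEOUT_MS 5000u	/* fixed, well-known value (assumed) */

struct query_watchdog {
	uint64_t deadline_ms;	/* absolute time at which the query fails */
};

/* Start the watchdog when the request is handed to user space. */
static void watchdog_arm(struct query_watchdog *wd, uint64_t now_ms)
{
	assert(wd);
	wd->deadline_ms = now_ms + WATCHDOG_TIMEOUT_MS;
}

/* User space sent 'operation in progress': push the deadline out. */
static void watchdog_kick(struct query_watchdog *wd, uint64_t now_ms)
{
	watchdog_arm(wd, now_ms);
}

/* True once the daemon has been silent for the full timeout. */
static bool watchdog_expired(const struct query_watchdog *wd,
			     uint64_t now_ms)
{
	assert(wd);
	return now_ms >= wd->deadline_ms;
}
```

A slow but live daemon keeps kicking and the query stays pending; a
locked-up daemon stops kicking and the query fails after at most one
timeout period.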

Jason