On Fri, Jan 15, 2021 at 04:31:56PM -0500, Ryan Stone wrote: > I'm looking at how MADs are timed out in drivers/infiniband/core/mad.c > and I think that the timeouts are just implemented incorrectly. An > ib_mad_send_wr_private's timeout value is always initialized like > this: > > mad_send_wr->timeout = msecs_to_jiffies(send_buf->timeout_ms); > > This converts a timeout value in milliseconds to a relative value in > jiffies (e.g. if timeout_ms is 500 and HZ is 100, then msec_to_jiffies > returns 5). However there are a number of places in mad.c that use > this as though it's an absolute jiffies value. For example, > timeout_sends() compares the value to the current jiffies counter: > > https://code.woboq.org/linux/linux/drivers/infiniband/core/mad.c.html#2853 > > This doesn't make any sense. In principle it should be fixed by > instead initializing timeout as follows: > > mad_send_wr->timeout = jiffies + msecs_to_jiffies(send_buf->timeout_ms); > > but there are also a number of places that check if timeout > 0 to see > if it's active and that needs to be fixed, possibly with a separate > active flag. > > Am I missing something, or is this just broken? I seem to recall there are two flows here that don't overlap re-using the timeout field for two different things.. It adjusts from one domain to the other in places like this: static void wait_for_response(struct ib_mad_send_wr_private *mad_send_wr) { mad_send_wr->timeout += jiffies; It is so convoluted, so who knows if it could be right or not. Jason