On Thu, 2019-06-13 at 08:28 -0700, Bart Van Assche wrote:
> On 6/13/19 7:25 AM, Doug Ledford wrote:
> > So, to revive this patch, what I'd like to see is some attempt to
> > actually quantify a reasonable timeout for the default backlog depth,
> > then the patch should actually change the default to that reasonable
> > timeout, and then put in the ability to adjust the timeout with some
> > sort of doc guidance on how to calculate a reasonable timeout based on
> > configured backlog depth.
>
> How about following the approach of the SRP initiator driver? It derives
> the CM timeout from the subnet manager timeout. The assumption behind
> this is that in large networks the subnet manager timeout has to be set
> higher than its default to make communication work. See also
> srp_get_subnet_timeout().

Theoretically, the subnet manager needs a longer timeout in a bigger
network because it's handling more data as a single point of lookup for
the entire subnet. Individual machines, on the other hand, have the
same backlog size (by default) regardless of the size of the network,
and there is no guarantee that an admin who increased the subnet manager
timeout also increased the backlog queue depth. So, while I like
things that auto-tune the way you are suggesting, the problem is that
the one setting does not directly correlate with the other.

--
Doug Ledford <dledford@xxxxxxxxxx>
    GPG KeyID: B826A3330E572FDD
    Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
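
To make the objection concrete, here is a rough sketch of what deriving
a CM timeout from the subnet timeout might look like. It is not code
from the patch or from the SRP driver. It relies only on the InfiniBand
convention that timeout fields are exponents, with a duration of
4.096 usec * 2^value. The helper names, the margin of 2, and the example
value of 18 are assumptions made for illustration.

```python
# Illustrative sketch only; names and the margin are hypothetical.

IB_TIMEOUT_BASE_USEC = 4.096  # InfiniBand timeout unit: 4.096 usec * 2**exponent
IB_TIMEOUT_MAX_EXPONENT = 31  # timeout fields are 5 bits wide


def ib_timeout_to_seconds(exponent: int) -> float:
    """Convert an InfiniBand timeout exponent to seconds."""
    if not 0 <= exponent <= IB_TIMEOUT_MAX_EXPONENT:
        raise ValueError("timeout exponent must be in the range 0..31")
    return IB_TIMEOUT_BASE_USEC * (2 ** exponent) / 1e6


def cm_timeout_from_subnet(subnet_timeout: int, margin: int = 2) -> int:
    """Hypothetical helper: CM timeout exponent derived from the subnet
    timeout exponent plus an assumed margin, clamped to the 5-bit range.

    Note that the listener's backlog depth is not an input here.
    """
    return min(subnet_timeout + margin, IB_TIMEOUT_MAX_EXPONENT)


if __name__ == "__main__":
    subnet_timeout = 18  # example value, not read from a real port
    cm_timeout = cm_timeout_from_subnet(subnet_timeout)
    print(f"subnet timeout {subnet_timeout}: "
          f"{ib_timeout_to_seconds(subnet_timeout):.3f} s")
    print(f"derived CM timeout {cm_timeout}: "
          f"{ib_timeout_to_seconds(cm_timeout):.3f} s")
```

The derived value tracks only the subnet timeout. A listener with a
deeper backlog gets the same CM timeout as one with the default
backlog, which is the mismatch described above.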