> On 30 Nov 2020, at 09:24, Christopher Lameter <cl@xxxxxxxxx> wrote:
>
> On Fri, 27 Nov 2020, Håkon Bugge wrote:
>
>>> Huh? When does it talk to a subnet manager (or the SA)?
>>
>> When resolving the route AND the option "route_prot" is set to "sa". If
>> set to "acm", what Hong describes above applies.
>
> My config has "route_prot" set to "sa"
>
>>> If its get an IP address of an IB node that does not have ibacm then it
>>> fails with a timeout ..... ? And leaves hanging kernel threads around by
>>> design?
>>
>> Nop, the kernel falls back and uses the neighbour cache instead.
>
> But ib_acme hangs? The main issue here is what the user space app does.
> And we need ibacm to cache user space address resolutions.

I got the impression that you are debugging this with Honggang. If you
want me to help, I need, to start with, an strace of ib_acme and ditto
of ibacm.

>>> So it only populates the cache from its local node information?
>>
>> No, if you use ibacm for address resolution the only protocol it has is
>> "acm", which means the information comes from a peer ibacm.
>>
>> If you talk about the cache for routes, it comes either from the SA or a
>> peer ibacm, depending on the "route_prot" setting.
>
> I have always run it with that setting. How can I debug this issue and how
> can we fix this?
>
>>
>>>> To resolve IPoIB address to PathRecord, you must:
>>>> 1) The IPoIB interface must UP and RUNNING on the client and target
>>>> side.
>>>> 2) The ibacm service must RUNNING on the client and target.
>>>
>>> That is working if you want to resolve only the IP addresses of the IB
>>> interfaces on the client and target. None else.
>>
>> That is why it is called IBacm, right?
>
> Huh? IBACM is an address resolution service for IB. Somehow that only
> includes addresses of hosts running IBACM?

Yes.
As Honggang explained, ibacm's address resolution protocol is based on
IB multicast. As such, the peer must have ibacm running in order to send
a unicast response back with the L2 address.

>>> Here is the description of ibacms function from the sources:
>>>
>>> "Conceptually, the ibacm service implements an ARP like protocol and
>>> either uses IB multicast records to construct path record data or queries
>>> the SA directly, depending on the selected route protocol. By default, the
>>> ibacm services uses and caches SA path record queries."
>>>
>>> SA queries dont work. So its broken and cannot talk to the SM.
>>
>> Why do you say that? It works all the time for me which uses "sa" as "route_prot".
>
> Not here and not in the tests that RH ran to verify the issue.
>
> "route_prot" set to "sa" is the default config for the Redhat release of
> IBACM.
>
> However, the addr_prot is set to "acm" by default. I set it to "sa" with
> no effect.

OK. Understood. As stated above, let me know if you want me to debug
this.


Thxs, Håkon
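
The option semantics described in this thread can be summarised in a
small Python sketch. It is an illustration only: parse_opts and
resolution_sources are hypothetical helper names, not part of ibacm, and
the "name value" line format with '#' comments is an assumption about
the options file. The defaults are the ones stated above for the Red Hat
package (route_prot "sa", addr_prot "acm").

```python
# Hypothetical helpers (not part of ibacm): they only restate, in code,
# where resolution data comes from according to this thread.

def parse_opts(text):
    """Parse 'name value' lines; anything after '#' is ignored (assumed format)."""
    opts = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        if len(parts) == 2:
            opts[parts[0]] = parts[1].strip()
    return opts


def resolution_sources(opts):
    """Return the data source for address and route resolution."""
    route_prot = opts.get("route_prot", "sa")  # default as stated in the thread
    return {
        # Per the thread, address resolution always uses the "acm" protocol,
        # so the answer comes from a peer ibacm, whatever addr_prot says.
        "address": "peer ibacm",
        # Routes come from the SA or from a peer ibacm, depending on route_prot.
        "route": "SA" if route_prot == "sa" else "peer ibacm",
    }


if __name__ == "__main__":
    cfg = "route_prot sa\naddr_prot acm  # default\n"
    print(resolution_sources(parse_opts(cfg)))
```

With route_prot set to "sa" only the route lookup goes to the SA; the
address lookup still needs ibacm running on the peer.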