> On 7 Jan 2019, at 18:19, Håkon Bugge <haakon.bugge@xxxxxxxxxx> wrote: > > > >> On 2 Jan 2019, at 17:46, Håkon Bugge <haakon.bugge@xxxxxxxxxx> wrote: >> >> >> >>> On 21 Dec 2018, at 16:30, Håkon Bugge <haakon.bugge@xxxxxxxxxx> wrote: >>> >>> >>> >>>> On 21 Dec 2018, at 16:07, Hal Rosenstock <hal@xxxxxxxxxxxxxxxxxx> wrote: >>>> >>>> On 12/21/2018 8:24 AM, Håkon Bugge wrote: >>>>> >>>>> >>>>>> On 21 Dec 2018, at 14:06, Hal Rosenstock <hal@xxxxxxxxxxxxxxxxxx> wrote: >>>>>> >>>>>> On 12/21/2018 4:26 AM, Håkon Bugge wrote: >>>>>>> [resent as plain-text (sigh, looks like I never learn ;-)] >>>>>>> >>>>>>>> On 20 Dec 2018, at 13:46, Hal Rosenstock <hal@xxxxxxxxxxxxxxxxxx> wrote: >>>>>>>> >>>>>>>> On 12/20/2018 2:08 AM, Håkon Bugge wrote: >>>>>>>>> [resent as plain-text] >>>>>>>>> >>>>>>>>>> On 20 Dec 2018, at 04:03, Hal Rosenstock <hal@xxxxxxxxxxxxxxxxxx> wrote: >>>>>>>>>> >>>>>>>>>> On 12/19/2018 3:08 PM, Håkon Bugge wrote: >>>>>>>>>>> Using ibacm in an environment with limited pkeys, it is unable to >>>>>>>>>>> obtain a Path Record (PR) in some situations. >>>>>>>>>>> >>>>>>>>>>> The use of limited pkey implies communication over the logical >>>>>>>>>>> partition where one peer is a full member and the other is limited. >>>>>>>>>>> >>>>>>>>>>> When comparing pkey values, the above difference will yield a >>>>>>>>>>> mismatch, even if the logical partitions of the two pkeys compared are >>>>>>>>>>> the very same. >>>>>>>>>>> >>>>>>>>>>> Fixed by introducing a helper function which compares the logical >>>>>>>>>>> partitions. >>>>>>>>>>> >>>>>>>>>>> Querying a PR on the peer which uses the full partition, we see >>>>>>>>>>> without this commit: >>>>>>>>>>> >>>>>>>>>>> ib_acme -s 192.168.200.203 -d 192.168.200.200 -v >>>>>>>>>>> Service: /usr/local/var/run/ibacm-unix.sock >>>>>>>>>>> Destination: 192.168.200.200 >>>>>>>>>>> Source: 192.168.200.203 >>>>>>>>>>> Path information >>>>>>>>>>> dgid: fe80::21:2800:1a1:7c97 >>>>>>>>>>> sgid: fe80::21:2800:1ce:ac18 >>>>>>>>>>> dlid: 243 >>>>>>>>>>> slid: 246 >>>>>>>>>>> flow label: 0x0 >>>>>>>>>>> hop limit: 0 >>>>>>>>>>> tclass: 0 >>>>>>>>>>> reversible: 1 >>>>>>>>>>> pkey: 0x2b4 >>>>>>>>>>> sl: 0 >>>>>>>>>>> mtu: 4 >>>>>>>>>>> rate: 7 >>>>>>>>>>> packet lifetime: 16 >>>>>>>>>>> SA verification: failed Cannot assign requested address >>>>>>>>>>> >>>>>>>>>>> The verbose ibacm log contains: >>>>>>>>>>> >>>>>>>>>>> acm_get_ep_address: SLID(246) DLID(243) >>>>>>>>>>> acm_get_ep_address: notice - could not find SLID(246) DLID(243) >>>>>>>>>>> acm_svr_query_path: notice - could not find local end point address >>>>>>>>>>> >>>>>>>>>>> With this commit, the query with verification has success and we see >>>>>>>>>>> in the log: >>>>>>>>>>> >>>>>>>>>>> acm_get_ep_address: SLID(246) DLID(243) >>>>>>>>>>> acm_same_partition: pkey_a: 0x02b4 pkey_b: 0x82b4 >>>>>>>>>>> [] >>>>>>>>>>> acm_query_response: status 0x0 >>>>>>>>>>> >>>>>>>>>>> Another example is a kernel client that requests PRs over >>>>>>>>>>> netlink. Without this commit, the statistics counters show: >>>>>>>>>>> >>>>>>>>>>> ib_acme -P col >>>>>>>>>>> /usr/local/var/run/ibacm-unix.sock >>>>>>>>>>> Error Count : 234 >>>>>>>>>>> Resolve Count : 234 >>>>>>>>>>> [] >>>>>>>>>>> >>>>>>>>>>> With this commit, we see: >>>>>>>>>>> >>>>>>>>>>> ib_acme -P col >>>>>>>>>>> /usr/local/var/run/ibacm-unix.sock >>>>>>>>>>> Error Count : 0 >>>>>>>>>>> Resolve Count : 228 >>>>>>>>>>> No Data : 0 >>>>>>>>>>> Addr Query Count : 0 >>>>>>>>>>> Addr Cache Count : 0 >>>>>>>>>>> Route Query Count : 1 >>>>>>>>>>> Route Cache Count : 226 >>>>>>>>>>> >>>>>>>>>>> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx> >>>>>>>>>>> --- >>>>>>>>>>> ibacm/src/acm.c | 13 ++++++++++--- >>>>>>>>>>> 1 file changed, 10 insertions(+), 3 deletions(-) >>>>>>>>>>> >>>>>>>>>>> diff --git a/ibacm/src/acm.c b/ibacm/src/acm.c >>>>>>>>>>> index b695d1bc..14936597 100644 >>>>>>>>>>> --- a/ibacm/src/acm.c >>>>>>>>>>> +++ b/ibacm/src/acm.c >>>>>>>>>>> @@ -820,6 +820,13 @@ acm_is_path_from_port(struct acmc_port *port, struct ibv_path_record *path) >>>>>>>>>>> return 0; >>>>>>>>>>> } >>>>>>>>>>> >>>>>>>>>>> +static bool acm_same_partition(uint16_t pkey_a, uint16_t pkey_b) { >>>>>>>>>>> + >>>>>>>>>>> + acm_log(2, "pkey_a: 0x%04x pkey_b: 0x%04x\n", pkey_a, pkey_b); >>>>>>>>>>> + >>>>>>>>>>> + return ((pkey_a | IB_PKEY_FULL_MEMBER) == (pkey_b | IB_PKEY_FULL_MEMBER)); >>>>>>>>>> >>>>>>>>>> Doesn't this allow limited member to talk to another limited member of >>>>>>>>>> the same partition ? At least one needs to be a full member. >>>>>>>>> >>>>>>>>> The partition enforcement is done by the HCA ports (optionally also switch ports). >>>>>>>> >>>>>>>> Yes but this mechanism is meant for errors and this is introducing it as >>>>>>>> a normal case. >>>>>>> >>>>>>> No, it is not. As I said in my first response, when it comes to limited vs. limited, the functionality is the same as before this commit. What the helper function does different from the old code, it that it returns true when comparing limited vs. full and full vs. limited. The other two cases are as before. >>>>>> >>>>>> I was referring to the IB spec and not ACM. When standard IBA mechanisms >>>>>> are used, no PathRecord is available for limited to limited members of a >>>>>> partition so this is error scenario and the other mechanisms (counter, >>>>>> trap) are "backstops" to this supposedly rare event caused by not >>>>>> conforming to IBA. >>>>> >>>>> I agree, but you wrote above "... and this is introducing it as a normal case." I assumed you by "this" meant "this commit". >>>>> >>>>>>>> SA will not return path between 2 limited members of same >>>>>>>> partition so that case is not supposed to occur. >>>>>>> >>>>>>> Correct. >>>>>>> >>>>>>>> Also, it will cause bad >>>>>>>> P_Key traps by HCA/switch to SM for every such packet and this is a bad >>>>>>>> side effect. >>>>>>> >>>>>>> Generally speaking, i.e., not pertinent to this commit, I agree. But, the utterly most frequent source of pkey violations are multicasts, and this is more an architectural flaw in the IBTA spec. There should have been a mechanism to suppress MC violation traps - or - routing thereof should be aware of the pkey in the BTH and the end-node's pkey. But that is a complete other discussion. >>>>>> >>>>>> Limited multicast pkey suppression is supported as an option in the >>>>>> architecture but not implemented. >>>>> >>>>> Yes, I forgot about that, my bad. >>>>> >>>>> >>>>>>> And, speaking relevant to this commit again, the pkey compare function is used for other situations than the one you bring up above. The first example in the commit message, the PR is correctly retrieved, but due to the "-v" running ib_acme, a verification of the PR is requested, and it is this request to the SA that fails without this commit. >>>>>> If ACM is allowing limited <-> limited communication to occur and rely >>>>>> on the error mechanisms, while this will work, it can have negative >>>>>> affects on overall subnet performance. >>>>> >>>>> I agree, but I do not see how this comment applies to this commit. Are you suggesting to also fix this drawback of ACM with this commit? >>>> I might have misunderstood exactly what this patch does. Sorry if I >>>> introduced that confusion into this. >>> >>> No problem! >>> >>> The intent of this commit is to make ACM work in an environment with limited and full partitions. >>> >>>> I'm concerned about how open the door is in ACM for limited <-> limited >>>> communication. Maybe that's just up to the path resolution or cache >>>> preload doing the right thing (meaning if both endpoints are limited >>>> only, no PR should be available). >>> >>> I am concerned also, but this commit doesn't alter anything. Without having tested this, I guess that if you set the loopback_prot to "none" and do keep the default route_prot value ("sa"), then ACM will always ask the SA first for a PR, and what you describe above will not happen. >> >> I am not concerned anymore. First, the behaviour of ACM is not changed with this patch. Secondly, with route_prot set to "sa", ACM will query the SA for the PR, and for two limited ports, this PR request will fail. Thirdly, if route_prot set to "acm", ACM will post an MC send, but the MC packet will have the limited pkey value in its BTH, and as such, not be received by the ACM peer. And since it is not received, the active side will not receive any response. And then no communication will be attempted between the two limited pkey ports. > > Anyone? Pinging anyone. Thxs, Håkon > > > Thxs, Håkon > >> >> >> Thxs, Håkon >> >>> >>> >>> Thxs, Håkon >>> >>>> >>>> -- Hal >>>> >>>>> >>>>> Thxs, Håkon >>>>> >>>>> >>>>>> >>>>>> -- Hal >>>>>> >>>>>>> >>>>>>> Thxs, Håkon >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> -- Hal >>>>>>>> >>>>>>>>> The old code did "pkey_a == pkey_b", so in this sense, nothing is changed. Further, this is aka, you get in a request from the peer, which is a full member, and you need to search the end-points to see if you can communicate. Through my testing (and the debug prints of the pkeys), I have seen all four permutations of limited/full for a/b. >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Thxs, Håkon >>>>>>>>> >>>>>>>>>> >>>>>>>>>> -- Hal >>>>>>>>>> >>>>>>>>>>> +} >>>>>>>>>>> + >>>>>>>>>>> static struct acmc_addr * >>>>>>>>>>> acm_get_port_ep_address(struct acmc_port *port, struct acm_ep_addr_data *data) >>>>>>>>>>> { >>>>>>>>>>> @@ -837,7 +844,7 @@ acm_get_port_ep_address(struct acmc_port *port, struct acm_ep_addr_data *data) >>>>>>>>>>> list_for_each(&port->ep_list, ep, entry) { >>>>>>>>>>> if ((data->type == ACM_EP_INFO_PATH) && >>>>>>>>>>> (!data->info.path.pkey || >>>>>>>>>>> - (be16toh(data->info.path.pkey) == ep->endpoint.pkey))) { >>>>>>>>>>> + acm_same_partition(be16toh(data->info.path.pkey), ep->endpoint.pkey))) { >>>>>>>>>>> for (i = 0; i < MAX_EP_ADDR; i++) { >>>>>>>>>>> if (ep->addr_info[i].addr.type) >>>>>>>>>>> return &ep->addr_info[i]; >>>>>>>>>>> @@ -2158,7 +2165,7 @@ static int acm_assign_ep_names(struct acmc_ep *ep) >>>>>>>>>>> >>>>>>>>>>> if (!strcasecmp(dev_name, dev) && >>>>>>>>>>> (ep->port->port.port_num == (uint8_t) port) && >>>>>>>>>>> - (ep->endpoint.pkey == pkey)) { >>>>>>>>>>> + acm_same_partition(ep->endpoint.pkey, pkey)) { >>>>>>>>>>> acm_log(1, "assigning %s\n", name); >>>>>>>>>>> if (acm_ep_insert_addr(ep, name, addr, type)) { >>>>>>>>>>> acm_log(1, "maximum number of names assigned to EP\n"); >>>>>>>>>>> @@ -2179,7 +2186,7 @@ static struct acmc_ep *acm_find_ep(struct acmc_port *port, uint16_t pkey) >>>>>>>>>>> acm_log(2, "pkey 0x%x\n", pkey); >>>>>>>>>>> >>>>>>>>>>> list_for_each(&port->ep_list, ep, entry) { >>>>>>>>>>> - if (ep->endpoint.pkey == pkey) { >>>>>>>>>>> + if (acm_same_partition(ep->endpoint.pkey, pkey)) { >>>>>>>>>>> res = ep; >>>>>>>>>>> break; >>>>>>>>>>> }