[resent as plain-text] > On 20 Dec 2018, at 04:03, Hal Rosenstock <hal@xxxxxxxxxxxxxxxxxx> wrote: > > On 12/19/2018 3:08 PM, Håkon Bugge wrote: >> Using ibacm in an environment with limited pkeys, it is unable to >> obtain a Path Record (PR) in some situations. >> >> The use of limited pkey implies communication over the logical >> partition where one peer is a full member and the other is limited. >> >> When comparing pkey values, the above difference will yield a >> mismatch, even if the logical partitions of the two pkeys compared are >> the very same. >> >> Fixed by introducing a helper function which compares the logical >> partitions. >> >> Querying a PR on the peer which uses the full partition, we see >> without this commit: >> >> ib_acme -s 192.168.200.203 -d 192.168.200.200 -v >> Service: /usr/local/var/run/ibacm-unix.sock >> Destination: 192.168.200.200 >> Source: 192.168.200.203 >> Path information >> dgid: fe80::21:2800:1a1:7c97 >> sgid: fe80::21:2800:1ce:ac18 >> dlid: 243 >> slid: 246 >> flow label: 0x0 >> hop limit: 0 >> tclass: 0 >> reversible: 1 >> pkey: 0x2b4 >> sl: 0 >> mtu: 4 >> rate: 7 >> packet lifetime: 16 >> SA verification: failed Cannot assign requested address >> >> The verbose ibacm log contains: >> >> acm_get_ep_address: SLID(246) DLID(243) >> acm_get_ep_address: notice - could not find SLID(246) DLID(243) >> acm_svr_query_path: notice - could not find local end point address >> >> With this commit, the query with verification has success and we see >> in the log: >> >> acm_get_ep_address: SLID(246) DLID(243) >> acm_same_partition: pkey_a: 0x02b4 pkey_b: 0x82b4 >> [] >> acm_query_response: status 0x0 >> >> Another example is a kernel client that requests PRs over >> netlink. Without this commit, the statistics counters show: >> >> ib_acme -P col >> /usr/local/var/run/ibacm-unix.sock >> Error Count : 234 >> Resolve Count : 234 >> [] >> >> With this commit, we see: >> >> ib_acme -P col >> /usr/local/var/run/ibacm-unix.sock >> Error Count : 0 >> Resolve Count : 228 >> No Data : 0 >> Addr Query Count : 0 >> Addr Cache Count : 0 >> Route Query Count : 1 >> Route Cache Count : 226 >> >> Signed-off-by: Håkon Bugge <haakon.bugge@xxxxxxxxxx> >> --- >> ibacm/src/acm.c | 13 ++++++++++--- >> 1 file changed, 10 insertions(+), 3 deletions(-) >> >> diff --git a/ibacm/src/acm.c b/ibacm/src/acm.c >> index b695d1bc..14936597 100644 >> --- a/ibacm/src/acm.c >> +++ b/ibacm/src/acm.c >> @@ -820,6 +820,13 @@ acm_is_path_from_port(struct acmc_port *port, struct ibv_path_record *path) >> return 0; >> } >> >> +static bool acm_same_partition(uint16_t pkey_a, uint16_t pkey_b) { >> + >> + acm_log(2, "pkey_a: 0x%04x pkey_b: 0x%04x\n", pkey_a, pkey_b); >> + >> + return ((pkey_a | IB_PKEY_FULL_MEMBER) == (pkey_b | IB_PKEY_FULL_MEMBER)); > > Doesn't this allow limited member to talk to another limited member of > the same partition ? At least one needs to be a full member. The partition enforcement is done by the HCA ports (optionally also switch ports). The old code did "pkey_a == pkey_b", so in this sense, nothing is changed. Further, this is aka, you get in a request from the peer, which is a full member, and you need to search the end-points to see if you can communicate. Through my testing (and the debug prints of the pkeys), I have seen all four permutations of limited/full for a/b. Thxs, Håkon > > -- Hal > >> +} >> + >> static struct acmc_addr * >> acm_get_port_ep_address(struct acmc_port *port, struct acm_ep_addr_data *data) >> { >> @@ -837,7 +844,7 @@ acm_get_port_ep_address(struct acmc_port *port, struct acm_ep_addr_data *data) >> list_for_each(&port->ep_list, ep, entry) { >> if ((data->type == ACM_EP_INFO_PATH) && >> (!data->info.path.pkey || >> - (be16toh(data->info.path.pkey) == ep->endpoint.pkey))) { >> + acm_same_partition(be16toh(data->info.path.pkey), ep->endpoint.pkey))) { >> for (i = 0; i < MAX_EP_ADDR; i++) { >> if (ep->addr_info[i].addr.type) >> return &ep->addr_info[i]; >> @@ -2158,7 +2165,7 @@ static int acm_assign_ep_names(struct acmc_ep *ep) >> >> if (!strcasecmp(dev_name, dev) && >> (ep->port->port.port_num == (uint8_t) port) && >> - (ep->endpoint.pkey == pkey)) { >> + acm_same_partition(ep->endpoint.pkey, pkey)) { >> acm_log(1, "assigning %s\n", name); >> if (acm_ep_insert_addr(ep, name, addr, type)) { >> acm_log(1, "maximum number of names assigned to EP\n"); >> @@ -2179,7 +2186,7 @@ static struct acmc_ep *acm_find_ep(struct acmc_port *port, uint16_t pkey) >> acm_log(2, "pkey 0x%x\n", pkey); >> >> list_for_each(&port->ep_list, ep, entry) { >> - if (ep->endpoint.pkey == pkey) { >> + if (acm_same_partition(ep->endpoint.pkey, pkey)) { >> res = ep; >> break; >> }