On 1/30/19 5:13 AM, Håkon Bugge wrote: > > >> On 30 Jan 2019, at 00:52, Mark Bloch <markb@xxxxxxxxxxxx> wrote: >> On 1/29/19 3:43 PM, Jason Gunthorpe wrote: >>> [] >>> Hey Hakon, I think you just got nominated as the new ACM maintainer :) :) > > > Thanks! Appreciated to being thought of :-) We are actually working some grounds here in orcl to come up with a candidate as well. Just need to make sure the chain of commands here understands that it takes time and requires certain resources for testing/building etc. So, will shortly come back on this point. > > But, an orcl maintainer isn't really expected to review orcl contributions, is she? > >> Congratulation!, > > Thanks! > >> on a semi-related note I've just found two fixes from 2016 I had for ibacm >> Hakon can you please have a look and see if they are still needed? >> >> commit 966120a5b3c0f8302a4a2d8a1d74207bd35c97e6 >> Author: Mark Bloch <markb@xxxxxxxxxxxx> >> Date: Thu Jun 2 14:40:59 2016 +0300 >> >> Fix invalidation of a path record >> >> When a timeout has expired for a path record, we need to initialize >> the DLID, otherwise the DLID will be included in the request >> for a new path record which is invalid because the DLID may have >> changed. > > I looked briefly into this, and IIRC, ACM uses both gid and lid when both are non-zero. But why not skip the volatile LID always and rely on the gid, which doesn't change? Don't we use the LID in other places? > > >> >> Signed-off-by: Mark Bloch <markb@xxxxxxxxxxxx> >> >> diff --git a/prov/acmp/src/acmp.c b/prov/acmp/src/acmp.c >> index 717bb83..ad7be19 100644 >> --- a/prov/acmp/src/acmp.c >> +++ b/prov/acmp/src/acmp.c >> @@ -1734,11 +1734,12 @@ static int acmp_dest_timeout(struct acmp_dest *dest) >> { >> uint64_t timestamp = time_stamp_min(); >> >> - if (timestamp > dest->addr_timeout) { >> + if (timestamp >= dest->addr_timeout) { >> acm_log(2, "%s address timed out\n", dest->name); >> dest->state = ACMP_INIT; >> return 1; >> - } else if (timestamp > dest->route_timeout) { >> + } else if (timestamp >= dest->route_timeout) { >> + dest->path.dlid = 0; >> acm_log(2, "%s route timed out\n", dest->name); >> dest->state = ACMP_ADDR_RESOLVED; >> return 1; >> and: >> commit ce409bff62a1c9d6d0f3c1a7d0f93241558b8d79 >> Author: Mark Bloch <markb@xxxxxxxxxxxx> >> Date: Thu Sep 15 10:56:19 2016 +0300 >> >> Update port info on reregister event >> >> On SM handover/failover an IBV_EVENT_CLIENT_REREGISTER >> event is triggered. ibacm didn't update the port info (which includes >> SA info) which resulted that mads sent by ibacm were sent with stale >> SA parameters. >> >> The fix forces ibacm to update the port info on IBV_EVENT_CLIENT_REREGISTER. > > If the case here is that the port is active, this sounds correct. Need to see if the issue can be reproed by using ibportstate to change the lid, and get opensm to restore it. Would that be the scenario this commit aims at fixing? >From what I remember (and it was a few years back :)), it fixes the issue where a new SM takes over but ibacm still tries to use the old one. Mark > > > > Thxs, Håkon > > >> >> Signed-off-by: Mark Bloch <markb@xxxxxxxxxxxx> >> >> diff --git a/src/acm.c b/src/acm.c >> index b75f089..99c06c4 100644 >> --- a/src/acm.c >> +++ b/src/acm.c >> @@ -2599,6 +2599,7 @@ static void acm_event_handler(struct acmc_device *dev) >> acm_port_down(&dev->port[i]); >> break; >> case IBV_EVENT_CLIENT_REREGISTER: >> + acm_port_change(&dev->port[i]); >> if ((dev->port[i].state == IBV_PORT_ACTIVE) && >> dev->port[i].prov_port_context) { >> dev->port[i].prov->handle_event(dev->port[i].prov_port_context, >> >> I don't have an a setup/env to test this (just don't some git cleaning and found those) >> and it seems you care and testing this :) >> >> Mark >>> >>> Jason >