> -----Original Message-----
> From: Hefty, Sean
> Sent: Friday, October 13, 2017 3:37 PM
> To: Ruhl, Michael J <michael.j.ruhl@xxxxxxxxx>
> Cc: linux-rdma@xxxxxxxxxxxxxxx; hal@xxxxxxxxxxxxxxxxxx
> Subject: RE: [PATCH 1/2] ibacm: Incorrect usage of BE byte order of MLID
> attach/detach_mcast()
>
> > The MLID value passed to ibv_attach/detach_mcast() must be in host
> > byte order.
> >
> > acmp.c incorrectly uses the big endian format when doing a multicast
> > attach/detach (join). Multicast packets are used to do name resolution
> > by the libibacmp library.
> >
> > There are two possible results because of this issue.
> >
> > If a kernel has commit 00b8a3351b2b, the attach will fail with an
> > EINVAL. ibacm will log this as a failure during the multicast join.
> >
> > If a kernel does not have commit 00b8a3351b2b, the attach will
> > complete successfully. Packets sent to this address will be dropped
> > because the packet dlid value and the multicast address information
> > given by the attach will not match.
> >
> > Update MLID usage to use the correct byte order.
> >
> > Reviewed-by: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx>
> > Signed-off-by: Michael J. Ruhl <michael.j.ruhl@xxxxxxxxx>
> > ---
> >  ibacm/prov/acmp/src/acmp.c |    4 ++--
> >  1 files changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/ibacm/prov/acmp/src/acmp.c b/ibacm/prov/acmp/src/acmp.c
> > index aa78416..78d9a29 100644
> > --- a/ibacm/prov/acmp/src/acmp.c
> > +++ b/ibacm/prov/acmp/src/acmp.c
> > @@ -732,7 +732,7 @@ static void acmp_process_join_resp(struct acm_sa_mad *sa_mad)
> >  		acm_log(0, "ERROR - unable to create ah\n");
> >  		goto out;
> >  	}
> > -	ret = ibv_attach_mcast(ep->qp, &mc_rec->mgid, mc_rec->mlid);
> > +	ret = ibv_attach_mcast(ep->qp, &dest->mgid, dest->av.dlid);
> >  	if (ret) {
> >  		acm_log(0, "ERROR - unable to attach QP to multicast group\n");
> >  		ibv_destroy_ah(dest->ah);
> > @@ -1429,7 +1429,7 @@ static void acmp_ep_join(struct acmp_ep *ep)
> >
> >  	if (ep->mc_dest[0].state == ACMP_READY && ep->mc_dest[0].ah) {
> >  		ibv_detach_mcast(ep->qp, &ep->mc_dest[0].mgid,
> > -				 be16toh(ep->mc_dest[0].av.dlid));
> > +				 ep->mc_dest[0].av.dlid);
> >  		ibv_destroy_ah(ep->mc_dest[0].ah);
> >  		ep->mc_dest[0].ah = NULL;
> >  	}
>
> Changes look correct for both patches.
>
> Acked-by: Sean Hefty <sean.hefty@xxxxxxxxx>
>
> It would be nice to understand how the code was working in the past. At
> least ibacm has been able to report cached data. All nodes would have
> joined the wrong mcast group, but as you mention the dlid in the AV
> wouldn't have matched. I tried looking back through the ibacm history, but
> didn't see any relevant changes.

I don't believe that the MLID value was checked. I added a patch several
months ago to verify that the MLID was a multicast LID (in ib_attach_mcast).
It may be that this check is what caused it to stop working.

M