>-----Original Message-----
>From: Jason Gunthorpe [mailto:jgg@xxxxxxxx]
>Sent: Friday, June 8, 2018 6:50 PM
>To: Ruhl, Michael J <michael.j.ruhl@xxxxxxxxx>
>Cc: RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>
>Subject: Re: ipoib hw multicast addresses
>
>On Fri, Jun 08, 2018 at 08:48:45PM +0000, Ruhl, Michael J wrote:
>
>> Looking into this, I found that the netdev device has a list of HW
>> addresses (the links), and it appears that when a PKEY change occurs,
>> that this list is not cleaned up.
>
>This seems really obscure, I'm not surprised it doesn't work.
>
>> So I had a couple of questions:
>>
>> 1) should this list be cleaned up?
>>    (the check in ipoib_mcast_restart_task() where the mc_addr list is
>>    walked might be a good place, when an address fails the
>>    ipoib_mcast_addr_is_valid() check)
>> 2) should this list be populated on the pkey change (i.e. during the
>>    HEAVY_FLUSH), without needing to do an ifdown/ifup sequence?
>
>An interface should not randomly change its LLADDR, especially when
>up and operational. That just breaks things.

This is not completely random, but it is a real mess... :(

When the netdev device is registered (register_netdev()) from
ipoib_add_port(), the first (incorrect) "HW" address is added to the
device MC list:

link 00:ff:ff:ff:ff:12:60:1b:80:00:00:00:00:00:00:00:00:00:00:01
                             ^^^^^ (invalid PKEY)

The PKEY has not been set at that point, so this address will always be
wrong.

Once the PKEY is set (my system automatically did the ifup on the
interface), two new addresses are added:

link 00:ff:ff:ff:ff:12:40:1b:80:04:00:00:00:00:00:00:00:00:00:01
link 00:ff:ff:ff:ff:12:60:1b:80:04:00:00:00:00:00:01:ff:65:ac:56

Addresses are crafted for add/removal using the ip_ib_mc_map() and
ipv6_ib_mc_map() functions. The PKEY information is taken from bytes 8
and 9 of the netdev device's broadcast address (broadcast[8] and
broadcast[9]), which is set by ipoib_add_port() and updated by
update_parent_pkey() via a FLUSH_HEAVY. But that update does not touch
the device MC list at all.
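
To make the dependency on the broadcast bytes concrete, here is a rough
userspace model of the IPv4 case. It is not the kernel code: the byte
layout is simply read off the addresses listed above (and the RFC 4391
mapping), and make_broadcast(), ipv4_mc_to_ib() and fmt() are names made
up for this sketch.

```python
import ipaddress


def make_broadcast(pkey: int) -> bytes:
    """Hypothetical helper: a 20-byte IPoIB IPv4 broadcast address for pkey."""
    return (bytes.fromhex("00ffffffff12401b")  # QPN, ff12 prefix, IPv4 signature
            + pkey.to_bytes(2, "big")          # bytes 8-9: the PKEY
            + bytes(6)
            + b"\xff\xff\xff\xff")


def ipv4_mc_to_ib(group: str, broadcast: bytes) -> bytes:
    """Model of mapping an IPv4 multicast group to an IPoIB link address."""
    addr = int(ipaddress.IPv4Address(group))
    buf = bytearray(20)
    buf[1:5] = b"\xff\xff\xff\xff"            # multicast QPN + first MGID byte
    buf[5] = 0x10 | (broadcast[5] & 0x0F)     # scope copied from the broadcast
    buf[6:8] = b"\x40\x1b"                    # IPv4 signature
    buf[8:10] = broadcast[8:10]               # PKEY copied from the broadcast
    buf[16:20] = (addr & 0x0FFFFFFF).to_bytes(4, "big")  # low 28 bits of group
    return bytes(buf)


def fmt(addr: bytes) -> str:
    """Format an address the way the "link" lines above show it."""
    return ":".join(f"{b:02x}" for b in addr)


before = ipv4_mc_to_ib("224.0.0.1", make_broadcast(0x8000))
after = ipv4_mc_to_ib("224.0.0.1", make_broadcast(0x8004))
print(fmt(before))  # ...:40:1b:80:00:...:01
print(fmt(after))   # ...:40:1b:80:04:...:01
print(before == after)  # False
```

The same group maps to a different 20-byte address as soon as
broadcast[8] and broadcast[9] change, so an entry crafted before the
change no longer matches anything crafted afterwards.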

The remove functionality (ifdown) will only remove addresses that are
crafted by the _mc_map() functions. As far as I can tell this occurs
after the ndo_stop() routine for the ipoib netdev device is called (so
IPoIB has no knowledge or control of this).

If an address was added with the "incorrect" pkey (such as at system
start time), it will not be removed.

So it appears that the only ways to get the correct addresses are:

  Only allow register_netdev() to occur if a pkey is valid (not sure
  how to do this),

or

  Create the ib0 device (register_netdev())
  Before the PKEY is set
    do an ifup
    do an ifdown
  Set the PKEY
  do an ifup.

Not sure where to go from here....

M

>So, if the interface starts with a given pkey I would think it should
>stay with that pkey or enter a link downed state until the pkey
>becomes available.
>
>Which is what is already sort of happening with the
>IPOIB_PKEY_ASSIGNED stuff.
>
>The tricky bit is what to do until the SM initializes the port for the
>very first time.. I suppose that is why we have this broken code in
>the first place?
>
>Jason
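
For reference, the leftover entry can be picked out of the three "link"
addresses listed earlier by comparing bytes 8-9 against the current
PKEY. This is only a userspace illustration of the kind of cleanup asked
about in question 1; it does not reproduce ipoib_mcast_addr_is_valid(),
and parse_link() and stale_entries() are hypothetical names.

```python
# The three addresses from the "link" lines earlier in this mail.
MC_LIST = [
    "00:ff:ff:ff:ff:12:60:1b:80:00:00:00:00:00:00:00:00:00:00:01",
    "00:ff:ff:ff:ff:12:40:1b:80:04:00:00:00:00:00:00:00:00:00:01",
    "00:ff:ff:ff:ff:12:60:1b:80:04:00:00:00:00:00:01:ff:65:ac:56",
]


def parse_link(text: str) -> bytes:
    """Turn a colon-separated link address into raw bytes."""
    return bytes.fromhex(text.replace(":", ""))


def stale_entries(mc_list, pkey: int):
    """Return the entries whose bytes 8-9 do not carry the given PKEY."""
    want = pkey.to_bytes(2, "big")
    return [a for a in mc_list if parse_link(a)[8:10] != want]


for entry in stale_entries(MC_LIST, 0x8004):
    print("stale:", entry)
```

With the current PKEY at 0x8004, only the first address (the one added
at registration time) is reported.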