RE: ipoib hw multicast addresses

>-----Original Message-----
>From: Jason Gunthorpe [mailto:jgg@xxxxxxxx]
>Sent: Friday, June 8, 2018 6:50 PM
>To: Ruhl, Michael J <michael.j.ruhl@xxxxxxxxx>
>Cc: RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>
>Subject: Re: ipoib hw multicast addresses
>
>On Fri, Jun 08, 2018 at 08:48:45PM +0000, Ruhl, Michael J wrote:
>
>> Looking into this, I found that the netdev device has a list of HW
>> addresses (the links), and it appears that when a PKEY change occurs,
>> that this list is not cleaned up.
>
>This seems really obscure, I'm not surprised it doesn't work.
>
>> So I had a couple of questions:
>>
>> 1) should this list be cleaned up?
>> (the check in ipoib_mcast_restart_task() where the mc_addr list is walked
>> might be a good place, when an address fails the ipoib_mcast_addr_is_valid()
>> check)
>> 2) should this list be populated on the pkey change (i.e. during the
>> HEAVY_FLUSH), without needing to do an ifdown/ifup sequence?
>
>An interface should not randomly change its LLADDR, especially when
>up and operational. That just breaks things.

This is not completely random, but it is a real mess... :(

When the netdev device is registered (register_netdev()) from
ipoib_add_port(), the first (incorrect) "HW" address is added to the
device MC list:

link  00:ff:ff:ff:ff:12:60:1b:80:00:00:00:00:00:00:00:00:00:00:01
                              ^^^^^  (invalid PKEY)

The PKEY has not been set yet, so this address will always be wrong.

Once the PKEY is set (my system automatically did the ifup on the interface),
two new addresses are added:

link  00:ff:ff:ff:ff:12:40:1b:80:04:00:00:00:00:00:00:00:00:00:01
link  00:ff:ff:ff:ff:12:60:1b:80:04:00:00:00:00:00:01:ff:65:ac:56

Addresses are crafted for addition/removal using the ip_ib_mc_map() and
ipv6_ib_mc_map() functions.

The PKEY information is derived from the "broadcast[8 - 9]" bytes of the netdev
device (set by ipoib_add_port(), updated by update_parent_pkey() via a
FLUSH_HEAVY).  But that update does not touch the device MC list at all.
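To make the byte layout concrete, here is a small userspace sketch of how the
20-byte addresses above appear to be put together.  It is not the kernel source:
fill_prefix(), sketch_ipv4_mc_map(), sketch_ipv6_mc_map(), format_hw() and
HW_ALEN are hypothetical names, and the layout is an assumption read off the
addresses listed above (0x40/0x60 signature byte, PKEY copied from
broadcast[8 - 9]).

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define HW_ALEN 20 /* hypothetical name: length of an IPoIB HW address */

/* Bytes 0-9: reserved, multicast QPN, 0xff, scope, signature, PKEY. */
static void fill_prefix(uint8_t *buf, const uint8_t *broadcast, uint8_t sig_hi)
{
    buf[0] = 0x00;                          /* reserved */
    buf[1] = buf[2] = buf[3] = 0xff;        /* multicast QPN */
    buf[4] = 0xff;
    buf[5] = 0x10 | (broadcast[5] & 0x0f);  /* scope taken from broadcast */
    buf[6] = sig_hi;                        /* 0x40 = IPv4, 0x60 = IPv6 */
    buf[7] = 0x1b;
    buf[8] = broadcast[8];                  /* PKEY, high byte */
    buf[9] = broadcast[9];                  /* PKEY, low byte */
}

/* group is the IPv4 multicast address in host byte order. */
static void sketch_ipv4_mc_map(uint32_t group, const uint8_t *broadcast,
                               uint8_t *buf)
{
    fill_prefix(buf, broadcast, 0x40);
    memset(buf + 10, 0, 6);
    buf[16] = (group >> 24) & 0x0f;         /* low 28 bits of the group */
    buf[17] = (group >> 16) & 0xff;
    buf[18] = (group >> 8) & 0xff;
    buf[19] = group & 0xff;
}

/* addr6 points at the 16 bytes of the IPv6 multicast address. */
static void sketch_ipv6_mc_map(const uint8_t *addr6, const uint8_t *broadcast,
                               uint8_t *buf)
{
    fill_prefix(buf, broadcast, 0x60);
    memcpy(buf + 10, addr6 + 6, 10);        /* low 80 bits of the group */
}

/* Render as "00:ff:...": out must hold at least 3 * HW_ALEN bytes. */
static void format_hw(const uint8_t *buf, char *out)
{
    for (int i = 0; i < HW_ALEN; i++)
        sprintf(out + 3 * i, "%02x%s", (unsigned)buf[i],
                i + 1 < HW_ALEN ? ":" : "");
}
```

With broadcast[8 - 9] = 80:04, mapping 224.0.0.1 gives the 40:1b "link" line
shown above; with 80:00 (PKEY not set yet), mapping ff02::1 gives the stale
60:1b entry.  The same group therefore maps to a different HW address once the
PKEY bytes change.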

The remove functionality (ifdown) will only remove addresses that are crafted by
the _mc_map() functions.  As far as I can tell, this occurs after the ndo_stop()
routine for the ipoib netdev device is called (so IPoIB has no knowledge or
control of this).  If an address was added with the "incorrect" pkey (such as at
system start time), it will not be removed, since the address crafted at removal
time carries the current pkey and does not match it.
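The mismatch can be shown with a toy model.  This is a sketch only: struct
mc_list, mc_add(), mc_del(), addr_matches_broadcast() and mc_prune_stale() are
hypothetical names, the real device MC list is not a plain array, and the
comparison in addr_matches_broadcast() is my assumption of what a check like
ipoib_mcast_addr_is_valid() looks at (bytes 0-5 and 7-9, skipping the IPv4/IPv6
signature byte).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HW_ALEN 20 /* hypothetical name: length of an IPoIB HW address */
#define MAX_MC  8  /* arbitrary capacity for this toy list */

struct mc_list {
    uint8_t addr[MAX_MC][HW_ALEN];
    int count;
};

static void mc_add(struct mc_list *l, const uint8_t *addr)
{
    if (l->count < MAX_MC)
        memcpy(l->addr[l->count++], addr, HW_ALEN);
}

/* Removal needs an exact 20-byte match, PKEY bytes included. */
static bool mc_del(struct mc_list *l, const uint8_t *addr)
{
    for (int i = 0; i < l->count; i++) {
        if (memcmp(l->addr[i], addr, HW_ALEN) == 0) {
            memmove(l->addr[i], l->addr[i + 1],
                    (size_t)(l->count - i - 1) * HW_ALEN);
            l->count--;
            return true;
        }
    }
    return false;
}

/* Assumed validity check: prefix/scope and signature-low + PKEY must match. */
static bool addr_matches_broadcast(const uint8_t *addr,
                                   const uint8_t *broadcast)
{
    if (memcmp(addr, broadcast, 6) != 0)
        return false;
    return memcmp(addr + 7, broadcast + 7, 3) == 0;
}

/* Hypothetical cleanup pass: drop entries that fail the check. */
static int mc_prune_stale(struct mc_list *l, const uint8_t *broadcast)
{
    int removed = 0;

    for (int i = 0; i < l->count; ) {
        uint8_t victim[HW_ALEN];

        if (addr_matches_broadcast(l->addr[i], broadcast)) {
            i++;
            continue;
        }
        memcpy(victim, l->addr[i], HW_ALEN);
        mc_del(l, victim);
        removed++;
    }
    return removed;
}
```

In this model, an ifdown that regenerates the address with PKEY 80:04 removes
only the 80:04 entry; the 80:00 entry added at registration time stays on the
list until something like mc_prune_stale() walks the list against the current
broadcast address.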

So it appears that the only ways you can get the correct addresses are:

Only allow register_netdev() to occur if a pkey is valid (not sure how to do this),

or

Create the ib0 device (register_netdev()), then:
	do an ifup and an ifdown before the PKEY is set
	set the PKEY
	do an ifup.

Not sure where to go from here....

M

>So, if the interface starts with a given pkey I would think it should
>stay with that pkey or enter a link downed state until the pkey
>becomes available.
>
>Which is what is already sort of happening with the
>IPOIB_PKEY_ASSIGNED stuff.
>
>The tricky bit is what to do until the SM initializes the port for the
>very first time.. I suppose that is why we have this broken code in
>the first place?
>
>Jason