Re: [PATCH RFC 0/3] Support standard SRIOV configuration for IB VFs

On Wed, May 27, 2015 at 10:14:06AM -0400, Doug Ledford wrote:
> > Because the QPN is part of the LLADDR IB can create two interfaces on
> > the same physical port that are completely separated by hardware. Read
> > Haggi's email, he explains how they plan to use this to create
> > interfaces that can be delegated to namespaces. It is not a bad idea
> > really.. 
> 
> Yes, it is actually.  The whole reason we went to GUID matching long ago
> was because of this exact issue.

I reflected on this some more last night, and yes, I am leaning toward
the 'bad idea' direction too.

Too much stuff breaks if you create multiple children with the same
pkey/guid:
 - RDMA CM cannot disambiguate CM packets between them
 - DHCP cannot tell them apart
 - Net scripts/network manager won't work
 - IPv6 becomes totally broken

That means the namespace stuff will have to create children using GUID
aliases..

> The *only* way this will ever be a workable item is if we A) reserve a
> number of queue pairs from the driver specifically for IPoIB use and B)
> specify which queue pairs go to which IPoIB devices at IPoIB module
> load

This basic idea is exactly why I think we should stick with the 20
byte LLADDR for IFLA_VF_MAC. It gives the PF a route to tell the
VF what QPN to use for IPoIB (if we ever see HW support to implement that).

If we use 8 bytes then that route is closed off forever.
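
For reference, a minimal sketch of the 20 byte layout under discussion,
assuming the usual IPoIB format (1 flags byte, a 24 bit QPN, then the
16 byte GID made of subnet prefix plus GUID). The function names are
hypothetical and not taken from any existing kernel or iproute2 code:

```c
#include <stdint.h>
#include <string.h>

#define IPOIB_LLADDR_LEN 20 /* 4 bytes flags/QPN + 16 bytes GID */
#define IB_GUID_LEN       8

/* Hypothetical helper: pack a 20 byte IPoIB link-layer address.
 * byte 0       flags (left zero here)
 * bytes 1..3   24 bit QPN, big endian
 * bytes 4..11  subnet prefix
 * bytes 12..19 port GUID
 */
static void ipoib_build_lladdr(uint8_t out[IPOIB_LLADDR_LEN], uint32_t qpn,
                               const uint8_t prefix[IB_GUID_LEN],
                               const uint8_t guid[IB_GUID_LEN])
{
    out[0] = 0;
    out[1] = (uint8_t)((qpn >> 16) & 0xff);
    out[2] = (uint8_t)((qpn >> 8) & 0xff);
    out[3] = (uint8_t)(qpn & 0xff);
    memcpy(out + 4, prefix, IB_GUID_LEN);
    memcpy(out + 12, guid, IB_GUID_LEN);
}

/* The QPN is only recoverable from the 20 byte form. */
static uint32_t ipoib_lladdr_qpn(const uint8_t lladdr[IPOIB_LLADDR_LEN])
{
    return ((uint32_t)lladdr[1] << 16) | ((uint32_t)lladdr[2] << 8) |
           (uint32_t)lladdr[3];
}

/* The GUID is the last 8 of the 20 bytes. */
static const uint8_t *ipoib_lladdr_guid(const uint8_t lladdr[IPOIB_LLADDR_LEN])
{
    return lladdr + 12;
}
```

An 8 byte IFLA_VF_MAC would carry only what ipoib_lladdr_guid() returns,
so there would be nowhere to put the QPN.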

> > Not quite, in the 20 byte format the 8 bytes of the GUID are in the
> > last 8/20 bytes, so the app would have to place 12 zeros and then the
> > GUID to follow the 20 byte format (or 4 zeros, the prefix, then the GUID)
> > 
> > This is why the question of 'what is IFLA_VF_MAC' is so important,
> > the options presented (MAC,GUID,LLADDR) are all incompatible with
> > each other.
> 
> For Ethernet devices, it's the MAC.  The equivalent of MAC on IB is the
> GUID.  I would leave it at that.

Yes, both arguments can be made:
  - Our netlink end point is targeting an IPoIB interface, and
    the equivalent to an Ethernet MAC in IPoIB language is the LLADDR.
  - Our netlink interface is targeting the hardware under the IPoIB
    interface and that MAC equivalent is the GUID

> IPoIB devices are constructs on top of
> the GUID/link, and you can have 10 IPoIB interfaces between the parent
> and children, but we don't need to specify all of those LLADDRs, we just
> need to give a unique GUID and allow the guest OS to create their own
> IPoIB devices on top of that.

As I've said, I would like to see netdev review that idea before we
merge any patches..

There are pragmatic downsides to the 8 byte choice: Userspace
completely loses the ability to size the address without a table
based on link type. That is terrible in the context of netlink's
design. For instance iproute2 would need IB specific code to format
the 'ip link show' output (review print_vfinfo in iproute2) and to
length check 'ip link set vf mac'.
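
To make the objection concrete, this is roughly the kind of table
userspace would end up carrying. It is only a sketch: the function name
is hypothetical, the 8 byte IB entry is the proposal being debated and
not existing behaviour, and the ARPHRD values are repeated from
<linux/if_arp.h> so the fragment stands alone:

```c
#include <stddef.h>

/* Link type values as found in <linux/if_arp.h>. */
#ifndef ARPHRD_ETHER
#define ARPHRD_ETHER      1
#endif
#ifndef ARPHRD_INFINIBAND
#define ARPHRD_INFINIBAND 32
#endif

/* Hypothetical lookup: length of the VF address for a given link type.
 * Returns 0 when the link type is not in the table, which is exactly
 * the maintenance burden: every new link type needs a new entry.
 */
static size_t vf_addr_len_for_link(unsigned int link_type)
{
    switch (link_type) {
    case ARPHRD_ETHER:
        return 6;  /* Ethernet MAC */
    case ARPHRD_INFINIBAND:
        return 8;  /* GUID, under the 8 byte proposal (assumption) */
    default:
        return 0;  /* unknown: cannot format or length check */
    }
}
```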

If we do use 8, then it would be ideal (and my strong preference) to
also fix the IFLA_VF_MAC message to have a working length. I think
that could be done compatibly with a bit of work. At least that way
iproute2 can be kept clean when it learns to do IB, and we could have
the option again of using 20 someday if we need to.

So to be clear, to go with the 8 byte option I suggest:
 - Engage netdev/iproute and confirm they are philosophically OK
   with IFLA_VF_MAC != IFLA_ADDRESS
 - Make a kernel patch to properly size the IFLA_VF_MAC message
 - Make an iproute patch to use the IFLA_VF_MAC size in print_vfinfo
   instead of the hardcoded ETH_ALEN (treating len == 32 as len 6 for
   compat)
 - Drop in the IB patch
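
The compat rule in the iproute step could look something like the
following. This is a sketch under assumptions, not the actual
print_vfinfo code: the helper name is made up, and it assumes a kernel
without the sizing fix always reports the full 32 byte mac field:

```c
#include <stddef.h>

#define ETH_ALEN          6  /* Ethernet MAC length */
#define LEGACY_VF_MAC_LEN 32 /* assumed size of the fixed mac field today */

/* Hypothetical helper: how many address bytes to print for a VF.
 * A reported length of 32 is taken to mean an old kernel that did not
 * size the message, so fall back to the Ethernet length. Any other
 * length is trusted as the real address size (8 for a GUID, 20 for a
 * full LLADDR, and so on).
 */
static size_t vf_mac_effective_len(size_t reported_len)
{
    if (reported_len == LEGACY_VF_MAC_LEN)
        return ETH_ALEN;
    return reported_len;
}
```

With that in place userspace needs no per link type knowledge at all,
it just prints however many bytes the kernel said were valid.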
 
Jason