Re: [PATCH RFC 0/3] Support standard SRIOV configuration for IB VFs

On 5/27/2015 12:11 AM, Jason Gunthorpe wrote:
On Tue, May 26, 2015 at 04:32:58PM -0400, Doug Ledford wrote:

  - ifcfg/udev/networkmanager: So what happens when I do
     ip link add link ib0 name ib0.1 type ipoib
    And get two IPoIB interfaces with the same GUID? I doubt any sane
    user would want to apply the same config to those two interfaces.
No, they probably don't want to apply the same rules to both interfaces.
I'm not entirely sure I agree with the argument though.  I fully
expected this to fail without a pkey argument on the ip command
line.
Does that matter to the above tools? Are they using PKey,GUID as their
key?

The net stack doesn't allow users to do the same thing with Ethernet
devices, so I'm not sure we shouldn't be disallowing this as opposed to
creating duplicate devices that are identical in all ways except name.
The netstack doesn't allow it for ethernet because it would create a
2nd identical LLADDR, and LLADDRs must be unique.

Because the QPN is part of the LLADDR IB can create two interfaces on
the same physical port that are completely separated by hardware. Read
Haggi's email, he explains how they plan to use this to create
interfaces that can be delegated to namespaces. It is not a bad idea
really..

So prepare for a world where each namespace has a child IPoIB
interface with a unique QPN, but the same Pkey and GUID as the
host. The breakage from assuming GUID == unique will become a problem.
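
To make the point above concrete, here is a minimal Python sketch. It assumes the usual 20-byte IPoIB link-layer address layout (1 flags byte, 3-byte QPN, 8-byte subnet prefix, 8-byte GUID). The helper name, the flags byte, and every sample value are made up for illustration.

```python
from typing import NamedTuple


class IpoibLladdr(NamedTuple):
    flags: int
    qpn: int
    subnet_prefix: bytes
    guid: bytes


def parse_lladdr(raw: bytes) -> IpoibLladdr:
    """Split a 20-byte IPoIB LLADDR into its fields (assumed layout)."""
    if len(raw) != 20:
        raise ValueError("IPoIB LLADDR must be 20 bytes")
    return IpoibLladdr(
        flags=raw[0],
        qpn=int.from_bytes(raw[1:4], "big"),
        subnet_prefix=raw[4:12],
        guid=raw[12:20],
    )


# Hypothetical sample values: same port, same GUID, different QPNs.
PREFIX = bytes.fromhex("fe80000000000000")
GUID = bytes.fromhex("0002c90300a1b2c3")
host = parse_lladdr(bytes([0x80]) + (0x48).to_bytes(3, "big") + PREFIX + GUID)
child = parse_lladdr(bytes([0x80]) + (0x49).to_bytes(3, "big") + PREFIX + GUID)

# The full LLADDRs differ, but a tool keyed on GUID alone sees one device.
lladdr_unique = host != child
guid_unique = host.guid != child.guid
```

Here `lladdr_unique` is true while `guid_unique` is false, which is exactly the case a GUID-keyed config tool would trip over.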

Unbreaking it is a UAPI change, not impossible, but do we really care
enough about 8 or 20 to push for that?
In truth, at least right now, it's all moot.  Since we can't set the
subnet prefix, the qpn, or the flags, anything above 8 bytes is
immutable regardless of how many bytes we pass in.  So even if we say we
aren't going to change the UAPI and force everything to 20, the real world
result is that 8 works exactly the same and has no functional
difference.
Not quite, in the 20 byte format the 8 bytes of the GUID are the last 8
of the 20 bytes, so the app would have to place 12 zeros and then the
GUID to follow the 20 byte format (or 4 zeros, the prefix, then the GUID).

This is why the question of 'what is IFLA_VF_MAC' is so important:
the options presented (MAC, GUID, LLADDR) are all incompatible with
each other.
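
A small Python sketch of the layout difference described above, assuming the 20-byte format is 4 leading bytes, then the 8-byte subnet prefix, then the 8-byte GUID. The function name and the sample GUID and prefix are hypothetical.

```python
def guid_to_lladdr(guid: bytes, subnet_prefix: bytes = bytes(8)) -> bytes:
    """Place an 8-byte GUID in the last 8 bytes of a 20-byte buffer.

    With the default prefix this yields 12 zero bytes followed by the GUID;
    with an explicit prefix it yields 4 zeros, the prefix, then the GUID.
    """
    if len(guid) != 8:
        raise ValueError("GUID must be 8 bytes")
    if len(subnet_prefix) != 8:
        raise ValueError("subnet prefix must be 8 bytes")
    return bytes(4) + subnet_prefix + guid


GUID = bytes.fromhex("0002c90300a1b2c3")      # hypothetical vGUID
PREFIX = bytes.fromhex("fe80000000000000")    # hypothetical subnet prefix

zero_padded = guid_to_lladdr(GUID)            # 12 zeros, then the GUID
with_prefix = guid_to_lladdr(GUID, PREFIX)    # 4 zeros, prefix, then the GUID

# An app that simply writes the GUID at offset 0 gets a different buffer,
# which is why an 8-byte value is not a drop-in prefix of the 20-byte one.
guid_first = GUID + bytes(12)
layouts_differ = guid_first != zero_padded
```

The same 8 bytes end up at offset 0 in one convention and at offset 12 in the other, so a receiver has to know which convention the sender used.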

I agree with Doug that, to be practical here, libvirt and co. would really want to use rtnetlink-based provisioning of IB VFs, at least in a manner similar to what is done for Eth VFs.

So with this assumption at hand, my vote goes to having user space provide the eight bytes of the vGUID through the ndo_set_vf_mac call into IPoIB.

I don't see real value in user space providing the four zero bytes (19-16) and the 8 bytes of the subnet prefix, which is provided by the SM.

My personal thinking is that the important thing to address is consistency between what the virtualization system provisions on the host (ndo_set_vf_mac) and the DHCP server scheme it builds.

Do we have a go here?

Also, a few comments on DHCP:

If we're talking about different VLANs (Eth) or pkeys (IB), it's totally OK for two entities (== IPoIB instances under IB) on the physical subnet to use the same identifier (IB/GUID, Eth/MAC), as long as they are on two different L2 broadcast domains. The DHCP server is expected to have a different mapping scheme per such virtual L2 subnet.

For SRIOV, we don't expect two VFs on the network to use the same vGUID, so DHCP-wise we should be OK. Today the Client-ID works fine for SRIOV schemes that are based on 8-byte vGUIDs.
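
As a rough illustration of the per-L2-domain mapping, here is a toy Python lease table keyed on (pkey, client-id). It is a sketch under my own assumptions, not how any particular DHCP server is implemented; the class name, the pkeys, and the addresses are invented.

```python
class LeaseTable:
    """Toy DHCP lease table with a separate mapping per L2 broadcast domain."""

    def __init__(self):
        self._leases = {}  # (l2_domain, client_id) -> ip

    def assign(self, l2_domain: int, client_id: bytes, ip: str) -> None:
        key = (l2_domain, client_id)
        existing = self._leases.get(key)
        if existing is not None and existing != ip:
            raise ValueError("client-id already has a lease in this L2 domain")
        self._leases[key] = ip

    def lookup(self, l2_domain: int, client_id: bytes):
        return self._leases.get((l2_domain, client_id))


GUID = bytes.fromhex("0002c90300a1b2c3")  # hypothetical 8-byte vGUID

table = LeaseTable()
# Same GUID on two different pkeys: two independent leases, no clash.
table.assign(0x8001, GUID, "10.0.1.5")
table.assign(0x8002, GUID, "10.0.2.5")

# Same GUID on the same pkey asking for a second address: this is a clash.
try:
    table.assign(0x8001, GUID, "10.0.1.6")
    conflict = False
except ValueError:
    conflict = True
```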

Regarding two IPoIB child devices using the same GUID and the same pkey: we can enhance the system and take advantage of IB Alias GUIDs, which today are used only for SRIOV, for para-virtual and other environments too. Thanks for the heads up on the necessity of doing so.


What does get return? If we accept 8 or 20, then get must return 20.
The get has to return 20 regardless.  It's the only accepted means of
getting all 20 bytes of the LLADDR.
You are conflating IFLA_ADDRESS and IFLA_VF_MAC.

IFLA_VF_MAC could be 8 byte and IFLA_ADDRESS could be 20, I think that
makes no sense, but it wouldn't break existing stuff.


Just to make sure we're on the same page: this thread deals with using rtnetlink's IFLA_VF_MAC (== struct ifla_vf_mac) to provision the vGUID for IB VFs through the PF IPoIB interface, not with using IFLA_ADDRESS.
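
For reference, a minimal Python sketch of what the IFLA_VF_MAC payload could look like with an 8-byte vGUID. It relies on struct ifla_vf_mac being a __u32 vf followed by a 32-byte mac[] array. Treating the first 8 bytes of mac[] as the vGUID is the proposal discussed in this thread, so take it as an assumption; the helper name and sample values are hypothetical.

```python
import struct

MAX_ADDR_LEN = 32                        # size of mac[] in struct ifla_vf_mac
IFLA_VF_MAC_FMT = f"=I{MAX_ADDR_LEN}s"   # __u32 vf; __u8 mac[32];


def pack_vf_guid(vf: int, vguid: bytes) -> bytes:
    """Build a struct ifla_vf_mac payload carrying an 8-byte vGUID.

    Assumption: the vGUID occupies the first 8 bytes of mac[] and the
    remaining bytes are zero.
    """
    if len(vguid) != 8:
        raise ValueError("vGUID must be 8 bytes")
    # The 's' format pads the byte string with zero bytes up to 32.
    return struct.pack(IFLA_VF_MAC_FMT, vf, vguid)


VGUID = bytes.fromhex("0002c90300a1b2c3")  # hypothetical vGUID
payload = pack_vf_guid(2, VGUID)           # hypothetical VF index 2
vf_index, mac_field = struct.unpack(IFLA_VF_MAC_FMT, payload)
```

This only builds the attribute payload; wrapping it in the nested IFLA_VFINFO_LIST attributes and sending the netlink message is left out.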
