Re: [PATCH RFC 0/3] Support standard SRIOV configuration for IB VFs

Jason Gunthorpe <jgunthorpe@xxxxxxxxxxxxxxxxxxxx> · Wed, 27 May 2015 16:42:06 -0600

On Thu, May 28, 2015 at 12:58:59AM +0300, Or Gerlitz wrote:
> On Wed, May 27, 2015 at 8:53 PM, Doug Ledford <dledford@xxxxxxxxxx> wrote:
> 
> > On Wed, 2015-05-27 at 11:11 -0600, Jason Gunthorpe wrote:
> > Well, let's just be clear: netlink/iproute2 screwed the pooch on their
> > implementation of this stuff.  Any solution that doesn't include fixing
> > this up in some way is not really a good solution.
> 
> >>  For instance iproute2 would need IB specific code to format
> >> the 'ip link show' (review print_vfinfo in iproute2) and to length
> >> check 'ip link set vf mac'
> 
> >> If we do use 8, then it would be ideal (and my strong preference) to
> >> also fix the IFLA_VF_MAC message to have a working length. I think
> >> that could be done compatibly with a bit of work. At least that way
> >> iproute2 can be kept clean when it learns to do IB, and we could have
> >> the option again of using 20 someday if we need.
> >>
> 
> > Sounds like a reasonable plan.
> > Or, this is your patch set, are you going to pick up these action items?
> 
> Let see
> 
> >> So to be clear, to go with the 8 byte option I suggest:
> 
> >>  - Engage netdev/iproute and confirm they are philosophically OK
> >>    with IFLA_VF_MAC != IFLA_ADDRESS
> 
> the last thing netdev are is having philosophical views, they are very
> practical (and non-perfect and happy and lively community), the one
> thing before the last  netdev are is caring for the rdma subsystem. If
> something has to change @ their direct part, we should come with
> patches.

Yes, you need to talk in patches, I will suggest you start with this
(totally untested, quickly thrown together):

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 8de36824018d..715b59e9925a 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -1172,7 +1172,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, struct net_device *dev,
 				nla_nest_cancel(skb, vfinfo);
 				goto nla_put_failure;
 			}
-			if (nla_put(skb, IFLA_VF_MAC, sizeof(vf_mac), &vf_mac) ||
+			if (nla_put(skb, IFLA_VF_MAC, sizeof(u32) + dev->addr_len, &vf_mac) ||
 			    nla_put(skb, IFLA_VF_VLAN, sizeof(vf_vlan), &vf_vlan) ||
 			    nla_put(skb, IFLA_VF_RATE, sizeof(vf_rate),
 				    &vf_rate) ||
@@ -1292,7 +1292,8 @@ static const struct nla_policy ifla_vfinfo_policy[IFLA_VF_INFO_MAX+1] = {
 };
 
 static const struct nla_policy ifla_vf_policy[IFLA_VF_MAX+1] = {
-	[IFLA_VF_MAC]		= { .len = sizeof(struct ifla_vf_mac) },
+	[IFLA_VF_MAC]		= { .type = NLA_BINARY,
+				    .len = sizeof(struct ifla_vf_mac) },
 	[IFLA_VF_VLAN]		= { .len = sizeof(struct ifla_vf_vlan) },
 	[IFLA_VF_TX_RATE]	= { .len = sizeof(struct ifla_vf_tx_rate) },
 	[IFLA_VF_SPOOFCHK]	= { .len = sizeof(struct ifla_vf_spoofchk) },
@@ -1447,6 +1448,21 @@ static int do_setvfinfo(struct net_device *dev, struct nlattr *attr)
 		switch (nla_type(vf)) {
 		case IFLA_VF_MAC: {
 			struct ifla_vf_mac *ivm;
+			int sz;
+
+			/* Legacy compatability, old iproute2 will pass in 32
+			 * bytes for everything, assume it is a 6 byte MAC
+			 * because that is all old iproute really supported.
+			 */
+			sz = nla_len(attr);
+			if (sz == 36)
+				sz = 10;
+
+			if (sz != (sizeof(u32) + dev->addr_len)) {
+				err = -EINVAL;
+				break;
+			}
+
 			ivm = nla_data(vf);
 			err = -EOPNOTSUPP;
 			if (ops->ndo_set_vf_mac)

And make the argument why the above is OK.
 - Reader side only ever read 6 bytes anyhow, so it doesn't matter
   that ifla_vf_mac is undersized in the RTA_DATA
 - Confirm any other users of this (openstack?) you know are similarly
   OK.
 - Old write side will always send the full 36 byte struct, so
   assume 6 bytes.
 - Justify this as needing to add IB support next which does not use a
   6 byte addr_len
   New code that wants to work with IB will have to use correct lengths.

Then based on that iproute2 can be fixed in the two areas I pointed
out, doing that at the same time will probably earn some rep.

Finally you can make a patch that switches addr_len to something else
- that would trigger the philosophical question of 'what actually is a
IFLA_VF_MAC', if everyone is OK with changing it then golden.

> >>  - Make a iproute patch to use the IFLA_VF_MAC size in print_vfinfo
> >>    instead of hardcoded ETH_ALEN (using len == 32 mean len 6 for compat)
> 
> I was thinking to patch iproute to sense the link type: if eth print
> six bytes, if ipoib print 8 bytes, simple.

This is against the design ideals of netlink: a tool like iproute
should not need link specific knowledge to parse kernel messages.

Now is the best time to make this fix because we can use the sz == 36
so sz == 10 assumption.

Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html