On Mon, Jan 15, 2018 at 01:13:09PM -0500, David Miller wrote: > From: Denis Drozdov <denisd@xxxxxxxxxxxx> > Date: Tue, 9 Jan 2018 23:42:45 +0200 > > > IP link was broken due to the changes in IPoIB for the rdma_netdev > > support after commit cd565b4b51e5 > > ("IB/IPoIB: Support acceleration options callbacks"). > > > > This patchset restores IPoIB pkey creation and removal using rtnetlink. > > The first patch introduces changes in the rtnetlink code in order to allow > > IPOIB allocate and free the netdevice. > > > > The second patch establishes appropriate rtnetlink callbacks for IPoIB > > device and restores IPoIB netlink support > > > > Changes since v1: > > - Fixed double free_netdev calls in ops->free_link in netdev_run_todo > > - Removed priv_size from ipoib_link_ops as not required anymore. > > Please fix your control flow so that the existing netlink op can do > the right thing. Okay. Since this bug has been outstanding in RDMA for so long, I've looked pretty carefully at this.. This patch does go too far, but ipoib does appear to legitimately need something special here and no amount of 'control flow fixing' in ipoib will solve it. The fundamental issue is that ipoib no longer has a generic software netdev for the child interface. Instead the child interface is bound directly to one of several *full* hardware drivers. ie the child and the master are basically exactly the same HW dependent code. Since the child is a full hardware netdev, it has its own HW driven needs for priv_size, txqs, rxqs, etc. There is no 'one size fits all' value like for all the other software based newlink users. It depends on the HW device installed in the system (eg mlx4, mlx5, hfi1) The only problem with rtnl newlink is that it wants to use priv_size, txqs, and rxqs values from a singleton static global structure (eg ipoib_link_ops) which cannot know in advance which ipoib HW driver is in use by the specific parent the child is being created for. This is what needs to be fixed. ipoib simply needs to have parent-dependent inputs to the alloc_netdev_mqs in rtnl_create_link. The patch proposes struct net_device *(*alloc_link)(struct net *src_net, const char *dev_name, struct nlattr *tb[]); To let ipoib allocate the netdev, and then obviosly figure out the right parameters internally. This patch also adds some free related stuff, but that seems to be going too far. ipoib can use the normal free paths. The above is all that is necessary. An alternative would be something like: struct rtnl_alloc_params { size_t priv_size; int txqs; int rxqs; }; void (*get_alloc_parameters)(struct net_device *net, struct rtnl_alloc_params *parms); To let ipoib tell rtnl_create_link what parameters to use based on 'net' instead of taking them from the global singleton. Do you see another choice? Can you accept one of these options? Thanks, Jason (BTW, Doug & I are now co-maintaining RDMA, so I say the above with my maintainer hat on. This is a serious userspace visible regression bug on our side, and I really want to see it fixed.) -- To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html