On 7/28/22 11:08, Jason Gunthorpe wrote:
> On Wed, Jul 27, 2022 at 11:36:52AM -0500, Bob Pearson wrote:
>>  	unsigned int hdr_len;
>>  	struct sk_buff *skb = NULL;
>> -	struct net_device *ndev;
>> -	const struct ib_gid_attr *attr;
>> +	struct net_device *ndev = rxe->ndev;
>>  	const int port_num = 1;
>> -
>> -	attr = rdma_get_gid_attr(&rxe->ib_dev, port_num, av->grh.sgid_index);
>> -	if (IS_ERR(attr))
>> -		return NULL;
>
> An ib_device can have many netdevs associated with the gid indexes, eg
> from VLANs or LAG. The core code creates these things
>
> I think it is nonsense for rxe to work like this, and perhaps it
> doesn't work at all, but until rxe blocks creation of these other gid
> indexes I'm not sure it makes sense to delete this code..
>
> Jason

Somehow I had the vague impression that rxe didn't support vlans but I
just looked at the following commit

commit fd49ddaf7e266b5892d659eb99d9f77841e5b4c0
Author: Mohammad Heib <goody698@xxxxxxxxx>
Date:   Tue Aug 11 18:04:15 2020 +0300

    RDMA/rxe: prevent rxe creation on top of vlan interface

    Creating rxe device on top of vlan interface will create a
    non-functional device that has an empty gids table and can't be used
    for rdma cm communication.

    This is caused by the logic in
    enum_all_gids_of_dev_cb()/is_eth_port_of_netdev(), which only
    considers networks connected to "upper devices" of the configured
    network device, resulting in an empty set of gids for a vlan
    interface, and attempts to connect via this rdma device fail in
    cm_init_av_for_response because no gids can be resolved.

    Apparently, this behavior was implemented to fit the HW-RoCE devices
    that create RoCE device per port, therefore RXE must behave the same
    like HW-RoCE devices and create rxe device per real device only.

    In order to communicate via a vlan interface, the user must use the
    gid index of the vlan address instead of creating rxe over vlan.

    Link: https://lore.kernel.org/r/20200811150415.3693-1-goody698@xxxxxxxxx
    Signed-off-by: Mohammad Heib <goody698@xxxxxxxxx>
    Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>

which jibes with what you are saying. The immediate impact of this is
that rxe->ndev should probably not be used unless you know you want the
physical device.

Bob