On Thu, Feb 09, 2017 at 11:24:17AM +0200, Matan Barak wrote:
> On Thu, Feb 9, 2017 at 2:15 AM, Parav Pandit <parav@xxxxxxxxxxxx> wrote:
> >
> >> From: Jason Gunthorpe [mailto:jgunthorpe@xxxxxxxxxxxxxxxxxxxx]
> >> Sent: Wednesday, February 8, 2017 6:02 PM
> >> To: Parav Pandit <parav@xxxxxxxxxxxx>
> >> Cc: Matan Barak <matanb@xxxxxxxxxxxxxxxxxx>; Roland Dreier
> >> <roland@xxxxxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Matan Barak
> >> <matanb@xxxxxxxxxxxx>
> >> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> >>
> >> On Thu, Feb 09, 2017 at 12:00:09AM +0000, Parav Pandit wrote:
> >>
> >> > > That still breaks link local addresses on vlan devices, so it is an
> >> > > ugly hack, not a solution.
> >> >
> >> > In presence of vlan, shouldn't we be passing the ifindex of the vlan
> >> > netdev?
> >>
> >> yes, that is exactly my point...
> >
> > Oh ok. I get it. I am on the right path to fix it then. Use the gid
> > cache to figure it out, not searching netdevs.
> > Additionally, when there is a macvlan based slave device present on
> > this vlan device, I will pass the ifindex of that particular netdev.
> > Now since we don't have the MAC address coming in ib_wc nor in the
> > IB/RoCEv2 Annex spec, code needs to refer to the
> > (a) ifaddr of the vlan netdev
> > and
> > (b) ifaddr of the slave netdevs
> > Compare the DGID of the grh with ifaddr and use that netdev's ifindex
> > for the first matching entry.
> >
> > Sounds reasonable now?
> >
>
> Since we don't get the DMAC address, I think the GID cache shouldn't
> carry entries which the hardware can't differentiate upon.

Well, more specifically, with this limitation, the hardware must
*NEVER* receive a packet that does not match the primary MAC of the
port.

Which goes back to my first point: the hardware should not receive
something that is not in the GID cache, period. It sounds like this
basic sanity is being violated in some current RoCE hardware???

If any scenario makes the GID cache ambiguous then it cannot be
allowed.
For example, macvlan apparently must be denied, which makes this pretty
useless for namespaces.

From your comments, I think the hardware function is going to have to
be improved to make this sane. I continue to recommend returning the
GID cache index in the WC.

> It might be ok for some cases in the transmit side (as you choose
> the smac based on the netdev attached to the GID entry, but if you
> add a vxlan based interface, you won't be able to add the
> appropriate headers). We can leave this as is or make it
> symmetrical.

Again, it is madness to allow the hardware to receive a packet on a UD
QP that is not present in the GID table, and it is unworkable to have
a WC that doesn't unambiguously refer to a GID table entry.

So yes, things like vxlan should not be in the GID table if the
hardware cannot cope with them.

> So, when adding a GID, we need to consult the hardware capabilities
> regarding the metadata it can provide in the completion. If the
> hardware isn't capable of creating/stripping one of the headers of
> this netdev, there's no reason to add it.

Yes. This is also why long ago I suggested that the hardware driver
should provide a function to resolve the WC into a GID cache entry,
and that function can rely on hardware-unique capabilities.

IMHO userspace should not be exposed to this, and UD QPs should be
locked by hardware to a single netdev's worth of GID cache entries.
Anything weaker invites exploitation when we talk about namespaces.

> If the hardware supports creating/stripping the required headers but
> it doesn't support reporting them in the completion or all fields
> are supported but there are conflicting entries, you could either
> consult the ingress route before adding these GIDs or add them both
> and consult the

No. Hardware must support all features (create, strip, report, and
per-QP filter) before the GID cache can have an entry. No subsets can
be permitted.
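
To make the "no subsets" rule concrete, here is a rough sketch of the
check I have in mind. It is illustrative only: the capability bits and
the function name are hypothetical, and nothing like this exists in the
kernel today.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-header capability bits; not an existing kernel API. */
enum gid_hdr_cap {
	GID_CAP_CREATE    = 1u << 0, /* hardware can build the header on transmit */
	GID_CAP_STRIP     = 1u << 1, /* hardware can remove the header on receive */
	GID_CAP_REPORT    = 1u << 2, /* completion reports which entry matched */
	GID_CAP_QP_FILTER = 1u << 3, /* receive filtering is enforced per QP */
};

#define GID_CAP_ALL \
	(GID_CAP_CREATE | GID_CAP_STRIP | GID_CAP_REPORT | GID_CAP_QP_FILTER)

/*
 * Return true only if every capability is present for the netdev's
 * header type. Any subset is rejected, so the entry is never added.
 */
static bool gid_entry_allowed(uint32_t hw_caps)
{
	return (hw_caps & GID_CAP_ALL) == GID_CAP_ALL;
}
```

A driver that can create and strip a header but cannot report it in
the completion would fail this check, and the GID would simply not be
added.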

This probably means existing firmware/hardware/drivers cannot support
macvlan and maybe others, but that is much better than trying to
support it in an unsafe and insecure way.

That probably answers Parav's earlier question about duplicates in
the GID table: it is a bug that they can happen at all today.

Jason