RE: Need to set if_index in ib_init_ah_from_wc() ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jason,

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe@xxxxxxxxxxxxxxxxxxxx]
> Sent: Thursday, February 9, 2017 12:47 PM
> To: Matan Barak <matanb@xxxxxxxxxxxxxxxxxx>
> Cc: Parav Pandit <parav@xxxxxxxxxxxx>; Roland Dreier
> <roland@xxxxxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Matan Barak
> <matanb@xxxxxxxxxxxx>
> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> 
> On Thu, Feb 09, 2017 at 11:24:17AM +0200, Matan Barak wrote:
> > On Thu, Feb 9, 2017 at 2:15 AM, Parav Pandit <parav@xxxxxxxxxxxx>
> wrote:
> > >
> > >> From: Jason Gunthorpe [mailto:jgunthorpe@xxxxxxxxxxxxxxxxxxxx]
> > >> Sent: Wednesday, February 8, 2017 6:02 PM
> > >> To: Parav Pandit <parav@xxxxxxxxxxxx>
> > >> Cc: Matan Barak <matanb@xxxxxxxxxxxxxxxxxx>; Roland Dreier
> > >> <roland@xxxxxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx; Matan Barak
> > >> <matanb@xxxxxxxxxxxx>
> > >> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> > >>
> > >> On Thu, Feb 09, 2017 at 12:00:09AM +0000, Parav Pandit wrote:
> > >>
> > >> > > That still breaks link local addresses on vlan devices, so it
> > >> > > is an ugly hack, not a solution.
> > >> >
> > >> > In presence of vlan, shouldn't we be passing the ifindex of the
> > >> > vlan
> > >> netdev?
> > >>
> > >> yes, that is exactly my point...
> > >
> > > Oh ok. I get it. I am on right path to fix it than.
> 
> Use the gid cache to figure out, not seaching netdevs..
> 
I still believe I need to search gid cache based on the netdev.
ib_find_cached_gid_by_port() already supports it.

Help me understand how to use the gid cache for below scenario without netdev.

eth0 has two vlans.
Each vlan device has one macvlan device.
eth0 (mac-A)
|----->eth0.vlan1 (mac-A)
        |---->macvlan1 (mac-B) - IP-1
        |---->macvlan2 (mac-C) - IP-2
|-----> eth0.vlan2 (mac-A)
        |---->macvlan3 (mac-D) - IP-1
        |---->macvlan4 (mac-E) - IP -3
Please note duplicate IP-1 assigned to two macvlan devices of two different vlan.
These two macvlan devices are in two different namespace and therefore duplicate IP is perfectly valid for them.
This means our GID cache will have two GID entries with different netdev.
Macvlan netdev doesn't have vlan property set.

Now when ib_wc arrives for it, it will not have mac address in it, but we will have VLAN.
Which means we can only reach upto eth0.vlan1.
If I seach gid cache purely based on the GID, vlan id, and netdev of eth0.vlan1 there won't a match in GID table.
So in order to have right search in gid cache we need the netdev of the macvlan-1.
So what proposed is,

If (wc_without_vlan()) {
      leaf_netdev = eth0.netdev;
} else {
     /* there is vlan in wc */
    Ifaddr = search_ifaddr(netdev, gid);
    If (ifaddr_found) {
        /* we found matching IP for this netdev, use this netdev further  */
       leaf_netdev = vlan_netdev;
    } else {
          for_each_lower_netdev_of_vlan(slave_ndev, list) {
                 ifaddr = search_ifaddr(netdev, gid);
                 if (ifaddr_found) {
                     /* matching ifaddr found, in macvlan dev, use this netdev further */
                     leaf_netdev = macvlan_netdev;
                     break;
                 }
          }
     gid_index = search_gid_cache(GID, leaf_netdev);
}

This should work with and without namespaces to resolve the right netdev and GID for macvlan, vlan, ipvlan, regular phy dev for linklocal and other ip addresses.
Most hardware will drop frame with mac-B-vlan-2 or other such combinations anyway.

> > > Additionally,
> > > when there is macvlan based slave device present on this vlan device, I
> will pass the ifindex of that particular netdev.
> > > Now since we don't have MAC address coming in ib_wc nor in IB/RoCEv2
> > > Annex spec, code needs to refer to the
> > > (a) ifaddr of the vlan netdev
> > > and
> > > (b) ifaddr of the slave netdevs
> > > Compare the DGID of the grh with ifaddr and use that netdev's ifindex for
> the first matching entry.
> > >
> > > Sounds reasonable now?
> > >
> >
> > Since we don't get the DMAC address, I think the GID cache shouldn't
> > carry entries which the hardware can't differentiate upon.
> 
> Well, more specifically, with this limiatation, the hardware must
> *NEVER* receive a packet that does not match the primary MAC of the port.
> 
> Which goes back to my first point: The hardware should not receive
> something that is not in the GID cache, period. It sounds like this basic sanity
> is being viloated in some current rocee hardware???
> 
> If any scenario makes the GID cache ambiguous then it cannot be allowed. eg
> apparently macvlan must be denied, which makes this pretty useless for
> namespaces.
> 
> From your comments, I think the hardware function is going to have to be
> improved to make this sane. I continue to recommend returning the GID
> cache index in the WC.
> 
> > It might be ok for some cases in the transmit side (as you choose the
> > smac based on the netdev attached to the GID entry, but if you add a
> > vxlan based interface, you won't be able to add the appropriate
> > headers). We can leave this as is or making it symmetrical.
> 
> Again, it is madness to allow the hardware to receive a packet on a UD QP
> that is not present in the GID table, and it is unworkable to have a WC that
> doesn't unambiguously refer to a GID Table entry.
> 
> So yes, things like vxlan should not be in the gid table if the hardware cannot
> cope with it.
> 
> > So, when adding a GID, we need to consult the hardware capabilities
> > regarding the metadata it can provide in the completion. If the
> > hardware isn't capable of creating/stripping one of the headers of
> > this netdev, there's no reason to add it.
> 
> Yes. This is also why long ago I suggested that the hardware driver should
> provide a function to resolve the WC into a GID cach entry and that function
> can rely on hardware unique capabilities.
> 
> IMHO userspace should not be exposed to this and UD QPs should be locked
> by hardware to a single netdev worth of gid cache entries. Anything weaker
> invites exploitation when we talk about namespaces.
> 
> > If the hardware supports creating/stripping the required headers but
> > it doesn't support reporting them in the completion or all fields are
> > supported but there are conflicting entries, you could either consult
> > the ingress route before adding these GIDs or add them both and
> > consult the
> 
> No. Hardware must support all features: create/strip/report/per-QP filter
> before the GID cache can have an entry. No subsets can be permitted.
> 
> This probably means existing firmware/hardware/drivers cannot support
> macvlan and maybe others, but that is much better than trying to support it
> in an unsafe and insecure way.
> 
> That probbably answers Parav's earlier question about duplicates in the gid
> table: It is a bug today that can even happen.
> 
> Jason
--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux