RE: Need to set if_index in ib_init_ah_from_wc() ?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Jason,

> -----Original Message-----
> From: Jason Gunthorpe [mailto:jgunthorpe@xxxxxxxxxxxxxxxxxxxx]
> Sent: Monday, January 30, 2017 4:15 PM
> To: Parav Pandit <parav@xxxxxxxxxxxx>
> Cc: Roland Dreier <roland@xxxxxxxxxxxxxxx>; linux-rdma@xxxxxxxxxxxxxxx;
> Matan Barak <matanb@xxxxxxxxxxxx>
> Subject: Re: Need to set if_index in ib_init_ah_from_wc() ?
> 
> On Mon, Jan 30, 2017 at 09:52:17PM +0000, Parav Pandit wrote:
> 
> > What I meant is, UD QP is not attached to netdev but yes with my
> > patches any QP is attached to namespace, just like regular socket.
> > So I have qp_net inside ib_qp structure.  Let me finish internal peer
> > reviews before I post the patches.
> 
> No, regular sockets can be attached to both - '*' bind makes no sense for
> RDMA because such a socket will not rx for all ROCEE addreses, just the ones
> on the device it is bound to.
> 
> For that reason the entire concept of '*' bind should not even exist.

I am talking of binding QP or socket to namespace.
My understanding is a given socket can be bound to one and only one network namespace at a time.
If that's not true, socket should have array for net.
Refer to sock_net_set() and sock_net() APIs.
Which namespace it is bound to, its obvious reason not known to the user, regardless of bind(* or specific address).

> 
> > > We certainly can't have namespaces with such an insecure design. UD
> > > QPs must not receive packets inapproprate for the netdev they are
> > > associated with, or the collection of netdevs in their namespace
> > > (for * binds)
> >
> > Haggai and I had exact same discussion few days back when I referred
> > the IB spec. 1.3.1 section C9-46.  So I agree for netdev (mac, vlan).
> 
> ipvlan is a thing too..
Yes. Will think about this further. 
> 
> > But IP could be of other namespace in incoming packet and that should
> > be ok because we don't have UDP socket style bind() on UD QPs.
> 
> That isn't OK from a security standpoint.
> 
> You can't secure UD QPs by using the QPN, the NIC has to restrict incoming
> packets to only match certain GID table entries, and those entries must be
> the netdev the UD QP is bound to or all netdevices in the UD QPs's
> namespace (for '*' bind)
> 
> Otherwise the NIC does not support rocee namespaces.
> 
o.k. Just to highlight you, QP1 is exception to this.

> > > Well, using idev isn't going to work with child netdevs, etc
> 
> > I have further added checks for vlan_dev in presence of wc vlan id
> > present and walking upto upper device.
> 
> That sounds broken, there should never be walking of device or anything silly
> like that to determine the ingress netdev. The only place that is done is when
> constructing the GID cache.
> 
GID table can have two gid entries with same GID content in there, but gid_attr->net can be different.
This is typically for vlan based devices located in two different namespace.
This is the fundamental point to query GID cache based on netdev's net.
Without considering net_ns, GID cache query is equally broken.
Therefore incoming packet's netdev and net namespace derivation is needed from ingress packet.

> > > The best thing to do is introduce required netdev binding for UD QPs
> > > in kernel and then the kernel's ib_init_ah_from_wc can work safely..
> > >
> > I am binding to net_ns of the calling process which can have one or
> > more netdev.
> 
> But that doesn't make sense for in-kernel users, and those are the only users
> that can call ib_init_ah_from_wc...

Its done by default even for sockets. That's how its established that it belongs to init_net.
Refer to sk_alloc().

> 
> > This is the right thing to do similar to regular sockets.
> 
> Only if the sockets are '*' bound,..
> 
I think you mean sockets are '*' bound to address.
Assuming yes, to it, regardless of address property they will be bound to net_ns.
This covers the case of applications who are not using RDMA CM as well.

> > Exception if QP1 which will receive packets for all namespaces and
> > will ingress and egress lookups based on the git_addr->net.
> 
> But even here you need to get the ingress netdevice from the gid cache and
> use that to deduce what namespace the packet is for.
> 
As mentioned above, GID table can have duplicate entries so namespace is identified by the ingress path.
I have reviewed net code before, but I will refer again how it's done on regular Ethernet packets.

> > > ifindex do not quickly re-use so this is fine.
> 
> > Its not quickly reused but when netdevice moves to different
> > namespace, if that ifindex is already exist there, it assigns new one.
> > But I agree it's pretty corner case and its likely an administrative
> > limitation until we establish good use case.
> 
> Oh? But in that case an (ifindex,ns) is still a stable reference.
> 
> > > Sort of, but you can't do the route lookup until you get the netdev,
> > > and you can't get a netdev without a gid table lookup. :\
> > >
> > wc->vlan establishes netdev and its ifindex.
> 
> Nope, that isn't enough to support all the behaviors.
> 
> > If upper device exist, netdev is established through that in addition
> > to vlan.  netdev establishes namespace.  Namespace establishes gid
> > table search scope and therefore right sgid index.
> > Same namespace will be used for destination route table lookup as you
> > described in steps above.
> 
> No way, that is backwards.
> 
> The ingress netdev comes from searching the GID table, period. The GID
> table matches the vlan and dgid.
> 
> From there you have the netnamespace to do routing lookups, etc.
> 
> > I am glad that we are having this discussion right now.
> 
> Can you try and submit a patch to fix Roland's problem please?
> 
> Searching the gid table is the way to go for now...
> 

Let's first agree/disagree for duplicate GID table entries exist for two different namespaces.

--
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux