RE: [RFC PATCH 1/5] IB/core: Add Core Capability flags to ib_device

> I thought USNIC_UDP had an embedded USNIC protocol header inside the UDP
> header.  That would make it a UDP_ENCAP protocol.

Someone from Cisco can correct me, but I believe USNIC supports two protocols: plain UDP, and a proprietary protocol that runs directly over Ethernet but uses the same EtherType as RoCE.  I thought both could be active on the same port at the same time.

> > RoCEv2 is IB transport over UDP.
> 
> Right, ROCE (or IB, whichever you prefer) encapsulated in UDP.
> 
> > I'm not sure what the protocol field is intended to imply.
> 
> There is still information in those bits that we can't get elsewhere.
> For instance, even though this patch replaces the CAP_* stuff with bits,
> if you took away the CAP_PROT_* entries, then there would be no entry to
> identify USNIC at all.
> 
> Right now, you could infer iWARP from CAP_IW_CM.
> You could infer InfiniBand from any of the CAP_IB_* (but later will need
> a way to differentiate between IB and OPA)
> You could infer ROCE from CAP_ETH_AH (but later will need a way to
> differentiate between ROCE and ROCEv2)
> The only way to differentiate USNIC at the moment, is that the CAPS
> would be all 0.  That's not the sort of positive identification I would
> prefer.
> 
> So you *could* reduce this to just one bit for USNIC.
> 
> And if you then add a UDP_ENCAP bit, then that single bit can do double
> duty in telling apart USNIC and USNIC_UDP and ROCE and ROCEv2.
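
To make the quoted proposal concrete, here is a minimal sketch of how a USNIC bit plus a UDP_ENCAP bit could tell the four cases apart.  CAP_IW_CM and CAP_ETH_AH are the names used in this thread, but the bit positions, CAP_IB_ANY (a stand-in for "any of the CAP_IB_*"), CAP_USNIC, CAP_UDP_ENCAP, and port_protocol() are all hypothetical and only for illustration; none of this is taken from the patch itself.

```c
/* Hypothetical per-port capability bits; values and several names are
 * assumptions made for this sketch, not the definitions from the patch. */
enum port_cap {
	CAP_IW_CM     = 1u << 0,	/* iWARP connection manager */
	CAP_IB_ANY    = 1u << 1,	/* stand-in for any CAP_IB_* bit */
	CAP_ETH_AH    = 1u << 2,	/* Ethernet address handles (RoCE) */
	CAP_USNIC     = 1u << 3,	/* hypothetical positive USNIC bit */
	CAP_UDP_ENCAP = 1u << 4,	/* hypothetical "encapsulated in UDP" bit */
};

/* Infer a protocol name from the capability bits of one port. */
static const char *port_protocol(unsigned int caps)
{
	int udp = (caps & CAP_UDP_ENCAP) != 0;

	if (caps & CAP_USNIC)
		return udp ? "USNIC_UDP" : "USNIC";
	if (caps & CAP_ETH_AH)
		return udp ? "RoCEv2" : "RoCE";
	if (caps & CAP_IW_CM)
		return "iWARP";
	if (caps & CAP_IB_ANY)
		return "InfiniBand";
	return "unknown";	/* all-zero caps no longer means USNIC */
}
```

Note that this inference returns exactly one answer per port, so it cannot describe a port that runs two protocols at once.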

My question is who needs these bits and why?  The primary reason the protocol was exposed was to do the job that the new cap flags now accomplish.

I still believe that RoCEv2 is conceptually the same as iWarp.  An RDMA protocol has been layered over some other transport.  In the case of iWarp, it's TCP.  In the case of RoCEv2, it's UDP.  Do we define a TCP_ENCAP bit that corresponds with UDP_ENCAP?  And why should an app care?  We don't specify whether the port is running IPv4 or IPv6.  Why is the transport level called out, but not the network layer?  Architecturally, neither iWarp nor RoCEv2 (despite its name) cares what the link layer is.

> >   And the core layer
> > should not assume that a device is limited to supporting only one
> > protocol, especially at the network and transport levels.
> 
> Given that this is a per port thing, there is no assumption about a
> device only supporting a single protocol.

Not per device, no, but we are assuming a single protocol per port.  I don't think this is true for USNIC.  For that matter, it's entirely possible for a RoCEv2 device to expose UDP directly to user space, same as USNIC.  (I'd actually be surprised if no devices have this capability, if only for debugging.)  What are we going to do if there's a device that supports both iWarp and RoCEv2?  That's easily doable today through software.

Basically, I question how protocol is defined and what it means to expose it as a device attribute.  Should it instead be negotiated (i.e. more like sockets)?  If an app needs a specific "RDMA protocol" (IB or iWarp) or "application protocol" (IB or iWarp or UDP or MADs?), it can request it.  Otherwise, the app gets assigned some protocol.  And the protocol should really be associated with a QP, rather than a port.
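
As a rough sketch of what I mean by negotiation, assuming a port advertises a mask of the protocols it can run and the app passes either a specific request or "any" at QP creation time.  The PROTO_* names and negotiate_qp_protocol() are hypothetical; nothing like this exists in the verbs API today.

```c
/* Hypothetical protocol bits; a port may advertise several at once. */
enum rdma_proto {
	PROTO_ANY   = 0,	/* app has no preference */
	PROTO_IB    = 1u << 0,
	PROTO_IWARP = 1u << 1,
	PROTO_UDP   = 1u << 2,
};

/*
 * Pick the protocol for a single QP.
 * port_supported: mask of protocols the port can run.
 * requested:      mask of protocols the app accepts, or PROTO_ANY.
 * Returns one protocol bit, or 0 if the request cannot be satisfied.
 */
static unsigned int negotiate_qp_protocol(unsigned int port_supported,
					  unsigned int requested)
{
	unsigned int candidates = requested ?
		(port_supported & requested) : port_supported;

	/* Isolate the lowest set bit as an arbitrary tie-break. */
	return candidates & (0u - candidates);
}
```

With this shape, two QPs on the same port can end up with different protocols, which is the case a per-port attribute cannot express.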

- Sean