On 06/14/2015 03:16 PM, Liran Liss wrote:
>> From: Doug Ledford [mailto:dledford@xxxxxxxxxx]
>
>>> But the node_type stands for more than just an abstract RDMA device:
>>> In IB, it designates an instance of an industry-standard, well-defined,
>>> device type: its possible link types, transport, semantics, management,
>>> everything.
>>> It *should* be exposed to user-space so apps that know and care what
>>> they are running on could continue to work.
>>
>> I'm sorry, but your argument here is not very convincing at all. And
>> it's somewhat hypocritical. When RoCE was first introduced, the *exact*
>> same argument could be used to argue for why RoCE should require a new
>> node_type. Except then, because RoCE was your own, you argued for, and
>> got, an expansion of the IB node_type definition that now included a
>> relevant link_layer attribute that apps never needed to care about
>> before. However, now you are a victim of your own success. You set the
>> standard then that if the new device can properly emulate an IB Verbs/IB
>> Link Layer device in terms of A) supported primitives (iWARP and usNIC
>> both fail here, and hence why they have their own node_types) and B)
>> queue pair creation process modulo link layer specific addressing
>> attributes, then that device qualifies to use the IB_CA node_type and
>> merely needs a link_layer attribute to differentiate it.
>>
>
> No. RoCE is an open standard from the IBTA with the exact same RDMA protocol semantics as InfiniBand and a clear set of compliancy rules without which an implementation can't claim to be such. A RoCE device *is* an IB CA with an Ethernet link.
> In contrast, OPA is a proprietary protocol. We don't know what primitives are supported, and whether the semantics of supported primitives are the same as in InfiniBand.

Intel has stated on this list that they intend for RDMA apps to run on
OPA transparently.
That pretty much implies the list of primitives and everything else that
they must support. However, time will tell if they succeeded or not.

>> The new OPA stuff appears to be following *exactly* the same development
>> model/path that RoCE did. When RoCE was introduced, all the apps that
>> really cared about low level addressing on the link layer had to be
>> modified to encompass the new link type. This is simply link_layer
>> number three for apps to care about.
>>
>
> You are missing my point. API transparency is not a synonym for full semantic equivalence. The Node Type doesn’t indicate level of adherence to an API. Node Type indicates compliancy to a specification (e.g. wire protocol, remote order of execution, error semantics, architectural limitations, etc). The IBTA CA and Switch Node Types belong to devices that are compliant to the corresponding specifications from the InfiniBand Trade Association. And that doesn’t prevent applications from choosing to be coded to run over nodes of different Node Type, as happens today with IB/RoCE and iWARP.
>
> This has nothing to do with addressing.

And whether you like it or not, Intel is intentionally creating a
device/fabric with the specific intention of mimicking the IB_CA device
type (with stated exceptions for MAD packets and addresses). They
obviously won't have certification as an IB_CA, but that's not their
aim. Their aim is to be a functional drop-in replacement that apps don't
need to know about except for the stated exceptions.

And I'm not missing your point. Your point is inappropriate. You're
trying to conflate certification with a functional API. The IB_CA node
type is not an official certification of anything, and the Linux kernel
is not an official certifying body for anything. If you want
certification, you go to the OFA and the UNH-IOL testing program.
There, you have the rights to the certification branding logo and you
have the right to deny access to that logo to anyone that doesn't meet
the branding requirements.

You're right that apps can be coded to other CA types, like RNICs and
USNICs. However, those are all very different from an IB_CA due to
limited queue pair types or limited primitives. If OPA had that same
limitation then I would agree it needs a different node type.

So this will be my litmus test. Currently, an app that supports all of
the RDMA types looks like this:

if (node_type == RNIC)
	do iwarpy stuff
else if (node_type == USNIC)
	do USNIC stuff
else if (node_type == IB_CA)
	do IB verbs stuff
	if (link_layer == Ethernet)
		do RoCE addressing/management
	else
		do IB addressing/management

If, in the end, apps that are modified to support OPA end up looking
like this:

if (node_type == RNIC)
	do iwarpy stuff
else if (node_type == USNIC)
	do USNIC stuff
else if (node_type == IB_CA || node_type == OPA_CA)
	do IB verbs stuff
	if (node_type == OPA_CA)
		do OPA addressing/management
	else if (link_layer == Ethernet)
		do RoCE addressing/management
	else
		do IB addressing/management

where you can plainly see that the exact same goal can be accomplished
whether you have an OPA node_type or an IB_CA node_type + OPA
link_layer, then I will be fine with either a new node_type or a new
link_layer. They will be functionally equivalent as far as I'm concerned.

-- 
Doug Ledford <dledford@xxxxxxxxxx>
              GPG KeyID: 0E572FDD
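
The litmus test in the message above can be sketched as a small, self-contained Python model. This is only an illustration under stated assumptions: the enum names, the `OPA_CA` node type, the `OPA` link layer, and both dispatch functions are hypothetical stand-ins for the pseudocode, not the kernel or libibverbs API. It shows that an app reaches the same verbs and addressing paths whether OPA is exposed as a new node_type or as an IB_CA with a new link_layer.

```python
from enum import Enum, auto


class NodeType(Enum):
    RNIC = auto()
    USNIC = auto()
    IB_CA = auto()
    OPA_CA = auto()      # hypothetical: only exists in the "new node_type" model


class LinkLayer(Enum):
    INFINIBAND = auto()
    ETHERNET = auto()
    OPA = auto()         # hypothetical: only exists in the "new link_layer" model


def dispatch_with_new_node_type(node_type, link_layer):
    """Model A: OPA is a distinct node_type (second pseudocode block)."""
    if node_type is NodeType.RNIC:
        return ("iwarp", None)
    if node_type is NodeType.USNIC:
        return ("usnic", None)
    if node_type in (NodeType.IB_CA, NodeType.OPA_CA):
        # Both node types share the IB verbs path; only addressing differs.
        if node_type is NodeType.OPA_CA:
            return ("ib_verbs", "opa")
        if link_layer is LinkLayer.ETHERNET:
            return ("ib_verbs", "roce")
        return ("ib_verbs", "ib")
    raise ValueError(f"unknown node type: {node_type}")


def dispatch_with_new_link_layer(node_type, link_layer):
    """Model B: OPA is an IB_CA with a third link_layer value."""
    if node_type is NodeType.RNIC:
        return ("iwarp", None)
    if node_type is NodeType.USNIC:
        return ("usnic", None)
    if node_type is NodeType.IB_CA:
        if link_layer is LinkLayer.OPA:
            return ("ib_verbs", "opa")
        if link_layer is LinkLayer.ETHERNET:
            return ("ib_verbs", "roce")
        return ("ib_verbs", "ib")
    raise ValueError(f"unknown node type: {node_type}")


if __name__ == "__main__":
    # The same OPA device, described under each model.
    print(dispatch_with_new_node_type(NodeType.OPA_CA, LinkLayer.OPA))
    print(dispatch_with_new_link_layer(NodeType.IB_CA, LinkLayer.OPA))
```

Under these assumptions both calls select the IB verbs path with OPA addressing, which is the sense in which the two representations are described as functionally equivalent. The sketch says nothing about whether real OPA hardware supports the full set of IB primitives.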