RE: [PATCH 14/14] IB/mad: Add final OPA MAD processing

> From: Doug Ledford [mailto:dledford@xxxxxxxxxx]

> > No. RoCE is an open standard from the IBTA with the exact same RDMA
> > protocol semantics as InfiniBand and a clear set of compliancy rules without
> > which an implementation can't claim to be such. A RoCE device *is* an IB CA
> > with an Ethernet link.
> > In contrast, OPA is a proprietary protocol. We don't know what primitives
> > are supported, and whether the semantics of supported primitives are the
> > same as in InfiniBand.
> 
> Intel has stated on this list that they intend for RDMA apps to run on
> OPA transparently.  That pretty much implies the list of primitives and
> everything else that they must support.  However, time will tell if they
> succeeded or not.
> 

I am sorry, but that's not good enough.
When I see an IB device, I know exactly what to expect. I can't say anything regarding an OPA device.

It might be that today the semantics are "close enough".
But in the future, both feature sets and semantics may diverge considerably.
What are you going to do then?

In addition, today the host admin knows that two IB CA nodes will always interoperate. If OPA shares the IB CA node type, that guarantee breaks down: there is no way of knowing which devices work with which.

> >> The new OPA stuff appears to be following *exactly* the same development
> >> model/path that RoCE did.  When RoCE was introduced, all the apps that
> >> really cared about low level addressing on the link layer had to be
> >> modified to encompass the new link type.  This is simply link_layer
> >> number three for apps to care about.
> >>
> >
> > You are missing my point. API transparency is not a synonym for full
> > semantic equivalence.  The Node Type doesn't indicate level of adherence to
> > an API. Node Type indicates compliancy to a specification (e.g. wire protocol,
> > remote order of execution, error semantics, architectural limitations, etc).
> > The IBTA CA and Switch Node Types belong to devices that are compliant to
> > the corresponding specifications from the InfiniBand Trade Association.  And
> > that doesn't prevent applications from being coded to run over nodes of
> > different Node Types, as happens today with IB/RoCE and iWARP.
> >
> > This has nothing to do with addressing.
> 
> And whether you like it or not, Intel is intentionally creating a
> device/fabric with the specific intention of mimicking the IB_CA device
> type (with stated exceptions for MAD packets and addresses).  They
> obviously won't have certification as an IB_CA, but that's not their
> aim.  Their aim is to be a functional drop in replacement that apps
> don't need to know about except for the stated exceptions.
> 

Intentions are nice, but there is no way to define these "stated exceptions" apart from a specification.

> And I'm not missing your point.  Your point is inappropriate.  You're
> trying to conflate certification with a functional API.  The IB_CA node
> type is not an official certification of anything, and the linux kernel
> is not an official certifying body for anything.  If you want
> certification, you go to the OFA and the UNH-IOL testing program.
> There, you have the rights to the certification branding logo and you
> have the right to deny access to that logo to anyone that doesn't meet
> the branding requirements.

Who said anything about certification?
I am talking about present and future semantic compliance to what an IB CA stands for, and interoperability guarantees.

ib_verbs defines an *extensive* direct HW access API, which is constantly evolving.
You cannot describe the intricate object relations and semantics through an API alone.
In addition, because the access is direct, you can't abstract anything or fix stuff in SW.
The only way to *truly* know what to expect when performing Verbs calls is to check the node type.

ib_verbs was never only an API. It started as the Linux implementation of the IBTA standard, with guaranteed semantics and wire protocol.
Later, the interface was reused to support additional RDMA devices. However, you could *always* check the node type if you wanted to, thereby retaining the standard guarantees. Win-win situation...

This is a very strong property; we should not give up on it.

> 
> You're right that apps can be coded to other CA types, like RNICs and
> USNICs.  However, those are all very different from an IB_CA due to
> limited queue pair types or limited primitives.  If OPA had that same
> limitation then I would agree it needs a different node type.
> 

How do you know that it doesn't?
Have you seen the OPA specification?

> So this will be my litmus test.  Currently, an app that supports all of
> the RDMA types looks like this:
> 
> if (node_type == RNIC)
> 	do iwarpy stuff
> else if (node_type == USNIC)
> 	do USNIC stuff
> else if (node_type == IB_CA)
> 	do IB verbs stuff
> 	if (link_layer == Ethernet)
> 		do RoCE addressing/management
> 	else
> 		do IB addressing/management
> 
> 
> 
> If, in the end, apps that are modified to support OPA end up looking
> like this:
> 
> if (node_type == RNIC)
> 	do iwarpy stuff
> else if (node_type == USNIC)
> 	do USNIC stuff
> else if (node_type == IB_CA || node_type == OPA_CA)
> 	do IB verbs stuff
> 	if (node_type == OPA_CA)
> 		do OPA addressing/management
> 	else if (link_layer == Ethernet)
> 		do RoCE addressing/management
> 	else
> 		do IB addressing/management
> 
> where you can plainly see that the exact same goal can be accomplished
> whether you have an OPA node_type or an IB_CA node_type + OPA
> link_layer, then I will be fine with either a new node_type or a new
> link_layer.  They will be functionally equivalent as far as I'm concerned.
> 

It is true that for some applications, your abstraction might work transparently.
But for other applications, your "do IB verbs stuff" (and not just the addressing/management) will either break today or break tomorrow.
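
For illustration only, the two dispatch schemes quoted above can be sketched in C. All enums, struct fields and function names here are hypothetical stand-ins, not the kernel's or libibverbs' definitions, and the "legacy" check is an assumed example of an existing application that tests the node type to decide whether it may rely on IBTA semantics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for this sketch; not real kernel/libibverbs enums. */
enum node_type  { NODE_IB_CA, NODE_RNIC, NODE_USNIC, NODE_OPA_CA };
enum link_layer { LINK_IB, LINK_ETHERNET, LINK_OPA };
enum mgmt_path  { MGMT_IWARP, MGMT_USNIC, MGMT_ROCE, MGMT_IB, MGMT_OPA };

struct dev {
	enum node_type  node_type;
	enum link_layer link_layer;
};

/* Variant A: OPA is reported with its own node type. */
static enum mgmt_path pick_by_node_type(const struct dev *d)
{
	assert(d != NULL);
	if (d->node_type == NODE_RNIC)
		return MGMT_IWARP;
	if (d->node_type == NODE_USNIC)
		return MGMT_USNIC;
	if (d->node_type == NODE_OPA_CA)
		return MGMT_OPA;
	return d->link_layer == LINK_ETHERNET ? MGMT_ROCE : MGMT_IB;
}

/* Variant B: OPA is reported as IB_CA with a third link layer. */
static enum mgmt_path pick_by_link_layer(const struct dev *d)
{
	assert(d != NULL);
	if (d->node_type == NODE_RNIC)
		return MGMT_IWARP;
	if (d->node_type == NODE_USNIC)
		return MGMT_USNIC;
	switch (d->link_layer) {
	case LINK_OPA:
		return MGMT_OPA;
	case LINK_ETHERNET:
		return MGMT_ROCE;
	default:
		return MGMT_IB;
	}
}

/* Assumed pre-OPA application check: "is this an IBTA-compliant CA?" */
static int legacy_expects_ibta_semantics(const struct dev *d)
{
	assert(d != NULL);
	return d->node_type == NODE_IB_CA;
}
```

Both variants choose the same addressing/management path for an OPA device. The difference shows up only in the legacy check: with a dedicated node type it rejects the OPA device, while with a shared node type it accepts it and goes on to assume IB semantics.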

This is bad both for IB and for OPA.

Why on earth are we putting ourselves into a position which could easily be avoided in the first place?

The solution is simple:
- As an API, Verbs will support IB/RoCE, iWARP, USNIC, and OPA
- The node type and link type refer to specific technologies
-- Most applications indeed don't care and don't check either of these properties anyway
-- Those that do, do it for a good reason; don't break them
- Management helpers can keep the code maintainable and efficient even if OPA and IB have different node types
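
As a rough sketch of what such management helpers might look like, the C fragment below keeps a strict predicate for IBTA compliance separate from a broad predicate for code that is shared between IB and OPA. The enum and both function names are hypothetical, and the assumption that IB and OPA would share a MAD-based management path is made only for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical node types, for illustration only. */
enum rdma_node {
	RDMA_NODE_IB_CA,
	RDMA_NODE_OPA_CA,
	RDMA_NODE_RNIC,
	RDMA_NODE_USNIC
};

/* Strict: true only for devices that claim IBTA CA compliance. */
static bool node_is_ibta_ca(enum rdma_node n)
{
	return n == RDMA_NODE_IB_CA;
}

/*
 * Broad: true for node types assumed (in this sketch) to share the
 * common MAD-based management code, so callers need no duplicated
 * IB-versus-OPA conditionals.
 */
static bool node_uses_mad_mgmt(enum rdma_node n)
{
	return n == RDMA_NODE_IB_CA || n == RDMA_NODE_OPA_CA;
}
```

Shared management code would call the broad helper, while applications that depend on IBTA guarantees keep calling the strict one, so existing node type checks stay valid.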

Win-win situation...
--Liran