RE: [PATCH RFC 0/2] IB device in-kernel API support indication

"Shalev, Leah" <shalevl@xxxxxxxxxx> · Tue, 8 Jan 2019 09:17:53 +0000

> From: Jason Gunthorpe <jgg@xxxxxxxx>
> 
> But still, the hidden and uncontrolled resource usage is probably still not so
> great for anything but a job-like HPC application. Any client/server thing is
> going to want to control this resource more finely.
> 

Here is an excerpt from the "SRD spec" we will provide; I hope it clarifies things:

SRD QPs provide reliable but out-of-order delivery without segmentation support.
This allows decoupling of transport processing from QP buffer management, so
that separate application flows can be multiplexed without interfering with each
other.
As with UD QPs, each WR carries the AH of the remote destination, allowing a
process to communicate with any process on any endnode through a single QP, on
both the send and receive sides. Each Address Handle is associated with an SRD
context. The SRD context provides reliable communication to a remote node,
similar to an RD EE context, but without explicit management by the user. SRD
contexts are implicitly controlled by AH and QP management operations. If a QP
is destroyed, all pending Send WRs on that QP are implicitly canceled and their
transport processing is aborted, without affecting SRD processing of other WRs.
If an AH is destroyed, any outstanding WRs using that AH are completed in error.

Completions for Send WRs posted to SRD QPs are the same as for WRs posted to
regular QPs. Success is reported after the WR is ACKed by the responder.
In addition to local errors, new types of remote errors are returned for
requests that caused the responder to send a NAK. These errors occur when the
destination QP does not exist, is in the error state, or has no posted Recv WRs.
These errors do not affect the SRD context state.
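
To make the semantics in the excerpt concrete, here is a toy Python model. It
is only a sketch under stated assumptions, not EFA driver or verbs code:
AddressHandle, QP, Nak and the status strings are all hypothetical names. It
shows that each AH owns an implicit SRD context, that destroying a QP cancels
only that QP's pending WRs, that destroying an AH fails its outstanding WRs,
and that a NAK fails a single WR without changing the SRD context state.

```python
from dataclasses import dataclass
from enum import Enum


class Nak(Enum):
    # Hypothetical labels for the responder conditions listed in the excerpt.
    DEST_QP_MISSING = 1
    DEST_QP_IN_ERROR = 2
    NO_RECV_WR = 3


@dataclass(eq=False)  # compare WRs by identity, never by field values
class WorkRequest:
    wr_id: int
    qp: "QP"
    ah: "AddressHandle"
    status: str = "pending"  # -> success / remote_error:<NAK> / canceled / error


class AddressHandle:
    def __init__(self, remote_node):
        self.remote_node = remote_node
        # The SRD context is created implicitly with the AH; the user never
        # manages it directly (modeled here as a state plus outstanding WRs).
        self.srd_state = "ready"
        self.outstanding = []

    def on_response(self, wr, response):
        # Success is reported only after the responder ACKs the WR.
        if response == "ACK":
            wr.status = "success"
        elif isinstance(response, Nak):
            # A NAK fails only this WR; srd_state is deliberately untouched.
            wr.status = "remote_error:" + response.name
        else:
            raise ValueError(f"unexpected response {response!r}")
        self.outstanding.remove(wr)
        wr.qp.pending.remove(wr)

    def destroy(self):
        # Every outstanding WR that used this AH completes in error.
        for wr in self.outstanding:
            wr.status = "error"
            wr.qp.pending.remove(wr)
        self.outstanding.clear()


class QP:
    def __init__(self):
        self.pending = []

    def post_send(self, wr_id, ah):
        # As with UD, each WR names its destination AH, so one QP can reach
        # any remote node.
        wr = WorkRequest(wr_id, self, ah)
        self.pending.append(wr)
        ah.outstanding.append(wr)
        return wr

    def destroy(self):
        # Cancel only this QP's pending WRs; WRs posted on other QPs that
        # share the same SRD context are unaffected.
        for wr in self.pending:
            wr.status = "canceled"
            wr.ah.outstanding.remove(wr)
        self.pending.clear()
```

For example, with two QPs sharing two AHs, a NAK on one WR leaves the AH's
SRD context "ready", destroying the first QP cancels only its own WRs, and
destroying the second AH then fails the WR still outstanding on the other QP.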

> 
> UD has both connected and unconnected flows that are interesting, and as
> soon as there is a resource and state, generally, people will eventually find a
> reason to need control over that. (although probably not from a HPC
> workload perspective)
We can support CM (UD-style) if anybody ever needs it, but it would be used only 
to control QP states, not SRD transport state.

> 
> For instance, most enterprise applications will want to tear down and restart
> their 'connection' - in the SRD perspective this means forgetting about all the
> connection state and setting it up again.
That is how it works today because of the tight coupling between the interface
and the underlying protocol; it does not have to be achieved by tearing down the
connection.

> 
> In typical cases for other protocols this might select a different network
> multi-path, or side step some bug that was preventing forward progress.
That is exactly why we chose to design a new protocol.

> 
> So, you can choose to hide all of this, but I wouldn't describe SRD as
> unconnected, more as 'automatically connected'.
Is there any difference from a user perspective?

Leah