On Jan 7, 2019, at 3:42 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote:
>>>> I haven't seen the libfabric provider yet, but libfabric has a generic out-of-
>>>> band socket-based name service that can be used by providers. I'm guessing
>>>> that's what Gal is referring to. The name service is
>>>> primarily there to support fabtests.
>>>>
>>>> In realistic use cases, those providers rely on a job manager to exchange
>>>> addressing, with name service support disabled.
>>>
>>> I think that this is what I was referring to by introducing efacm like
>>> ibcm and iwcm... Isn't it in essence the same thing?
>>
>> Not quite - this isn't running a connection protocol. The closest
>> in-tree comparison would be the IB SIDR protocol used in conjunction
>> with IP addresses. I’m not aware of anyone using that, however.
>> Unconnected endpoints typically have an existing out-of-band
>> mechanism (e.g. PMI) that can be used for address exchange. The
>> PSM/2 drivers make a similar assumption.
>
> Dare I ask how it avoids duplicate messages without a connection
> protocol?

In SRD’s case, there is a connection-like structure between any two NICs that is established dynamically as part of packet transmission.

If you look at Sandia Portals (which is even further from standard VERBS, but is a well-documented communication interface and so worth referencing), it assumes a job configuration step that, while not establishing a connection in the VERBS sense of the word, does define a time period during which reliability data can be stored.

Brian
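To illustrate the general idea behind the answer to Jason's question: duplicates can be suppressed without a connection handshake if the receiver keeps per-peer state that is created on demand when the first packet from a peer arrives. The sketch below is a generic sliding-window duplicate filter, not SRD's actual design, which this thread does not describe. The `Receiver` and `PeerState` names, the 64-entry window, and integer sequence numbers without wrap-around are all assumptions made for illustration.

```python
WINDOW = 64  # assumed receive window size (illustrative only)


class PeerState:
    """Hypothetical per-peer receive state, created on the first packet from a peer."""

    def __init__(self):
        self.base = 0      # lowest sequence number not yet received
        self.seen = set()  # received sequence numbers in [base, base + WINDOW)


class Receiver:
    """Hypothetical receiver; sequence-number wrap-around is ignored for simplicity."""

    def __init__(self):
        self.peers = {}  # peer_id -> PeerState, populated lazily

    def accept(self, peer_id, seq):
        """Return True if the packet is new and should be delivered, False otherwise."""
        if peer_id not in self.peers:
            # No handshake: state appears as a side effect of receiving traffic.
            self.peers[peer_id] = PeerState()
        state = self.peers[peer_id]
        if seq < state.base or seq >= state.base + WINDOW:
            return False  # already delivered, or too far ahead to track
        if seq in state.seen:
            return False  # duplicate within the window
        state.seen.add(seq)
        # Slide the window past the contiguous run of received packets.
        while state.base in state.seen:
            state.seen.remove(state.base)
            state.base += 1
        return True


if __name__ == "__main__":
    rx = Receiver()
    print(rx.accept("nic-a", 0))  # True: first packet creates the peer state
    print(rx.accept("nic-a", 0))  # False: duplicate
    print(rx.accept("nic-b", 0))  # True: independent state for another peer
```

The catch, which is where Brian's Portals comparison comes in, is how long that per-peer state must be retained: once it is discarded, a late retransmission can no longer be recognized as a duplicate, so some agreed lifetime (for example, the duration of a job) is needed.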