On Mon, Jan 07, 2019 at 11:56:02PM +0000, Barrett, Brian wrote:
> On Jan 7, 2019, at 3:42 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote:
> >>>> I haven't seen the libfabric provider yet, but libfabric has generic
> >>>> out-of-band socket-based name service that can be used by providers.
> >>>> I'm guessing that's what Gal is referring to. The name service is
> >>>> primarily there to support fabtests.
> >>>> In realistic use cases, those providers rely on a job manager to
> >>>> exchange addressing, with name service support disabled.
> >>>
> >>> I think that this is what I was referring to by introducing efacm like
> >>> ibcm and iwcm... Isn't it in essence the same thing?
> >>
> >> Not quite - this isn't running a connection protocol. The closest
> >> in tree comparison would be the IB SIDR protocol used in conjunction
> >> with IP addresses. I’m not aware of anyone using that, however.
> >> Unconnected endpoints typically have an existing out of band
> >> mechanism (e.g. PMI) that can be used for address exchange. The
> >> PSM/2 drivers make a similar assumption.
> >
> > Dare I ask how it avoids duplicate messages without a connection
> > protocol?
>
> In SRD’s case, there is a connection-like structure between any two
> NICs that is dynamically established as part of packet transmission.
> If you look at Sandia Portals (which is even further from standard
> VERBS, but is a well documented communication interface so worth
> referencing), it assumes a job configuration step that, while not
> establishing a connection in the VERBS sense of the word connection,
> does give a time period for which reliability data can be stored.

Usually the reason a protocol needs an explicit exchange of connection
parameters is to solve collisions with ID re-use, i.e. the source ID
matching the 'connection-like' structure gets improperly re-used due
to machine reboot, general ID recycling, or whatever.
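
To make the ID re-use concern concrete, here is a minimal sketch of a
receiver-side duplicate filter. It is not SRD's actual mechanism, which
this thread does not describe. The names Receiver, deliver, run and the
'epoch' field are all hypothetical; the epoch stands in for whatever
per-incarnation value an explicit parameter exchange would provide.

```python
class Receiver:
    """Per-source duplicate filter; every name here is hypothetical."""

    def __init__(self, use_epoch):
        self.use_epoch = use_epoch
        self.next_seq = {}  # state key -> next expected sequence number

    def deliver(self, src_id, epoch, seq):
        # Without an epoch, reliability state is keyed by source ID alone.
        key = (src_id, epoch) if self.use_epoch else src_id
        expected = self.next_seq.get(key, 0)
        if seq < expected:
            return False  # looks like a duplicate, so it is dropped
        self.next_seq[key] = seq + 1
        return True


def run(use_epoch):
    rx = Receiver(use_epoch)
    results = []
    # First incarnation of source 7 sends sequence numbers 0..2.
    for seq in range(3):
        results.append(rx.deliver(7, epoch=1, seq=seq))
    # Source 7 reboots, loses its state, and starts again from 0
    # (assumed here to pick a fresh epoch value on restart).
    for seq in range(3):
        results.append(rx.deliver(7, epoch=2, seq=seq))
    return results
```

With run(False), the three messages sent after the reboot are dropped
as duplicates because the stale state for source 7 still matches. With
run(True), the new epoch selects fresh state and all six messages are
delivered.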

Does SRD inherently rely on the job-like scheme for correct operation?

A mandatory job-like scheme would probably preclude using it directly
in kernel ULPs in the future.

Jason