> On Jan 7, 2019, at 16:29, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
>> On Mon, Jan 07, 2019 at 11:56:02PM +0000, Barrett, Brian wrote:
>>> On Jan 7, 2019, at 3:42 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>>>
>>> On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote:
>>>>>> I haven't see the libfabric provider yet, but libfabric has generic
>>>>>> out-of-band socket-based name service that can be used by provider.
>>>>>> I'm guessing that's what Gal is referring to. The name service is
>>>>>> primarily there to support fabtests.
>>>>>> In realistic use cases, those providers rely on a job manager to
>>>>>> exchange addressing, with name service support disabled.
>>>>>
>>>>> I think that this is what I was referring to by introducing efacm like
>>>>> ibcm and iwcm... Isn't it in essence the same thing?
>>>>
>>>> Not quite - this isn't running a connection protocol. The closest
>>>> in tree comparison would be the IB SIDR protocol used in conjunction
>>>> with IP addresses. I’m not aware of anyone using that, however.
>>>> Unconnected endpoints typically have an existing out of band
>>>> mechanism (e.g. PMI) that can be used for address exchange. The
>>>> PSM/2 drivers make a similar assumption.
>>>
>>> Dare I ask how it avoids duplicate messages without a connection
>>> protocol?
>>
>> In SRD’s case, there is a connection-like structure between any two
>> NICs that is dynamically established as part of packet transmission.
>> If you look at Sandia Portals (which is even further from standard
>> VERBS, but is a well documented communication interface so worth
>> referencing), it assumes a job configuration step that, while not
>> establishing a connection in the VERBS sense of the word connection,
>> does give a time period for which reliability data can be stored.
>
> Usually the reason a protocol needs an explicit exchange of connection
> parameters is to solve collisions with ID re-use, ie the source ID
> matching the 'connection-like' structure gets improperly re-used due
> to machine reboot, general ID recycling, or whatever.
>
> Does SRD inherently rely on the job-like scheme for correct operation?
>
> A mandatory job-like scheme would probably preclude using it directly
> in kernel ULPs in future..

Sorry, that wasn’t clear. No, SRD does not require any job-like
indicators. It has a protocol to establish / invalidate reliability
state in firmware. My point was that whether or not there’s a
connection established under the covers, there’s no visible connection
to the user with SRD; the usage flow is similar to UD or RD (obviously
with different reliability, ordering, and performance characteristics).

My experience (perhaps incorrect, but matching with Gal’s expectations)
with UD and RD is that consumers of the datagram protocols don’t use a
connection manager (because there isn’t a connection). If this is a bad
assumption, we’ll go back and rethink our strategy.

Brian