Re: [PATCH RFC 0/2] IB device in-kernel API support indication

On Mon, Jan 07, 2019 at 11:56:02PM +0000, Barrett, Brian wrote:
> On Jan 7, 2019, at 3:42 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > 
> > On Mon, Jan 07, 2019 at 04:28:54PM +0000, Hefty, Sean wrote:
> >>>> I haven't seen the libfabric provider yet, but libfabric has a generic
> >>>> out-of-band socket-based name service that can be used by providers.
> >>>> I'm guessing that's what Gal is referring to.  The name service is
> >>>> primarily there to support fabtests.
> >>>> In realistic use cases, those providers rely on a job manager to exchange
> >>>> addressing, with name service support disabled.
> >>> 
> >>> I think that this is what I was referring to by introducing efacm like
> >>> ibcm and iwcm... Isn't it in essence the same thing?
> >> 
> >> Not quite - this isn't running a connection protocol.  The closest
> >> in tree comparison would be the IB SIDR protocol used in conjunction
> >> with IP addresses.  I’m not aware of anyone using that, however.
> >> Unconnected endpoints typically have an existing out of band
> >> mechanism (e.g. PMI) that can be used for address exchange.  The
> >> PSM/2 drivers make a similar assumption.
> > 
> > Dare I ask how it avoids duplicate messages without a connection
> > protocol?
> 
> In SRD’s case, there is a connection-like structure between any two
> NICs that is dynamically established as part of packet transmission.
> If you look at Sandia Portals (which is even further from standard
> VERBS, but is a well documented communication interface so worth
> referencing), it assumes a job configuration step that, while not
> establishing a connection in the VERBS sense of the word connection,
> does give a time period for which reliability data can be stored.

Usually the reason a protocol needs an explicit exchange of connection
parameters is to solve collisions with ID re-use, i.e. the source ID
matching the 'connection-like' structure gets improperly re-used due
to machine reboot, general ID recycling, or whatever.
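
To make the ID re-use concern concrete, here is a minimal sketch in
Python. It is a toy model, not SRD's actual mechanism: the Receiver
class, the per-source sequence numbers, and the 'epoch' (incarnation)
value are all assumptions made for illustration. The receiver keeps
duplicate-detection state per source ID. If a rebooted sender re-uses
its ID and restarts numbering, stale state makes the receiver drop new
data as duplicates unless some exchanged value tells it the peer is a
new incarnation.

```python
class Receiver:
    """Toy receiver that drops duplicates using state keyed by source ID."""

    def __init__(self, use_epoch):
        # use_epoch: whether an exchanged incarnation value is honoured (hypothetical)
        self.use_epoch = use_epoch
        self.state = {}  # src_id -> (epoch, highest sequence number accepted)

    def deliver(self, src_id, epoch, seq):
        """Return True if the packet is accepted, False if dropped as a duplicate."""
        known_epoch, last_seq = self.state.get(src_id, (epoch, -1))
        if self.use_epoch and epoch != known_epoch:
            # New incarnation of the sender: discard the stale reliability state.
            last_seq = -1
        if seq <= last_seq:
            return False
        self.state[src_id] = (epoch, seq)
        return True


def run(use_epoch):
    rx = Receiver(use_epoch)
    # First incarnation of source ID 7 sends sequence numbers 0, 1, 2.
    results = [rx.deliver(7, epoch=1, seq=s) for s in range(3)]
    # The sender reboots, re-uses source ID 7 and restarts numbering at 0.
    results.append(rx.deliver(7, epoch=2, seq=0))
    return results


print(run(use_epoch=False))
print(run(use_epoch=True))
```

Without the epoch check the last packet is rejected even though it is
new data; with it, the stale state is reset and the packet is accepted.
A job configuration step can play the same role as the epoch here by
bounding the lifetime of the stored state.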

Does SRD inherently rely on the job-like scheme for correct operation?

A mandatory job-like scheme would probably preclude using it directly
in kernel ULPs in the future.

Jason


