Re: [PATCH RFC 0/9] A rendezvous module

On Fri, Mar 19, 2021 at 02:49:29PM +0000, Wan, Kaike wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Friday, March 19, 2021 9:53 AM
> > To: Wan, Kaike <kaike.wan@xxxxxxxxx>
> > Cc: dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Rimmer, Todd
> > <todd.rimmer@xxxxxxxxx>
> > Subject: Re: [PATCH RFC 0/9] A rendezvous module
> > 
> > On Fri, Mar 19, 2021 at 08:56:26AM -0400, kaike.wan@xxxxxxxxx wrote:
> > 
> > > - Basic mode of operations (PSM3 is used as an example for user
> > >   applications):
> > >   - A middleware (like MPI) has out-of-band communication channels
> > >     between any two nodes, which are used to establish high performance
> > >     communications for providers such as PSM3.
> > 
> > Huh? Doesn't PSM3 already use its own special non-verbs char devices that
> > already have memory caches and other stuff? Now you want to throw that
> > all away and do yet another char dev just for HFI? Why?

> [Wan, Kaike] I think that you are referring to PSM2, which uses the
> OPA hfi1 driver that is specific to the OPA hardware.  PSM3 uses
> standard verbs drivers and supports standard RoCE.  

Uhhh.. "PSM" has always been about the ipath special char device, and
if I recall properly the library was semi-discontinued and merged into
libfabric.

So here you are talking about a libfabric verbs provider that doesn't
use the ipath style char interface but uses verbs and this rv thing so
we call it a libfabric PSM3 provider because that's not confusing to
anyone at all..

> A focus is the Intel RDMA Ethernet NICs. As such it cannot use the
> hfi1 driver through the special PSM2 interface. 

These are the drivers that aren't merged yet, I see. So why are you
sending this now? I'm not interested in looking at even more Intel code
when their driver saga has been ongoing for years.

> Rather it works with the hfi1 driver through standard verbs
> interface.

But nobody would do that, right? You'd get better results using the
hfi1 native interfaces instead of their slow fake verbs stuff.

> > I also don't know why you picked the name rv, this looks like it has little to do
> > with the usual MPI rendezvous protocol. This is all about bulk transfers. It is
> > actually a lot like RDS. Maybe you should be using RDS?

> [Wan, Kaike] While there are similarities in concepts, details are
> different.  

You should list these differences.

> Quite frankly this could be viewed as an application accelerator
> much like RDS served that purpose for Oracle, which continues to be
> its main use case.

Obviously, except it seems to be doing the same basic acceleration
technique as RDS.

> The name "rv" is chosen simply because this module is designed to
> enable the rendezvous protocol of the MPI/OFI/PSM3 application stack
> for large messages. Short messages are handled by eager transfer
> through UDP in PSM3.

A bad name seems like it will further limit potential re-use of this
code.

Jason


