On Fri, Mar 19, 2021 at 02:49:29PM +0000, Wan, Kaike wrote:
> > From: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > Sent: Friday, March 19, 2021 9:53 AM
> > To: Wan, Kaike <kaike.wan@xxxxxxxxx>
> > Cc: dledford@xxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx; Rimmer, Todd
> > <todd.rimmer@xxxxxxxxx>
> > Subject: Re: [PATCH RFC 0/9] A rendezvous module
> >
> > On Fri, Mar 19, 2021 at 08:56:26AM -0400, kaike.wan@xxxxxxxxx wrote:
> >
> > > - Basic mode of operations (PSM3 is used as an example for user
> > >   applications):
> > >   - A middleware (like MPI) has out-of-band communication channels
> > >     between any two nodes, which are used to establish high performance
> > >     communications for providers such as PSM3.
> >
> > Huh? Doesn't PSM3 already use it's own special non-verbs char devices that
> > already have memory caches and other stuff? Now you want to throw that
> > all away and do yet another char dev just for HFI? Why?
> [Wan, Kaike] I think that you are referring to PSM2, which uses the
> OPA hfi1 driver that is specific to the OPA hardware. PSM3 uses
> standard verbs drivers and supports standard RoCE.

Uhhh.. "PSM" has always been about the ipath special char device, and
if I recall properly the library was semi-discontinued and merged into
libfabric.

So here you are talking about a libfabric verbs provider that doesn't
use the ipath style char interface but uses verbs and this rv thing, so
we call it a libfabric PSM3 provider because that's not confusing to
anyone at all..

> A focus is the Intel RDMA Ethernet NICs. As such it cannot use the
> hfi1 driver through the special PSM2 interface.

These are the drivers that aren't merged yet, I see. So why are you
sending this now? I'm not interested in looking at even more Intel code
when their driver saga has been ongoing for years.

> Rather it works with the hfi1 driver through standard verbs
> interface.

But nobody would do that, right? You'd get better results using the
hfi1 native interfaces instead of their slow fake verbs stuff.

> > I also don't know why you picked the name rv, this looks like it has little to do
> > with the usual MPI rendezvous protocol. This is all about bulk transfers. It is
> > actually a lot like RDS. Maybe you should be using RDS?
> [Wan, Kaike] While there are similarities in concepts, details are
> different.

You should list these differences.

> Quite frankly this could be viewed as an application accelerator
> much like RDS served that purpose for Oracle, which continues to be
> its main use case.

Obviously, except it seems to be doing the same basic acceleration
technique as RDS.

> The name "rv" is chosen simply because this module is designed to
> enable the rendezvous protocol of the MPI/OFI/PSM3 application stack
> for large messages. Short messages are handled by eager transfer
> through UDP in PSM3.

A bad name seems like it will further limit potential re-use of this
code.

Jason