On Fri, Sep 18, 2020 at 02:59:28PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 2:56 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Fri, Sep 18, 2020 at 02:36:10PM +0300, Gal Pressman wrote:
> > > On 17/09/2020 20:18, Jason Gunthorpe wrote:
> > > > On Tue, Sep 15, 2020 at 11:46:58PM +0300, Oded Gabbay wrote:
> > > >> infrastructure for communication between multiple accelerators. Same
> > > >> as Nvidia uses NVlink, we use RDMA that we have inside our ASIC.
> > > >> The RDMA implementation we did does NOT support some basic RDMA
> > > >> IBverbs (such as MR and PD) and therefore, we can't use the rdma-core
> > > >> library or to connect to the rdma infrastructure in the kernel.
> > > >
> > > > You can't create a parallel RDMA subsystem in netdev, or in misc, and
> > > > you can't add random device offloads as IOCTL to nedevs.
> > > >
> > > > RDMA is the proper home for all the networking offloads that don't fit
> > > > into netdev.
> > > >
> > > > EFA was able to fit into rdma-core/etc and it isn't even RoCE at
> > > > all. I'm sure this can too.
> > >
> > > Well, EFA wasn't welcomed to the RDMA subsystem with open arms ;), initially it
> > > was suggested to go through the vfio subsystem instead.
> > >
> > > I think this comes back to the discussion we had when EFA was upstreamed, which
> > > is what's the bar to get accepted to the RDMA subsystem.
> > > IIRC, what we eventually agreed on is having a userspace rdma-core provider and
> > > ibv_{ud,rc}_pingpong working (or just supporting one of the IB spec's QP types?).
> >
> > That is more or less where we ended up, yes.
> >
> > I'm most worried about this lack of PD and MR.
> >
> > Kernel must provide security for apps doing user DMA, PD and MR do
> > this. If the device doesn't have PD/MR then it is hard to see how a WQ
> > could ever be exposed directly to userspace, regardless of subsystem.
>
> Hi Jason,
> What you say here is very true and we handle that with different
> mechanisms. I will start working on a dedicated patch-set of the RDMA
> code in the next few weeks with MUCH MORE details in the commit
> messages. That will explain exactly how we expose stuff and protect.
>
> For example, regarding isolating between applications, we only support
> a single application opening our file descriptor.

Then the driver has a special PD create that requires the misc file
descriptor to authorize RDMA access to the resources in that security
context.

> Another example is that the submission of WQ is done through our QMAN
> mechanism and is NOT mapped to userspace (due to the restrictions you
> mentioned above and other restrictions).

Sure, other RDMA drivers also require a kernel ioctl for command
execution.

In this model the MR can be a software construct, again representing a
security authorization:

 - A 'full process' MR, in which case the kernel command execution
   handles dma map and pinning at command execution time
 - A 'normal' MR, in which case the DMA list is pre-created and the
   command execution just re-uses this data

The general requirement for RDMA is the same as DRM, you must provide
enough code in rdma-core to show how the device works, and minimally
test it. EFA uses ibv_ud_pingpong, and some pyverbs tests IIRC.

So you'll want to arrange something where the default MR and PD
mechanisms do something workable on this device, like auto-open the
misc FD when building the PD, and support the 'normal' MR flow for
command execution.

Jason
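
Editorial illustration, not part of the original message: the two MR
flows described above can be modelled as a small userspace toy. This is
only a sketch of the authorization logic, not driver or rdma-core code.
All names here (MiscFd, Pd, Mr, MrKind, create_pd, reg_mr, exec_command,
pin_and_map) are hypothetical, and "pinning" is simulated by returning
one page-aligned address per page.

```python
from dataclasses import dataclass, field
from enum import Enum, auto

PAGE = 4096  # assumed page size for the toy model


class MrKind(Enum):
    FULL_PROCESS = auto()  # map and pin at command execution time
    NORMAL = auto()        # DMA list pre-created at registration time


@dataclass
class MiscFd:
    """Stand-in for the single-open misc device file descriptor."""
    owner_pid: int
    is_open: bool = True


@dataclass
class Pd:
    """Software PD: remembers which misc FD authorized it."""
    misc_fd: MiscFd


@dataclass
class Mr:
    pd: Pd
    kind: MrKind
    start: int = 0
    length: int = 0
    dma_list: list = field(default_factory=list)


def pin_and_map(start, length):
    """Simulate pinning: one page-aligned 'DMA address' per touched page."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = start // PAGE
    last = (start + length - 1) // PAGE
    return [page * PAGE for page in range(first, last + 1)]


def create_pd(misc_fd, caller_pid):
    """PD creation is refused unless the misc FD authorizes the caller."""
    if not misc_fd.is_open or misc_fd.owner_pid != caller_pid:
        raise PermissionError("misc FD does not authorize this caller")
    return Pd(misc_fd)


def reg_mr(pd, kind, start=0, length=0):
    """A 'normal' MR builds its DMA list now; a 'full process' MR does not."""
    mr = Mr(pd, kind, start, length)
    if kind is MrKind.NORMAL:
        mr.dma_list = pin_and_map(start, length)
    return mr


def exec_command(pd, mr, addr, length):
    """Return the DMA list a command would use, after authorization checks."""
    if mr.pd is not pd:
        raise PermissionError("MR belongs to a different PD")
    if mr.kind is MrKind.FULL_PROCESS:
        # Map and pin on demand, at command execution time.
        return pin_and_map(addr, length)
    if length <= 0:
        raise ValueError("length must be positive")
    if addr < mr.start or addr + length > mr.start + mr.length:
        raise PermissionError("access outside registered MR")
    # Re-use the pre-created list: slice out the pages the command touches.
    base = mr.start // PAGE
    first = addr // PAGE - base
    last = (addr + length - 1) // PAGE - base
    return mr.dma_list[first:last + 1]
```

In this toy, the 'normal' MR pays the pinning cost once in reg_mr and
only does a range check per command, while the 'full process' MR defers
everything to exec_command. In both cases the PD, and through it the
misc FD, is what carries the security authorization.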