On Fri, Sep 18, 2020 at 04:49:25PM +0300, Oded Gabbay wrote:
> On Fri, Sep 18, 2020 at 4:26 PM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Fri, Sep 18, 2020 at 04:02:24PM +0300, Oded Gabbay wrote:
> >
> > > The problem with MR is that the API doesn't let us return a new VA. It
> > > forces us to use the original VA that the Host OS allocated.
> >
> > If using the common MR API you'd have to assign a unique linear range
> > in the single device address map and record both the IOVA and the MMU
> > VA in the kernel struct.
> >
> > Then when submitting work using that MR lkey the kernel will adjust
> > the work VA using the equation (WORK_VA - IOVA) + MMU_VA before
> > forwarding to HW.
> >
> We can't do that. That will kill the performance. If for every
> submission I need to modify the packet's contents, the throughput will
> go downhill.

You clearly didn't read where I explained there is a fast path and
slow path expectation.

> Also, submissions to our RDMA qmans are coupled with submissions to
> our DMA/Compute QMANs. We can't separate those to different API calls.
> That will also kill performance and in addition, will prevent us from
> synchronizing all the engines.

Not sure I see why this is a problem. I already explained the fast
device specific path.

As long as the kernel maintains proper security when it processes
submissions the driver can allow objects to cross between the two
domains.

Jason
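
For reference, the adjustment quoted above, (WORK_VA - IOVA) + MMU_VA,
could be sketched roughly as follows. This is only an illustration of
the arithmetic, not code from any driver: struct mr_map, its fields and
mr_translate() are hypothetical names, and the bounds check is an
assumption about what "proper security" would minimally require.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-MR record; names and layout are illustrative only. */
struct mr_map {
	uint64_t iova;   /* start of the linear range in the device address map */
	uint64_t mmu_va; /* start of the same buffer in the device MMU VA space */
	uint64_t length; /* size of the registered range in bytes */
};

/*
 * Translate a work VA with (WORK_VA - IOVA) + MMU_VA.
 * Returns false if [work_va, work_va + len) is not fully inside the MR,
 * in which case *hw_va is left untouched.
 */
static bool mr_translate(const struct mr_map *mr, uint64_t work_va,
			 uint64_t len, uint64_t *hw_va)
{
	uint64_t off;

	if (work_va < mr->iova)
		return false;

	off = work_va - mr->iova;
	/* Written this way to avoid overflow in off + len. */
	if (off > mr->length || len > mr->length - off)
		return false;

	*hw_va = off + mr->mmu_va;
	return true;
}
```

In this sketch the translation would only run on the slow path; a
device specific fast path would presumably submit MMU VAs directly and
skip it.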