On Fri, Sep 18, 2020 at 04:02:24PM +0300, Oded Gabbay wrote:

> The problem with MR is that the API doesn't let us return a new VA. It
> forces us to use the original VA that the Host OS allocated.

If using the common MR API you'd have to assign a unique linear range
in the single device address map and record both the IOVA and the MMU
VA in the kernel struct.

Then when submitting work using that MR lkey the kernel will adjust
the work VA using the equation (WORK_VA - IOVA) + MMU_VA before
forwarding to HW.

EFA doesn't support rkeys, so they are not required to be emulated. It
would have to create rkeys using some guadidv_reg_mr_rkey()

It is important to understand that the usual way we support these
non-RDMA devices is to insist that they use SW to construct a minimal
standards based RDMA API, and then allow the device to have a 'dv' API
to access a faster, highly device specific, SW bypass path.

So for instance you might have some guadidv_post_work(qp) that doesn't
use lkeys and works directly on the MMU_VA. A guadidv_get_mmu_va(mr)
would return the required HW VA from the kernel.

Usually the higher level communication library (UCX, MPI, etc) forms
the dv primitives into something application usable.

> we do if that VA is in the range of our HBM addresses ? The device
> won't be able to distinguish between them. The transaction that is
> generated by an engine inside our device will go to the HBM instead of
> going to the PCI controller and then to the host.
>
> That's the crust of the problem and why we didn't use MR.

No, the problem with the device is that it doesn't have a lkey/rkey,
so it is stuck with a single translation domain. RoCE compliant
devices are required to have multiple translation domains - each
lkey/rkey specifies a unique translation.

The MR concept is a region of process VA mapped into the device for
device access, and this device *clearly* has that.

Jason
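
For illustration only, the lkey adjustment (WORK_VA - IOVA) + MMU_VA
described in the reply could be sketched in C roughly as below. This is
not taken from any driver: struct guadi_mr, its fields, and
guadi_translate_work_va() are hypothetical names, and the bounds check
against the MR length is an added assumption rather than something the
mail specifies.

```c
#include <stdint.h>

/* Hypothetical per-MR record kept by the kernel (names are assumptions). */
struct guadi_mr {
	uint64_t iova;   /* start of the linear range in the device address map */
	uint64_t mmu_va; /* VA the device MMU actually translates */
	uint64_t length; /* size of the registered region in bytes */
};

/*
 * Rewrite a work request VA given in IOVA space into the VA the HW uses:
 *   hw_va = (work_va - iova) + mmu_va
 * Returns 0 on success, -1 if [work_va, work_va + len) is outside the MR.
 */
static int guadi_translate_work_va(const struct guadi_mr *mr,
				   uint64_t work_va, uint64_t len,
				   uint64_t *hw_va)
{
	uint64_t off;

	if (work_va < mr->iova)
		return -1;

	off = work_va - mr->iova;
	/* Written to avoid overflow in off + len. */
	if (off > mr->length || len > mr->length - off)
		return -1;

	*hw_va = off + mr->mmu_va;
	return 0;
}
```

Under these assumptions the kernel would look up the MR by lkey, run
the translation on each work VA, and only then forward the request to
HW. A 'dv' style path would skip this step and take the MMU_VA directly.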