On Thu, Feb 01, 2018 at 08:22:01PM +0200, Marcel Apfelbaum wrote:
> On 31/01/2018 20:38, Jason Gunthorpe wrote:
> > On Wed, Jan 31, 2018 at 02:27:01PM +0200, Marcel Apfelbaum wrote:
> >
> >> It is good to know, but still, passing so much information to kernel
> >> when we can rather "compress" it, maybe it is worth a second thought.
> >
> > Not sure. Have to see the whole thing..
> >
> >>> Well, actually, only a 3rd :| The new MR would likely be 0 based, but
> >>> the VM guest doesn't know about this. So you'd need an API that can do
> >>> arbitrary based to really solve your problem. I guess all HW should
> >>> be able to do this so maybe it is OK?
> >>
> >> The way we solve "the other" half is by intercepting the post-send
> >> requests in hypervisor. At hypervisor level we don't have contiguous virtual
> >> addresses anymore, but we don't need them for 0 based MRs:
> >> The guest still registers regular MRs, while the hypervisor will
> >> register a 0 based MR and save the guest virtual address of the MR.
> >> At post-send we simply subtract the saved MR base address from the work request
> >> buffers and we are back to 0 based MR.
> >
>
> Hi Jason,
>
> > That only works for lkeys, the rkey exposes the base address to the
> > remote - the HV can't fix it..
> >
>
> Thanks for the clarification.
>
> What we really need is to allow to map a list of
> pages to an IOVA different from the process address
> space, e.g. a guest supplied IOVA.
>
> Something like req_mr (list_of_process_va_pages, base_other_iova, len_other_iova)
>
> Do you think the new API can support that?

Well, I think we should have something like this.

I actually can't see how it could need special HW support, since this
is basically exactly the same as creating a normal MR.

And same with 0 based, 'base_other_iova == 0' is the same as zero based.

I think the difference from the proposed API here is this requires full OS
pages, while Alex's version can do sub-pages too using HW features.

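To make the 'same as a normal MR' point concrete, here is a rough software
model of the address translation such an MR would imply. It is only a
sketch: struct sketch_mr, sketch_mr_translate() and SKETCH_PAGE_SIZE are
made-up names for illustration (not verbs or kernel API), the 4096 byte page
size is an assumption, and real HW does this lookup in its own page tables.
The field names follow the req_mr() arguments quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: size of one full OS page */

/* Hypothetical model of an MR built from a list of full OS pages. */
struct sketch_mr {
	void *const *pages;       /* process VA of each page, in MR order */
	size_t num_pages;
	uint64_t base_other_iova; /* address the MR is known by, e.g. guest VA */
	size_t len_other_iova;    /* MR length in bytes */
};

/*
 * Translate an IOVA inside the MR into a process pointer.
 * Returns NULL when the IOVA falls outside the MR or the page list.
 */
static void *sketch_mr_translate(const struct sketch_mr *mr, uint64_t iova)
{
	uint64_t off;
	size_t idx;

	if (iova < mr->base_other_iova)
		return NULL;

	off = iova - mr->base_other_iova; /* byte offset into the MR */
	if (off >= mr->len_other_iova)
		return NULL;

	idx = (size_t)(off / SKETCH_PAGE_SIZE);
	if (idx >= mr->num_pages)
		return NULL;

	return (char *)mr->pages[idx] + (size_t)(off % SKETCH_PAGE_SIZE);
}
```

Setting base_other_iova to 0 in this model gives the zero based case with no
other change, which is why the two look like one API to me.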
I would urge you to pursue an API like you described:

struct ibv_mr *ib_reg_mr_sg(const void *pages[],
                            size_t num_pages,
                            uint64_t mr_addr,
                            size_t mr_offset, // MR starts at pages[0] + mr_offset
                            size_t mr_length,
                            unsigned int flags);

Jason