Re: [RFC rdma-core 2/2] verbs: Introduce non-contiguous memory registration

On Thu, Feb 01, 2018 at 08:22:01PM +0200, Marcel Apfelbaum wrote:
> On 31/01/2018 20:38, Jason Gunthorpe wrote:
> > On Wed, Jan 31, 2018 at 02:27:01PM +0200, Marcel Apfelbaum wrote:
> > 
> >> It is good to know, but still, passing so much information to the kernel
> >> when we could "compress" it instead may be worth a second thought.
> > 
> > Not sure. Have to see the whole thing..
> > 
> >>> Well, actually, only a 3rd :| The new MR would likely be 0 based, but
> >>> the VM guest doesn't know about this. So you'd need an API that can do
> >>> an arbitrary base to really solve your problem. I guess all HW should
> >>> be able to do this so maybe it is OK?
> >>
> >> The way we solve "the other" half is by intercepting the post-send
> >> requests in hypervisor. At hypervisor level we don't have contiguous virtual
> >> addresses anymore, but we don't need them for 0 based MRs:
> >> The guest still registers regular MRs, while the hypervisor
> >> registers a 0 based MR and saves the guest virtual address of the MR.
> >> At post-send we simply subtract the saved MR base address from the work request
> >> buffers and we are back to a 0 based MR.
> > 
> 
> Hi Jason,
> 
> > That only works for lkeys, the rkey exposes the base address to the
> > remote - the HV can't fix it..
> > 
> 
> Thanks for the clarification.
> 
> What we really need is a way to map a list of
> pages to an IOVA different from the process address
> space, e.g. a guest-supplied IOVA.
> 
> Something like req_mr (list_of_process_va_pages, base_other_iova, len_other_iova)
> 
> Do you think the new API can support that?

Well, I think we should have something like this.

I actually can't see how it could need special HW support, since this
is basically exactly the same as creating a normal MR.

And same with 0 based, 'base_other_iova == 0' is the same as zero
based.
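
For reference, the lkey-side fixup described above amounts to roughly
the following. This is only a sketch: sketch_sge and sketch_rebase_sge
are made-up names (the struct just mirrors the addr/length/lkey fields
of struct ibv_sge), and the bounds check is my assumption about what a
hypervisor would want to do, not something from the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for the addr/length/lkey fields of struct ibv_sge. */
struct sketch_sge {
	uint64_t addr;
	uint32_t length;
	uint32_t lkey;
};

/*
 * Rebase a guest SGE onto a zero based MR by subtracting the guest
 * virtual address saved at registration time.
 * Returns 0 on success, -1 if the buffer falls outside the MR
 * (in which case the SGE is left untouched).
 */
static int sketch_rebase_sge(struct sketch_sge *sge, uint64_t guest_base,
			     uint64_t mr_len)
{
	uint64_t off;

	if (sge->addr < guest_base)
		return -1;

	off = sge->addr - guest_base;
	if (off > mr_len || sge->length > mr_len - off)
		return -1;

	sge->addr = off;	/* now relative to the zero based MR */
	return 0;
}
```

If the MR could instead be registered with base_other_iova set to the
guest address, this step should not be needed at all, and presumably
the rkey case would work too, since the remote would then use the
address the guest advertised.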

I think the difference from the proposed API here is this requires
full OS pages, while Alex's version can do sub-pages too using HW
features.

I would urge you to pursue an API like you described:

struct ibv_mr *ib_reg_mr_sg(const void *pages[], size_t num_pages,
			    uint64_t mr_addr,
			    size_t mr_offset, // MR starts at pages[0] + mr_offset
			    size_t mr_length,
			    unsigned int flags);
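
To make the intended semantics concrete, here is a rough sketch of the
address translation I would expect such an MR to perform. Nothing here
is an existing rdma-core interface: sketch_mr and sketch_iova_to_va are
made-up names, 4 KiB OS pages are assumed, and I am reading mr_addr as
the IOVA of the first byte of the MR (so mr_addr == 0 gives a zero
based MR).

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB OS pages */

/* Hypothetical description of an MR built from a list of full OS pages. */
struct sketch_mr {
	const void **pages;	/* process VAs, one per page */
	size_t num_pages;
	uint64_t mr_addr;	/* IOVA of the first MR byte, 0 for zero based */
	size_t mr_offset;	/* MR starts at pages[0] + mr_offset */
	size_t mr_length;	/* MR length in bytes */
};

/*
 * Translate an IOVA inside the MR to the process VA backing it.
 * Returns NULL when the IOVA is outside the MR or the page list.
 */
static const void *sketch_iova_to_va(const struct sketch_mr *mr, uint64_t iova)
{
	uint64_t off;
	size_t idx;

	if (iova < mr->mr_addr || iova - mr->mr_addr >= mr->mr_length)
		return NULL;

	/* Byte offset from the start of pages[0]. */
	off = (iova - mr->mr_addr) + mr->mr_offset;
	idx = (size_t)(off / SKETCH_PAGE_SIZE);
	if (idx >= mr->num_pages)
		return NULL;

	return (const char *)mr->pages[idx] + off % SKETCH_PAGE_SIZE;
}
```

The pages need not be virtually contiguous in the process, only the
IOVA range [mr_addr, mr_addr + mr_length) is contiguous.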

Jason


