Re: [RFC rdma-core 2/2] verbs: Introduce non-contiguous memory registration

On 01/02/2018 20:29, Jason Gunthorpe wrote:
> On Thu, Feb 01, 2018 at 08:22:01PM +0200, Marcel Apfelbaum wrote:
>> On 31/01/2018 20:38, Jason Gunthorpe wrote:
>>> On Wed, Jan 31, 2018 at 02:27:01PM +0200, Marcel Apfelbaum wrote:
>>>
>>>> It is good to know, but still, passing so much information to the kernel
>>>> when we could "compress" it instead may be worth a second thought.
>>>
>>> Not sure. Have to see the whole thing..
>>>
>>>>> Well, actually, only a 3rd :| The new MR would likely be 0 based, but
>>>>> the VM guest doesn't know about this. So you'd need an API that can do
>>>>> an arbitrary base to really solve your problem. I guess all HW should
>>>>> be able to do this so maybe it is OK?
>>>>
>>>> The way we solve "the other" half is by intercepting the post-send
>>>> requests in the hypervisor. At the hypervisor level we don't have contiguous
>>>> virtual addresses anymore, but we don't need them for 0 based MRs:
>>>> The guest still registers regular MRs, while the hypervisor
>>>> registers a 0 based MR and saves the guest virtual address of the MR.
>>>> At post-send we simply subtract the saved MR base address from the work
>>>> request buffers and we are back to a 0 based MR.
>>>
>>
>> Hi Jason,
>>
>>> That only works for lkeys, the rkey exposes the base address to the
>>> remote - the HV can't fix it..
>>>
>>
>> Thanks for the clarification.
>>
>> What we really need is a way to map a list of
>> pages to an IOVA different from the process address
>> space, e.g. a guest-supplied IOVA.
>>
>> Something like req_mr (list_of_process_va_pages, base_other_iova, len_other_iova)
>>
>> Do you think the new API can support that?
> 
> Well, I think we should have something like this.
> 
> I actually can't see how it could need special HW support, since this
> is basically exactly the same as creating a normal MR.
> 
> And same with 0 based, 'base_other_iova == 0' is the same as zero
> based.
> 

Agreed.
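
To make the address arithmetic concrete, here is a minimal sketch of the
translation such an MR implies, assuming full 4 KiB OS pages. The struct,
the function and SKETCH_PAGE_SIZE are hypothetical names used only for
illustration; they are not part of libibverbs or of the proposed API.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB OS pages */

/* Hypothetical description of an MR backed by a list of process pages. */
struct sketch_mr {
    void *const *pages;        /* process VAs of the (non-contiguous) pages */
    size_t num_pages;
    uint64_t base_other_iova;  /* address at which the MR's first byte is seen */
    size_t length;             /* MR length in bytes */
};

/* Resolve an IOVA inside the MR to a process pointer, or NULL if outside. */
void *sketch_mr_resolve(const struct sketch_mr *mr, uint64_t iova)
{
    if (iova < mr->base_other_iova)
        return NULL;

    uint64_t off = iova - mr->base_other_iova;
    if (off >= mr->length)
        return NULL;

    size_t idx = (size_t)(off / SKETCH_PAGE_SIZE);
    if (idx >= mr->num_pages)
        return NULL;

    /* Page index selects the backing page, the remainder is the byte in it. */
    return (char *)mr->pages[idx] + (off % SKETCH_PAGE_SIZE);
}
```

With base_other_iova set to 0 the same function describes a zero based MR;
nothing else changes, which is the point made above.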

> I think the difference from the proposed API here is this requires
> full OS pages, while Alex's version can do sub-pages too using HW
> features.
> 
> I would urge you to pursue an API like you described:
> 
> struct ibv_mr *ib_reg_mr_sg(const void *pages[], size_t num_pages,
>                             uint64_t mr_addr,
>                             size_t mr_offset, // MR starts at pages[0] + mr_offset
>                             size_t mr_length,
>                             unsigned int flags);
> 

Sounds right, thanks for the pointer.
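
As a rough illustration of what arguments in the shape of that prototype
would have to satisfy, here is a minimal sketch of a sanity check, again
assuming full 4 KiB OS pages. The helper name and SKETCH_PAGE_SIZE are
hypothetical, and the rule that mr_offset must fall inside pages[0] is an
assumption, not something settled in this thread.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumption: 4 KiB OS pages */

/* Hypothetical check for (num_pages, mr_addr, mr_offset, mr_length). */
bool sketch_mr_sg_args_ok(size_t num_pages, uint64_t mr_addr,
                          size_t mr_offset, size_t mr_length)
{
    if (num_pages == 0 || mr_length == 0)
        return false;

    /* Assumption: the MR starts somewhere inside pages[0]. */
    if (mr_offset >= SKETCH_PAGE_SIZE)
        return false;

    /* Guard the multiplication below against overflow. */
    if (num_pages > SIZE_MAX / SKETCH_PAGE_SIZE)
        return false;

    /* The MR must fit in the bytes that remain after mr_offset. */
    size_t total = num_pages * SKETCH_PAGE_SIZE;
    if (mr_length > total - mr_offset)
        return false;

    /* The IOVA range [mr_addr, mr_addr + mr_length) must not wrap. */
    if (mr_addr > UINT64_MAX - (mr_length - 1))
        return false;

    return true;
}
```

For example, two pages with mr_offset = 100 can back at most
2 * 4096 - 100 = 8092 bytes, whatever mr_addr is chosen.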
Marcel

> Jason
> 



