Re: [RFC rdma-core 2/2] verbs: Introduce non-contiguous memory registration

On 30/01/2018 17:42, Jason Gunthorpe wrote:
> On Tue, Jan 30, 2018 at 01:35:21PM +0200, Marcel Apfelbaum wrote:
>> On 29/01/2018 19:27, Jason Gunthorpe wrote:
>>> On Sun, Jan 28, 2018 at 10:37:47PM +0200, Yuval Shaia wrote:
>>>
>>
>> Hi Jason,
>>
>>>> But let's try to take it one step further, what if all my buffers are the
>>>> same size, or even better, all are PAGE_SIZE. So in case of a "composite"
>>>> array of, let's say, 262144 elements I would have a wasteful 262144 * 8 bytes.
>>>>
>>>> This problem could be solved with a bitmap over a given range where only the
>>>> bits that are set compose the MR.
>>>
>>> You want this for the host on virtualization right?
>>
>> Yes. (actually it is more about us needing rather than wanting :) )
>>
>>> Like we talked about at plumbers?
>>>
>>> Is it really necessary to be so optimal? A list of SGLs is not good
>>> enough?
>>
>> It is not. We think the list would need to be limited to a single page,
>> (system calls limitation? maybe we are wrong?)
> 
> The new ioctl interface isn't really limited.
> This new API(s) will run over ioctl.
> 

That is good to know, but still, passing so much information to the kernel
when we could "compress" it instead may be worth a second thought.
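
To put rough numbers on the "compression", here is a minimal sketch comparing
the two encodings for the 262144-page example quoted above. The helper names
are hypothetical and not part of any proposed API; it also assumes one bit per
page over one range, so the bitmap size follows the span of the range rather
than the number of pages actually used.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to pass one 64-bit address per page. */
static size_t addr_list_bytes(size_t n_pages)
{
    return n_pages * sizeof(uint64_t);
}

/* Hypothetical helper: bytes needed for a bitmap with one bit per page. */
static size_t bitmap_bytes(size_t n_pages)
{
    return (n_pages + CHAR_BIT - 1) / CHAR_BIT;
}

/* Mark page 'idx' of the range as belonging to the MR. */
static void bitmap_set(uint8_t *bm, size_t idx)
{
    bm[idx / CHAR_BIT] |= (uint8_t)(1u << (idx % CHAR_BIT));
}

/* Return non-zero if page 'idx' of the range belongs to the MR. */
static int bitmap_test(const uint8_t *bm, size_t idx)
{
    return (bm[idx / CHAR_BIT] >> (idx % CHAR_BIT)) & 1u;
}
```

For 262144 pages the address list takes 2097152 bytes (2 MiB), while the
bitmap takes 32768 bytes (32 KiB).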


>> By the way, doing that would only solve half of our problem.
> 
> Well, actually, only a 3rd :| The new MR would likely be 0 based, but
> the VM guest doesn't know about this. So you'd need an API that can do
> arbitrary based to really solve your problem. I guess all HW should
> be able to do this so maybe it is OK?

The way we solve "the other" half is by intercepting the post-send
requests in the hypervisor. At the hypervisor level we no longer have
contiguous virtual addresses, but we don't need them for 0 based MRs:
the guest still registers regular MRs, while the hypervisor registers
a 0 based MR and saves the guest virtual address of the MR.
At post-send we simply subtract the saved MR base address from the work request
buffer addresses and we are back to a 0 based MR.
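
A minimal sketch of that subtraction, only to illustrate the idea. The struct
and function names are hypothetical (this is not code from the hypervisor or
from rdma-core), and the bounds checks are an addition of this sketch.

```c
#include <stdint.h>

/* Hypothetical per-MR state kept by the hypervisor at registration time. */
struct guest_mr {
    uint64_t guest_base; /* guest virtual address the guest registered */
    uint64_t length;     /* length of the guest MR in bytes */
};

/*
 * Translate a guest buffer address taken from a work request into an
 * offset inside the 0 based MR. Returns 0 on success, -1 if the buffer
 * does not fit inside the MR.
 */
static int to_zero_based(const struct guest_mr *mr, uint64_t guest_addr,
                         uint32_t len, uint64_t *offset)
{
    uint64_t off;

    if (guest_addr < mr->guest_base)
        return -1;

    off = guest_addr - mr->guest_base;
    if (off > mr->length || len > mr->length - off)
        return -1;

    *offset = off;
    return 0;
}
```

With an MR saved as { .guest_base = 0x7f0000001000, .length = 0x4000 }, a
work request buffer at 0x7f0000002000 maps to offset 0x1000.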

> 
>> The other problem is what is happening on post-send. We don't have a
>> virtually contiguous range to pass to post-send, and breaking the
>> Work Request into several work requests using pages as boundaries
>> will become again a problem if we want to send a big chunk (the HW
>> has a rather limited max sg elements).  We can solve it by using 0
>> based MRs, do you know if the current HW supports it?
> 
> I think some does.
> 

Do you have a model in mind? We would really want to try it out.

By the way, I tried to search in the kernel for vendors implementing it
and I saw maybe one vendor... so maybe 0 based MR is a nice idea but nothing more.

Thanks,
Marcel

> Jason
> 
