On 30/01/2018 17:42, Jason Gunthorpe wrote:
> On Tue, Jan 30, 2018 at 01:35:21PM +0200, Marcel Apfelbaum wrote:
>> On 29/01/2018 19:27, Jason Gunthorpe wrote:
>>> On Sun, Jan 28, 2018 at 10:37:47PM +0200, Yuval Shaia wrote:
>>>
>>
>> Hi Jason,
>>
>>>> But let's try to take it one step further: what if all my buffers are the
>>>> same size, or even better, all are PAGE_SIZE? In the case of a "composite"
>>>> array of, let's say, 262144 elements, I would waste 262144 * 8 bytes.
>>>>
>>>> This problem could be solved with a bitmap over a given range, where only
>>>> the pages whose bits are set compose the MR.
>>>
>>> You want this for the host on virtualization, right?
>>
>> Yes. (Actually it is more about us needing it rather than wanting it :) )
>>
>>> Like we talked
>>> about at Plumbers?
>>>
>>> Is it really necessary to be so optimal? A list of SGLs is not good
>>> enough?
>>
>> It is not. We think the list would need to be limited to a single page
>> (a system call limitation? Maybe we are wrong?).
>
> The new ioctl interface isn't really limited.
> This new API(s) will run over ioctl.
>

Good to know. Still, since we can "compress" this information rather than pass all of it to the kernel, maybe it is worth a second thought.

>> By the way, doing that would only solve half of our problem.
>
> Well, actually, only a 3rd :| The new MR would likely be 0 based, but
> the VM guest doesn't know about this. So you'd need an API that can do
> an arbitrary base to really solve your problem. I guess all HW should
> be able to do this so maybe it is OK?

We solve "the other" half by intercepting the post-send requests in the hypervisor. At the hypervisor level we no longer have contiguous virtual addresses, but we don't need them for 0-based MRs: the guest still registers regular MRs, while the hypervisor registers a 0-based MR and saves the guest virtual address of the MR. At post-send we simply subtract the saved MR base address from the work request buffer addresses and we are back to a 0-based MR.
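To put numbers on Yuval's page-list vs. bitmap comparison: assuming 4 KiB pages, 262144 pages cover 1 GiB, so an explicit array of 64-bit page addresses costs 262144 * 8 bytes = 2 MiB, while a bitmap over the same range costs 262144 bits = 32 KiB. The sketch below shows how such a bitmap could be expanded back into a page list on the kernel side. bitmap_to_pages(), the one-bit-per-page layout and the 4 KiB page size are hypothetical assumptions for illustration, not part of any existing uverbs interface.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; the real host page size may differ. */
#define SKETCH_PAGE_SHIFT 12

/* Size of an explicit page list: one 64-bit address per page. */
size_t page_array_bytes(size_t npages)
{
    return npages * sizeof(uint64_t);
}

/* Size of a bitmap covering the same range: one bit per page. */
size_t page_bitmap_bytes(size_t npages)
{
    return (npages + 7) / 8;
}

/*
 * Hypothetical helper: expand a bitmap over the range starting at base
 * and spanning npages pages into the page addresses it describes.
 * Bit i set means page i is part of the MR. out must have room for
 * npages entries. Returns the number of addresses written.
 */
size_t bitmap_to_pages(uint64_t base, const uint8_t *bitmap,
                       size_t npages, uint64_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < npages; i++) {
        if (bitmap[i / 8] & (1u << (i % 8)))
            out[n++] = base + ((uint64_t)i << SKETCH_PAGE_SHIFT);
    }
    return n;
}
```

The bitmap only wins when the MR pages are reasonably dense within the range: its size grows with the range, while the explicit list grows with the number of pages, so a few pages scattered across a huge range are still cheaper as a list.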
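A minimal sketch of the post-send rewrite described above, assuming the hypervisor keeps, for each guest MR, the guest virtual base address, the MR length and the lkey of the 0-based MR it registered on the host. struct shadow_mr, rebase_sge() and the lkey swap are hypothetical names and choices for illustration; struct sketch_sge only mirrors the addr/length/lkey fields of struct ibv_sge so the sketch compiles without libibverbs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the addr/length/lkey fields of struct ibv_sge (assumption:
 * real code would use the verbs type directly). */
struct sketch_sge {
    uint64_t addr;
    uint32_t length;
    uint32_t lkey;
};

/* Hypothetical per-MR record saved by the hypervisor at registration. */
struct shadow_mr {
    uint64_t guest_base; /* guest virtual address the guest registered */
    uint64_t length;     /* length of the guest MR in bytes */
    uint32_t host_lkey;  /* lkey of the 0-based MR registered on the host */
};

/*
 * Rewrite one guest SGE so it addresses the host's 0-based MR:
 * new address = guest address - saved guest base. The caller is assumed
 * to have already looked up mr from the guest lkey. Returns false if the
 * SGE does not fall entirely inside the registered range.
 */
bool rebase_sge(const struct shadow_mr *mr, struct sketch_sge *sge)
{
    if (sge->addr < mr->guest_base)
        return false;

    uint64_t off = sge->addr - mr->guest_base;

    /* Check off first so mr->length - off cannot underflow. */
    if (off > mr->length || sge->length > mr->length - off)
        return false;

    sge->addr = off;
    sge->lkey = mr->host_lkey;
    return true;
}
```

Rejecting out-of-range SGEs in the hypervisor surfaces the error early, rather than relying on the HCA to report a protection error after the work request has been posted.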
>
>> The other problem is what is happening on post-send. We don't have a
>> virtually contiguous range to pass to post-send, and breaking the
>> Work Request into several work requests using pages as boundaries
>> will again become a problem if we want to send a big chunk (the HW
>> has a rather limited max number of sg elements). We can solve it by using 0
>> based MRs, do you know if the current HW supports it?
>
> I think some does.
>

Do you have a model in mind? We would really like to try it out.
By the way, I searched the kernel for vendors implementing it and found maybe one... so maybe 0-based MRs are a nice idea but nothing more.

Thanks,
Marcel

> Jason
> --
To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html