On Wed, Oct 7, 2020 at 9:22 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote: > On Tue, Oct 06, 2020 at 12:41:22PM +0200, Daniel Vetter wrote: > > On Mon, Oct 05, 2020 at 08:56:50PM -0300, Jason Gunthorpe wrote: > > > On Sun, Oct 04, 2020 at 06:43:36PM +0300, Leon Romanovsky wrote: > > > > This series extends __sg_alloc_table_from_pages to allow chaining of > > > > new pages to already initialized SG table. > > > > > > > > This allows for the drivers to utilize the optimization of merging contiguous > > > > pages without a need to pre allocate all the pages and hold them in > > > > a very large temporary buffer prior to the call to SG table initialization. > > > > > > > > The second patch changes the Infiniband driver to use the new API. It > > > > removes duplicate functionality from the code and benefits the > > > > optimization of allocating dynamic SG table from pages. > > > > > > > > In huge pages system of 2MB page size, without this change, the SG table > > > > would contain x512 SG entries. > > > > E.g. for 100GB memory registration: > > > > > > > > Number of entries Size > > > > Before 26214400 600.0MB > > > > After 51200 1.2MB > > > > > > > > Thanks > > > > > > > > Maor Gottlieb (2): > > > > lib/scatterlist: Add support in dynamic allocation of SG table from > > > > pages > > > > RDMA/umem: Move to allocate SG table from pages > > > > > > > > Tvrtko Ursulin (2): > > > > tools/testing/scatterlist: Rejuvenate bit-rotten test > > > > tools/testing/scatterlist: Show errors in human readable form > > > > > > This looks OK, I'm going to send it into linux-next on the hmm tree > > > for awhile to see if anything gets broken. If there is more > > > remarks/tags/etc please continue > > > > An idea that just crossed my mind: A pin_user_pages_sgt might be useful > > for both rdma and drm, since this would avoid the possible huge interim > > struct pages array for thp pages. Or anything else that could be coalesced > > down into a single sg entry. > > > > Not sure it's worth it, but would at least give a slightly neater > > interface I think. > > We've talked about it. Christoph wants to see this area move to a biovec > interface instead of sgl, but it might still be worthwhile to have an > interm step at least as an API consolidation. Hm but then we'd need a new struct for the mapped side of things (which would still be what you get from dma-buf). That would be quite a bit of work to roll out everywhere, and sgt isn't such a huge misfit for passing buffer object mappings and system memory backing storage around, and hence what we (very slowly) converging drivers/gpu towards over the past 10 years or so. And moving the dma_map step out of dma-buf doesn't work, because some of the use-cases we have is for very special iommus which are managed by the gpu driver directly. Stuff that e.g. rotates/retiles/compresses on the fly, and is accessible by other (gfx related like video code, camera, ..) devices. Not something I expect to ever be relevant for rdma since this exist mostly on some small soc, but it's a thing. Without that dma-buf could hand out biovec for struct_page backed stuff, or some pfn_vec for the p2p stuff. Anyway was just an idea, I guess we'll have to live with some impedance mismatch since rolling out the one an only iovec structure which suits everyone is I think impossible :-) > Avoiding the page list would be complicated as we'd somehow have to > code share the page table iterator scheme. We're (slowly) getting towards thp for vram mappings and everything so I guess for drivers/gpu we might make that happen. But yeah it'd be not so pretty I think. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch