On Mon, Oct 22, 2018 at 03:06:42PM -0600, Jason Gunthorpe wrote:
> On Fri, Oct 19, 2018 at 06:34:06PM -0500, Shiraz Saleem wrote:
> > Squash contiguous regions of PAGE_SIZE pages
> > into a single SG entry as opposed to one
> > SG entry per page. This reduces the SG table
> > size and is friendliest to the IOMMU.
> >
> > Suggested-by: Jason Gunthorpe <jgg@xxxxxxxx>
> > Reviewed-by: Michael J. Ruhl <michael.j.ruhl@xxxxxxxxx>
> > Signed-off-by: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
> >  drivers/infiniband/core/umem.c | 66 ++++++++++++++++++++----------------------
> >  1 file changed, 31 insertions(+), 35 deletions(-)

[..]

> > +	page_list = kmalloc_array(npages, sizeof(*page_list),
> > +				  GFP_KERNEL);
>
> Hurm.. The reason it was __get_free_page and a loop was to avoid a
> high order allocation triggered by userspace.
>
> ie if I want to make an MR over a 2G region this requires a 4M
> allocation of page pointers, which is too high to do via
> kmalloc_array().
>
> So I think this can't work, the page list has to be restricted in size
> and we need to iterate...

Yes. I had this same concern, and it is one of the reasons I posted this
patch set as an RFC. I was thinking of building a temporary linked list of
page_list array objects (after pinning), each of which could hold a limited
number of pages. We would then iterate this list to identify contiguous
pages and squash them into SGL entries.

[..]

> > +	ret = sg_alloc_table_from_pages(&umem->sg_head,
> > +					page_list,
> > +					umem->npages,
> > +					0,
> > +					umem->npages << PAGE_SHIFT,
> > +					GFP_KERNEL);
>
> Yep, this is nice.. But I wonder if this needs to copy what i915 is
> doing here in i915_sg_segment_size() and
> __i915_gem_userptr_alloc_pages() - ie restrict the size if the swiotlb
> is enabled. Confusing as to why though..
>
> Anyhow, I think for this to be really optimal, some new APIs in
> scatterlist.h will be needed.. Something like
> a 'sg_append_table_from_pages' helper.
>
> Then the flow would be:
>
>  sg_alloc_table(&table, min(npages, SG_MAX_SINGLE_ALLOC))
>  while (npages) {
>       get_user_pages(page_list, ...)
>       sg_append_table_from_pages(&table, page_list, ...,
>                                  min(npages, SG_MAX_SINGLE_ALLOC));
>       npages -= page_list_len;
>  }
>
> So we at most over-allocate by 1 page, and if that became critical we
> could copy the last chain in the sg_table.
>
> If sg_append_table_from_pages() needs more room in the table then it
> would convert the last entry to a chain and allocate another table.

Since sg_alloc_table_from_pages() can't really be used here, as the
page_list size has to be restricted, I would prefer to defer solving the
problem of over-allocating the SG table to a follow-on series, and to do
the combining of contiguous regions into SGL entries in this one.

> This bounds the excess memory required and doesn't need a high order
> allocation to invoke get_user_pages.
>
> Also, this series is out of order, right?
>
> If the umem core starts producing large SGL entries won't all the
> drivers break?

Hmmm.. why? The code to break down each SGL entry into dma_addr +
PAGE_SIZE * i (for i = 0 to the number of PAGE_SIZE pages in the SGE)
already exists, no? For instance,
https://elixir.bootlin.com/linux/v4.8/source/drivers/infiniband/hw/ocrdma/ocrdma_verbs.c#L918

> The drivers need to be able to break them down before this is
> introduced.
>
> Jason
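
For what it's worth, here is roughly the shape of the bounded pinning loop
I have in mind. This is an untested sketch: error handling, the gup flags
and the unpinning on failure are elided, and cur_base/npages are assumed
to come from the surrounding umem code as they do today.

  struct page **page_list;
  unsigned long chunk = PAGE_SIZE / sizeof(struct page *);
  long pinned;

  /* One page worth of page pointers (512 on x86-64), so userspace
   * can never trigger a high-order allocation here. */
  page_list = (struct page **)__get_free_page(GFP_KERNEL);
  if (!page_list)
          return -ENOMEM;

  while (npages) {
          pinned = get_user_pages(cur_base,
                                  min_t(unsigned long, npages, chunk),
                                  gup_flags, page_list, NULL);
          if (pinned <= 0)
                  break;  /* real code would unwind the pins and fail */

          /* squash this chunk into the SG table, see next sketch */

          cur_base += pinned * PAGE_SIZE;
          npages -= pinned;
  }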
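
The squashing over each pinned chunk could then look something like the
following. Again untested; umem_sg_squash() is a made-up name, the sgl is
assumed to be a single unchained array (so the pointer arithmetic at the
end is valid), and a real version would have to remember the last pfn
across chunks so that runs spanning two pinning iterations still merge.

  #include <linux/mm.h>
  #include <linux/scatterlist.h>

  /* Collapse a chunk of pinned pages into as few SG entries as
   * possible: grow the current entry while the pages remain
   * physically contiguous, open a new one otherwise. Returns the
   * number of entries consumed. */
  static int umem_sg_squash(struct scatterlist *sgl,
                            struct page **page_list,
                            unsigned long npages)
  {
          struct scatterlist *sg = sgl;
          unsigned long i;

          if (!npages)
                  return 0;

          sg_set_page(sg, page_list[0], PAGE_SIZE, 0);
          for (i = 1; i < npages; i++) {
                  if (page_to_pfn(page_list[i]) ==
                      page_to_pfn(page_list[i - 1]) + 1) {
                          /* contiguous: extend the current entry */
                          sg->length += PAGE_SIZE;
                  } else {
                          /* discontiguity: start a new entry */
                          sg++;
                          sg_set_page(sg, page_list[i], PAGE_SIZE, 0);
                  }
          }
          return sg - sgl + 1;
  }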
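
As for the drivers, the breakdown of a large SGE is just the usual
for_each_sg() walk plus an inner loop over the dma length, as in the
ocrdma code linked above. A minimal illustration, where build_pbe() is a
stand-in for whatever per-page work a given driver does:

  /* Emit one PAGE_SIZE-aligned dma address for every page covered
   * by each (possibly multi-page) SG entry. */
  static void build_page_list(struct scatterlist *sgl, int nents)
  {
          struct scatterlist *sg;
          unsigned int off;
          int i;

          for_each_sg(sgl, sg, nents, i)
                  for (off = 0; off < sg_dma_len(sg); off += PAGE_SIZE)
                          build_pbe(sg_dma_address(sg) + off);
  }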