On Mon, Apr 01, 2019 at 06:25:04PM +0000, Saleem, Shiraz wrote:
> >Subject: Re: [PATCH v3 rdma-next] RDMA/umem: Combine contiguous PAGE_SIZE
> >regions in SGEs
> >
> >On Thu, Mar 28, 2019 at 11:54:02AM -0500, Shiraz Saleem wrote:
> >> +/* ib_umem_add_sg_table - Add N contiguous pages to scatter table
> >> + *
> >> + * sg: current scatterlist entry
> >> + * page_list: array of npages struct page pointers
> >> + * npages: number of pages in page_list
> >> + * nents: [out] number of entries in the scatterlist
> >> + *
> >> + * Return new end of scatterlist
> >> + */
> >> +static struct scatterlist *ib_umem_add_sg_table(struct scatterlist *sg,
> >> +						struct page **page_list,
> >> +						unsigned long npages,
> >> +						int *nents)
> >> +{
> >> +	unsigned long first_pfn;
> >> +	unsigned long i = 0;
> >> +	bool update_cur_sg = false;
> >> +	bool first = !sg_page(sg);
> >> +
> >> +	/* Check if new page_list is contiguous with end of previous page_list.
> >> +	 * sg->length here is a multiple of PAGE_SIZE and sg->offset is 0.
> >> +	 */
> >> +	if (!first && (page_to_pfn(sg_page(sg)) + (sg->length >> PAGE_SHIFT) ==
> >> +		       page_to_pfn(page_list[0])))
> >> +		update_cur_sg = true;
> >> +
> >> +	while (i != npages) {
> >> +		unsigned long len;
> >> +		struct page *first_page = page_list[i];
> >> +
> >> +		first_pfn = page_to_pfn(first_page);
> >> +
> >> +		/* Compute the number of contiguous pages we have starting
> >> +		 * at i
> >> +		 */
> >> +		for (len = 0; i != npages &&
> >> +		     first_pfn + len == page_to_pfn(page_list[i]);
> >> +		     len++)
> >> +			i++;
> >> +
> >> +		/* Squash N contiguous pages from page_list into current sge */
> >> +		if (update_cur_sg) {
> >> +			sg_set_page(sg, sg_page(sg),
> >> +				    sg->length + (len << PAGE_SHIFT), 0);
> >
> >There is another problem here.. The sg->length is an 'unsigned int', so this
> >add can technically overflow. Need to also check that sg->length +
> >(len << PAGE_SHIFT) does not exceed UINT_MAX before combining.
>
> Thanks Jason! Nice catch. As a first step, we can check against
> SCATTERLIST_MAX_SEGMENT to avoid the overflow case.
>
> >
> >There is also dma_get_max_seg_size(), maybe that is the right limit?
> >Maybe the IB core should just call dma_set_max_seg_size(2G) for all drivers?
> >
> >I'm looking at the iommu code and thinking if you pass a sg->length >
> >max_seg_size it probably does something bad ..
>
> Probably a DMA mapping error, but I'm not sure. I am trying to gather some
> data internally about how the Intel IOMMU would behave in this case.

I was looking at the ARM iommu stuff and it forcibly segmented the SGL into
dma_get_max_seg_size() chunks, and didn't seem to check bounds.. Thus I guess
it accesses past the end of the sgl if the input sgl's are larger than
dma_get_max_seg_size() as it breaks them up. Didn't check *super* carefully
though.

> If we have to resort to using dma_get_max_seg_size() as the limit, setting
> dma_set_max_seg_size() to 2G via the IB core sounds reasonable, as long as
> there is no HW limitation for any of the devices.

Nobody is setting anything.. I'm not sure what the default is for a modern
PCI device.. 64k? Higher? I assume our default is already higher or the
storage stuff isn't working quite right!

This whole mechanism seems designed to support devices that have a wired
16 bit length for their DMA lists, and only one kind of DMA list.. For
something like IB I think we should set the default to something big, and
if the drivers can't handle it then they have to fragment when they build
their DMA lists.

Jason
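
To make the overflow guard concrete, here is a minimal sketch of the check
being discussed. The helper name ib_umem_can_squash() and the max_seg_sz
parameter are hypothetical, purely for illustration; max_seg_sz could be
SCATTERLIST_MAX_SEGMENT as a first step, or dma_get_max_seg_size() if that
turns out to be the right bound:

	/* Hypothetical helper: can 'len' more pages be folded into the
	 * current sge?  sg->length is an unsigned int, so do the math in
	 * 64 bits and compare against max_seg_sz, which the caller keeps
	 * at or below SCATTERLIST_MAX_SEGMENT so the result still fits.
	 */
	static bool ib_umem_can_squash(struct scatterlist *sg,
				       unsigned long len,
				       unsigned long max_seg_sz)
	{
		return (u64)sg->length + ((u64)len << PAGE_SHIFT) <=
		       max_seg_sz;
	}

The squash in the loop would then only be taken when
ib_umem_can_squash(sg, len, max_seg_sz) holds; otherwise the pages would
start a new sge, as in the non-contiguous case.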
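
The companion idea of raising the default could look like the sketch below,
where 'device' is the struct ib_device being set up. Exactly where the call
would live is an assumption (the device registration path is one plausible
spot); dma_set_max_seg_size() and SZ_2G are existing kernel interfaces:

	/* Lift the per-device DMA segment cap so combined umem SGEs are
	 * not larger than what dma_map_sg() will accept.  SZ_2G keeps
	 * sg->length within a signed 32-bit range; a driver whose HW
	 * cannot take segments that large would fragment when building
	 * its own DMA lists.
	 */
	dma_set_max_seg_size(device->dma_device, SZ_2G);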