Re: [PATCH v3 rdma-next] RDMA/umem: Combine contiguous PAGE_SIZE regions in SGEs

On Mon, Apr 01, 2019 at 06:25:04PM +0000, Saleem, Shiraz wrote:
> >Subject: Re: [PATCH v3 rdma-next] RDMA/umem: Combine contiguous PAGE_SIZE
> >regions in SGEs
> >
> >On Thu, Mar 28, 2019 at 11:54:02AM -0500, Shiraz Saleem wrote:
> >> +/* ib_umem_add_sg_table - Add N contiguous pages to scatter table
> >> + *
> >> + * sg: current scatterlist entry
> >> + * page_list: array of npage struct page pointers
> >> + * npages: number of pages in page_list
> >> + * nents: [out] number of entries in the scatterlist
> >> + *
> >> + * Return new end of scatterlist
> >> + */
> >> +static struct scatterlist *ib_umem_add_sg_table(struct scatterlist *sg,
> >> +						struct page **page_list,
> >> +						unsigned long npages,
> >> +						int *nents)
> >> +{
> >> +	unsigned long first_pfn;
> >> +	unsigned long i = 0;
> >> +	bool update_cur_sg = false;
> >> +	bool first = !sg_page(sg);
> >> +
> >> +	/* Check if new page_list is contiguous with end of previous page_list.
> >> +	 * sg->length here is a multiple of PAGE_SIZE and sg->offset is 0.
> >> +	 */
> >> +	if (!first && (page_to_pfn(sg_page(sg)) + (sg->length >> PAGE_SHIFT) ==
> >> +		       page_to_pfn(page_list[0])))
> >> +		update_cur_sg = true;
> >> +
> >> +	while (i != npages) {
> >> +		unsigned long len;
> >> +		struct page *first_page = page_list[i];
> >> +
> >> +		first_pfn = page_to_pfn(first_page);
> >> +
> >> +		/* Compute the number of contiguous pages we have starting
> >> +		 * at i
> >> +		 */
> >> +		for (len = 0; i != npages &&
> >> +			      first_pfn + len == page_to_pfn(page_list[i]);
> >> +		     len++)
> >> +			i++;
> >> +
> >> +		/* Squash N contiguous pages from page_list into current sge */
> >> +		if (update_cur_sg) {
> >> +			sg_set_page(sg, sg_page(sg),
> >> +				    sg->length + (len << PAGE_SHIFT), 0);
> >
> >There is another problem here.. The sg->length is an 'unsigned int' so this add can
> >technically overflow. Need to also check 'if length + len > UINT_MAX' before
> >combining.
> Thanks Jason! Nice catch. As a first step, we can check against SCATTERLIST_MAX_SEGMENT
> to avoid the overflow case.
> 
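
Yeah - the check has to happen at merge time, once the length of the
run is known, since sg->length grows with each merge. As an untested
sketch against the v3 hunk above (the continue is a guess at the
surrounding control flow, which is truncated here):

	/* Squash N contiguous pages from page_list into current sge,
	 * but only when the grown length still fits in the unsigned
	 * int sg->length.  SCATTERLIST_MAX_SEGMENT is UINT_MAX
	 * rounded down to a page boundary, so the merged entry also
	 * stays a multiple of PAGE_SIZE.  Comparing against the
	 * remaining headroom avoids overflowing the addition itself.
	 */
	if (update_cur_sg) {
		if (SCATTERLIST_MAX_SEGMENT - sg->length >=
		    (len << PAGE_SHIFT)) {
			sg_set_page(sg, sg_page(sg),
				    sg->length + (len << PAGE_SHIFT), 0);
			update_cur_sg = false;
			continue;
		}
		/* No room left: start a new sge for this run */
		update_cur_sg = false;
	}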
> >
> >There is also dma_get_max_seg_size(), maybe that is the right limit?
> >Maybe the IB core should just call dma_set_max_seg_size(2G) for all drivers?
> >
> >I'm looking at the iommu code and thinking if you pass a sg->length >
> >max_seg_size it probably does something bad ..
> 
> Probably a DMA mapping error but not sure.
> I am trying to gather some data internally about how Intel IOMMU would behave in this case.

I was looking at the ARM iommu stuff and it forcibly segments the SGL
into dma_get_max_seg_size() chunks, and didn't seem to check bounds..
Thus I guess it writes past the end of the SGL if the input entries
are larger than dma_get_max_seg_size(), since it breaks them up into
more entries.

Didn't check *super* carefully though

> If we have to resort to using dma_get_max_seg_size() as the limit,
> setting dma_set_max_seg_size() to 2G via the IB core sounds
> reasonable as long as there is no HW limitation for any of the
> devices.

Nobody is setting anything.. I'm not sure what the default is for a
modern PCI device.. 64k? Higher? I assume our default is already
higher or the storage stuff isn't working quite right!
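
Looking at include/linux/dma-mapping.h, the fallback when nothing
ever allocates dma_parms appears to be 64k (quoting roughly from a
kernel of this vintage):

	static inline unsigned int dma_get_max_seg_size(struct device *dev)
	{
		if (dev->dma_parms && dev->dma_parms->max_segment_size)
			return dev->dma_parms->max_segment_size;
		return SZ_64K;
	}

So combined SGEs bigger than 64k would already be over the documented
limit unless the bus or the driver sets something explicitly.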

This whole mechanism seems designed to support devices that have a
hard-wired 16-bit length field for their DMA lists, and only one kind
of DMA list..

For something like IB I think we should set the default to something
big, and if the drivers can't handle it then they have to fragment
when they build their DMA lists.
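
Something like this at device registration, maybe - untested, and
ib_dma_set_default_seg_size() is just a made-up name for wherever the
core would hook it:

	#include <linux/dma-mapping.h>
	#include <linux/sizes.h>

	/* Give every IB device a large max segment size so ib_umem can
	 * build big SGEs; drivers with smaller HW limits fragment when
	 * they translate to their own DMA lists.  dma_set_max_seg_size()
	 * returns -EIO if the bus never allocated dma_parms (the PCI
	 * core does allocate it for PCI devices).
	 */
	static void ib_dma_set_default_seg_size(struct ib_device *device)
	{
		dma_set_max_seg_size(device->dma_device, SZ_2G);
	}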

Jason


