On 1/28/2019 12:29 PM, Jason Gunthorpe wrote:
> On Sat, Jan 26, 2019 at 11:09:45AM -0600, Steve Wise wrote:
>>
>>> From: linux-rdma-owner@xxxxxxxxxxxxxxx
>>> <linux-rdma-owner@xxxxxxxxxxxxxxx> On Behalf Of Shiraz Saleem
>>> Sent: Saturday, January 26, 2019 10:59 AM
>>> To: dledford@xxxxxxxxxx; jgg@xxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
>>> Cc: Shiraz, Saleem <shiraz.saleem@xxxxxxxxx>; Steve Wise
>>> <swise@xxxxxxxxxxx>
>>> Subject: [PATCH RFC 05/12] RDMA/cxgb4: Use for_each_sg_dma_page
>>> iterator on umem SGL
>>>
>>> From: "Shiraz, Saleem" <shiraz.saleem@xxxxxxxxx>
>>>
>>> Use the for_each_sg_dma_page iterator variant to walk the umem
>>> DMA-mapped SGL and get the page DMA address. This avoids the extra
>>> loop to iterate pages in the SGE when the for_each_sg iterator is
>>> used.
>>>
>>> Additionally, purge umem->page_shift usage in the driver, as it's
>>> only relevant for ODP MRs. Use the system page size and shift
>>> instead.
>>
>> Hey Shiraz, doesn't umem->page_shift allow registering huge pages
>> efficiently? I.e., is umem->page_shift set to the 2MB shift if the
>> memory in this umem region is from the 2MB huge page pool?
>
> I think long ago this might have been some fevered dream, but it was
> never implemented and never made any sense. How would the core code
> know the driver supported the CPU's huge page size?
>
> Shiraz's version to inject huge pages into the driver is much better.

The driver advertises the page sizes it supports for MR PBLs
(ib_device_attr.page_size_cap). For example, cxgb4 hardware supports
page sizes from 4KB up to 128MB. So if a umem were composed entirely of
huge pages, the reg code could pick a page size as large as the huge
page size, up to the device's maximum supported page size, thereby
reducing the PBL depth for a given MR.

I thought there was code for this once upon a time. Perhaps it was
never upstreamed, or it was rejected.

Steve.

> Jason
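
For reference, a minimal sketch of the converted walk the patch
description refers to, assuming the struct ib_umem layout of that era
(sg_head, nmap) and a made-up fill_pbl() helper rather than the actual
cxgb4 registration code:

#include <linux/scatterlist.h>
#include <rdma/ib_umem.h>

/* Collect one DMA address per PAGE_SIZE block of the umem into a PBL. */
static void fill_pbl(struct ib_umem *umem, u64 *pbl)
{
	struct sg_dma_page_iter sg_iter;
	int i = 0;

	/*
	 * The iterator yields one PAGE_SIZE chunk per step across the
	 * DMA-mapped SGL, so no inner loop over the pages of an SGE is
	 * needed.
	 */
	for_each_sg_dma_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0)
		pbl[i++] = sg_page_iter_dma_address(&sg_iter);
}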
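
And a hypothetical helper for the page_size_cap idea above: given the
device's supported-size bitmap and the MR start/length, pick the
largest page size that divides both, so the PBL needs as few entries as
possible. This is only a sketch of the selection logic; it assumes the
umem really is physically contiguous at that granularity (e.g. backed
by huge pages):

#include <linux/bitops.h>
#include <linux/mm.h>

/*
 * pgsz_bitmap: supported MR page sizes, bit k set means 2^k bytes is
 * usable (cf. ib_device_attr.page_size_cap).  Returns the largest
 * supported size that divides both the MR start and its length.
 */
static unsigned long best_mr_page_size(unsigned long pgsz_bitmap,
					u64 start, u64 length)
{
	u64 align = start | length;
	unsigned long k;

	while (pgsz_bitmap) {
		/* Try the largest remaining candidate, 2^k. */
		k = fls_long(pgsz_bitmap) - 1;
		if (!(align & (BIT_ULL(k) - 1)))
			return BIT(k);	/* 2^k divides start and length */
		pgsz_bitmap &= ~BIT(k);	/* too big, drop it and retry */
	}

	return PAGE_SIZE;	/* fall back to the system page size */
}

The core would of course also have to verify that every SGE in the umem
is aligned and sized to the chosen page size, which is presumably what
Shiraz's series is getting at.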