On 1/28/2019 12:29 PM, Jason Gunthorpe wrote:
> On Sat, Jan 26, 2019 at 11:09:45AM -0600, Steve Wise wrote:
>>
>>> From: linux-rdma-owner@xxxxxxxxxxxxxxx
>>> <linux-rdma-owner@xxxxxxxxxxxxxxx> On Behalf Of Shiraz Saleem
>>> Sent: Saturday, January 26, 2019 10:59 AM
>>> To: dledford@xxxxxxxxxx; jgg@xxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
>>> Cc: Shiraz, Saleem <shiraz.saleem@xxxxxxxxx>; Steve Wise
>>> <swise@xxxxxxxxxxx>
>>> Subject: [PATCH RFC 05/12] RDMA/cxgb4: Use for_each_sg_dma_page
>>> iterator on umem SGL
>>>
>>> From: "Shiraz, Saleem" <shiraz.saleem@xxxxxxxxx>
>>>
>>> Use the for_each_sg_dma_page iterator variant to walk the umem
>>> DMA-mapped SGL and get the page DMA address. This avoids the extra
>>> loop to iterate pages in the SGE when the for_each_sg iterator is
>>> used.
>>>
>>> Additionally, purge umem->page_shift usage in the driver, as it's
>>> only relevant for ODP MRs. Use the system page size and shift
>>> instead.
>>
>> Hey Shiraz, doesn't umem->page_shift allow registering huge pages
>> efficiently? I.e., is umem->page_shift set to the 2MB shift if the
>> memory in this umem region is from the 2MB huge page pool?
>
> I think long ago this might have been some fevered dream, but it was
> never implemented and never made any sense. How would the core code
> know the driver supported the CPU's huge page size?
>
> Shiraz's version to inject huge pages into the driver is much better.

The driver advertises the page sizes it supports for MR PBLs
(ib_device_attr.page_size_cap). For example, cxgb4 hardware supports
page sizes from 4KB up to 128MB. So if a umem were composed entirely of
huge pages, the reg code could pick a page size as large as the huge
page size, up to the device's maximum supported page size, thereby
reducing the PBL depth for a given MR.

I thought there was code for this once upon a time. Perhaps it was
never upstreamed, or it was rejected.

Steve.

> Jason
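
For reference, a minimal sketch of the converted walk the patch
description refers to, assuming the struct ib_umem layout of that era
(sg_head, nmap) and a made-up fill_pbl() helper rather than the actual
cxgb4 registration code:

#include <linux/scatterlist.h>
#include <rdma/ib_umem.h>

/* Collect one DMA address per PAGE_SIZE block of the umem into a PBL. */
static void fill_pbl(struct ib_umem *umem, u64 *pbl)
{
	struct sg_dma_page_iter sg_iter;
	int i = 0;

	/*
	 * The iterator yields one PAGE_SIZE chunk per step across the
	 * DMA-mapped SGL, so no inner loop over the pages of an SGE is
	 * needed.
	 */
	for_each_sg_dma_page(umem->sg_head.sgl, &sg_iter, umem->nmap, 0)
		pbl[i++] = sg_page_iter_dma_address(&sg_iter);
}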
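
And a hypothetical helper for the page_size_cap idea above: given the
device's supported-size bitmap and the MR start/length, pick the
largest page size that divides both, so the PBL needs as few entries as
possible. This is only a sketch of the selection logic; it assumes the
umem really is physically contiguous at that granularity (e.g. backed
by huge pages):

#include <linux/bitops.h>
#include <linux/mm.h>

/*
 * pgsz_bitmap: supported MR page sizes, bit k set means 2^k bytes is
 * usable (cf. ib_device_attr.page_size_cap).  Returns the largest
 * supported size that divides both the MR start and its length.
 */
static unsigned long best_mr_page_size(unsigned long pgsz_bitmap,
					u64 start, u64 length)
{
	u64 align = start | length;
	unsigned long k;

	while (pgsz_bitmap) {
		/* Try the largest remaining candidate, 2^k. */
		k = fls_long(pgsz_bitmap) - 1;
		if (!(align & (BIT_ULL(k) - 1)))
			return BIT(k);	/* 2^k divides start and length */
		pgsz_bitmap &= ~BIT(k);	/* too big, drop it and retry */
	}

	return PAGE_SIZE;	/* fall back to the system page size */
}

The core would of course also have to verify that every SGE in the umem
is aligned and sized to the chosen page size, which is presumably what
Shiraz's series is getting at.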