On Tue, Apr 23, 2019 at 12:13 AM Saleem, Shiraz <shiraz.saleem@xxxxxxxxx> wrote:
>
> >Subject: Re: [PATCH v2 rdma-next 0/5] Introduce a DMA block iterator
> >
> >On Fri, Apr 19, 2019 at 08:43:48AM -0500, Shiraz Saleem wrote:
> >> From: "Shiraz Saleem" <shiraz.saleem@xxxxxxxxx>
> >>
> >> This patch set aims to allow drivers to leverage a new DMA block
> >> iterator to get contiguous, aligned memory blocks within their HW
> >> supported page sizes. The motivation for this work comes from the
> >> discussion in [1].
> >>
> >> The first patch introduces a new umem API that allows drivers to find
> >> the best supported page size to use for the MR, from a bitmap of HW
> >> supported page sizes.
> >>
> >> The second patch introduces a new DMA block iterator that allows
> >> drivers to get aligned DMA addresses within a HW supported page size.
> >>
> >> The third and fourth patches remove the dependency of the i40iw and
> >> bnxt_re drivers on the hugetlb flag. The new core APIs are called in
> >> these drivers to get huge page size aligned addresses if the MR is
> >> backed by huge pages.
> >>
> >> The fifth patch removes the hugetlb flag from IB core.
> >>
> >> Please note that the mixed page portion of the algorithm and the
> >> bnxt_re update in patch #4 have not been tested on hardware.
> >>
> >> [1] https://patchwork.kernel.org/patch/10499753/
> >>
> >> RFC-->v0:
> >> * Add to scatter table by iterating a limited sized page list.
> >> * Updated driver call sites to use the for_each_sg_page iterator
> >>   variant where applicable.
> >> * Tweaked the algorithm in ib_umem_find_single_pg_size and
> >>   ib_umem_next_phys_iter to ignore alignment of the start of the
> >>   first SGE and the end of the last SGE.
> >> * Simplified the ib_umem_find_single_pg_size offset alignment checks
> >>   for the user-space virtual and physical buffer.
> >> * Updated ib_umem_start_phys_iter to do some pre-computation
> >>   for the non-mixed page support case.
> >> * Updated the bnxt_re driver to use the new core APIs and remove its
> >>   dependency on the hugetlb flag.
> >> * Fixed a bug in the computation of sg_phys_iter->phyaddr in
> >>   ib_umem_next_phys_iter.
> >> * Dropped hugetlb flag usage from the RDMA subsystem.
> >> * Rebased on top of for-next.
> >>
> >> v0-->v1:
> >> * Removed the patches that update drivers to use the for_each_sg_page
> >>   variant to iterate the SGEs. This is sent as a separate series using
> >>   the for_each_sg_dma_page variant.
> >> * Tweaked the ib_umem_add_sg_table API definition based on maintainer
> >>   feedback.
> >> * Cache the number of scatterlist entries in the umem.
> >> * Updated function headers for ib_umem_find_single_pg_size and
> >>   ib_umem_next_phys_iter.
> >> * Added a sanity check on supported_pgsz in ib_umem_find_single_pg_size.
> >>
> >> v1-->v2:
> >> * Removed the page combining patch as it was sent stand-alone.
> >> * __fls on pgsz_bitmap as opposed to fls64 since it's an unsigned long.
> >> * Renamed ib_umem_find_pg_bit() --> rdma_find_pg_bit() and moved it
> >>   to ib_verbs.h.
> >> * Renamed ib_umem_find_single_pg_size() --> ib_umem_find_best_pgsz().
> >> * New flag IB_UMEM_VA_BASED_OFFSET for the ib_umem_find_best_pgsz API,
> >>   for HW that uses the least significant bits of the VA to indicate
> >>   the start offset into the DMA list.
> >> * rdma_find_pg_bit() logic is re-written and simplified. It can
> >>   support input of 0 or 1 DMA addr cases.
> >> * ib_umem_find_best_pgsz() optimized to be less computationally
> >>   expensive by running rdma_find_pg_bit() only once.
> >> * rdma_for_each_block() is the new re-designed DMA block iterator,
> >>   which is more in line with the for_each_sg_dma_page() iterator.
> >> * rdma_find_mixed_pg_bit() logic for interior SGEs accounting for the
> >>   start and end DMA address.
> >> * Removed i40iw specific enums for supported page sizes.
> >> * Removed vma_list from ib_umem_get().
> >
> >Gal? Does this work for you now?
> >
> >At this point this only impacts two drivers that are presumably tested
> >by their authors, so I'd like to merge it to finally get rid of the
> >hugetlb flag.. But EFA will need to use it too
> >
>
> Selvin - It would be good if you could retest this version of the series
> with bnxt_re too, since there were some design changes to core algorithms.
>
> Shiraz
>
>
Series tested with bnxt_re. Looks good with my testing.
Tested-by: Selvin Xavier <selvin.xavier@xxxxxxxxxxxx>
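
For anyone wiring these helpers into another driver (e.g. EFA, as Jason
mentions), here is a rough sketch of the usage pattern as I read the cover
letter: pick the best HW-supported page size for the MR with
ib_umem_find_best_pgsz(), then walk the umem in blocks of that size with
rdma_for_each_block(). The exact signatures and umem field names are my
reading of the series and may not match the posted v2 precisely; the
page-size bitmap (SZ_4K | SZ_2M) and the pbl[] destination array are made
up purely for illustration.

#include <linux/sizes.h>
#include <linux/types.h>
#include <rdma/ib_umem.h>
#include <rdma/ib_verbs.h>

/* Hypothetical driver helper: fill a HW page list for an MR backed by umem. */
static int example_build_page_list(struct ib_umem *umem, u64 virt_addr,
				   u64 *pbl, u32 pbl_cnt)
{
	struct ib_block_iter biter;
	unsigned long pg_sz;
	u32 i = 0;

	/*
	 * Pick the largest page size this MR can be mapped with, out of
	 * the sizes the HW supports (4K and 2M assumed here for the example).
	 * A return of 0 means no supported size fits this MR.
	 */
	pg_sz = ib_umem_find_best_pgsz(umem, SZ_4K | SZ_2M, virt_addr);
	if (!pg_sz)
		return -EINVAL;

	/*
	 * Walk the umem scatterlist in pg_sz-aligned blocks and record the
	 * aligned DMA address of each block in the HW page list.
	 */
	rdma_for_each_block(umem->sg_head.sgl, &biter, umem->nmap, pg_sz) {
		if (i >= pbl_cnt)
			return -ENOMEM;
		pbl[i++] = rdma_block_iter_dma_address(&biter);
	}

	return 0;
}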