On Thu, Feb 28, 2019 at 09:42:30PM +0000, Saleem, Shiraz wrote:
> >Subject: Re: [PATCH rdma-next v1 4/6] RDMA/i40iw: Use umem API to retrieve
> >aligned DMA address
> >
> >On Sat, Feb 23, 2019 at 01:26:41PM -0600, Shiraz Saleem wrote:
> >> On Tue, Feb 19, 2019 at 09:07:29PM -0700, Jason Gunthorpe wrote:
> >> > On Tue, Feb 19, 2019 at 08:57:43AM -0600, Shiraz Saleem wrote:
> >> > > Call the core helpers to retrieve the HW aligned address to use
> >> > > for the MR, within a supported i40iw page size.
> >> > >
> >> > > Remove code in i40iw to determine when MR is backed by 2M huge
> >> > > pages which involves checking the umem->hugetlb flag and VMA inspection.
> >> > > The core helpers will return the 2M aligned address if the MR is
> >> > > backed by 2M pages.
> >> > >
> >> >
> >> > > -	for_each_sg_dma_page (region->sg_head.sgl, &sg_iter, region->nmap, 0) {
> >> > > -		pg_addr = sg_page_iter_dma_address(&sg_iter);
> >> > > -		if (first_pg)
> >> > > -			*pbl = cpu_to_le64(pg_addr & iwmr->page_msk);
> >> > > -		else if (!(pg_addr & ~iwmr->page_msk))
> >> > > -			*pbl = cpu_to_le64(pg_addr);
> >> > > -		else
> >> > > -			continue;
> >> > > -
> >> > > -		first_pg = false;
> >> > > +	for (ib_umem_start_phys_iter(region, &sg_phys_iter,
> >> > > +				     iwmr->page_size);
> >> >
> >> > Maybe this should be:
> >> >
> >> >  for_each_sg_dma_page_sz (region->sg_head.sgl, &sg_iter, region->nmap,
> >> >                           iwmr->page_size)
> >> >
> >> > ?
> >> >
> >> > Is there a reason to move away from the API we built here?
> >> >
> >> Yes. It's a different iterator type we need to use here: it iterates the
> >> SG list in contiguous aligned memory blocks as opposed to PAGE_SIZE
> >> increments.
> >
> >I mean, why not add an option to the core api to do something other than
> >PAGE_SIZE increments?
> >
>
> Not sure it's trivial, or that it's that generic yet. The iterator ib_umem_next_phys_iter()
> relies on the best supported HW size being passed in to get the pg_bit to use
> when we iterate the SG list.
>
> And the best supported HW size is obtained via another umem API,
> ib_umem_find_single_pg_size(), specific to RDMA MRs, which runs the SG list.

Yes.. but once we have this block size computed with rdma specific code the
iteration seems fairly generic..

> We also ignore alignment of the start of the first sgl and the end of the
> last sgl, which I thought was specific to RDMA HW?

It is basically the same as what sg_page_iter_dma already does - that
iterator rounds everything down/up to a page boundary. If someone doesn't
want that rounding then they need to use a different iterator..

> But maybe worth considering though for future work to extend as a scatterlist API.

Can we at least follow the same basic design of having a rdma_for_each_block()
iterator helper loop, like for_each_sg_dma_page, etc?

Jason
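[Editor's note: the "iterate the SG list in HW-sized aligned blocks" idea being debated above can be sketched in plain C. The `sg_entry`, `block_iter`, `block_iter_init`, `block_iter_next`, and `for_each_block` names below are hypothetical userspace stand-ins, not the kernel's `struct scatterlist` or any merged RDMA core API; this is a minimal model of the design Jason is asking for, under the assumption that `pg_bit` was chosen (as the core helper is said to do) so every interior SG boundary is block-aligned.]

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical stand-in for a DMA-mapped scatterlist entry; the kernel
 * walks struct scatterlist via sg_dma_address()/sg_dma_len(). */
struct sg_entry {
	uint64_t dma_addr;
	uint64_t len;
};

struct block_iter {
	const struct sg_entry *sg;	/* next entry to consume */
	const struct sg_entry *end;	/* one past the last entry */
	uint64_t blk_addr;		/* aligned address of the current block */
	uint64_t remaining;		/* bytes left to cover in the current entry */
	unsigned int pg_bit;		/* log2 of the chosen HW block size */
};

void block_iter_init(struct block_iter *it, const struct sg_entry *sg,
		     size_t nents, unsigned int pg_bit)
{
	it->sg = sg;
	it->end = sg + nents;
	it->blk_addr = 0;
	it->remaining = 0;
	it->pg_bit = pg_bit;
}

/* Advance to the next HW-sized block; returns 0 once the list is consumed.
 * As with the page iterator, an entry's start is rounded down to a block
 * boundary, so callers only ever see aligned addresses. */
int block_iter_next(struct block_iter *it)
{
	uint64_t blk_sz = 1ULL << it->pg_bit;

	if (!it->remaining) {
		if (it->sg == it->end)
			return 0;
		it->blk_addr = it->sg->dma_addr & ~(blk_sz - 1);
		it->remaining = it->sg->dma_addr + it->sg->len - it->blk_addr;
		it->sg++;
	} else {
		it->blk_addr += blk_sz;
	}
	it->remaining -= it->remaining < blk_sz ? it->remaining : blk_sz;
	return 1;
}

/* Loop helper in the style of for_each_sg_dma_page(). */
#define for_each_block(it, sgl, nents, pg_bit) \
	for (block_iter_init((it), (sgl), (nents), (pg_bit)); \
	     block_iter_next(it);)
```

With this shape, a PBL fill like the i40iw hunk quoted above collapses to one loop, e.g. `for_each_block(&it, sgl, nents, pg_bit) *pbl++ = it.blk_addr;` - every address the iterator yields is already aligned to the `1 << pg_bit` block size, so the first-page/masking special cases disappear.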