>Subject: Re: [PATCH rdma-next v1 4/6] RDMA/i40iw: Use umem API to retrieve
>aligned DMA address
>
>On Thu, Feb 28, 2019 at 09:42:30PM +0000, Saleem, Shiraz wrote:
>> >Subject: Re: [PATCH rdma-next v1 4/6] RDMA/i40iw: Use umem API to
>> >retrieve aligned DMA address
>> >
>> >On Sat, Feb 23, 2019 at 01:26:41PM -0600, Shiraz Saleem wrote:
>> >> On Tue, Feb 19, 2019 at 09:07:29PM -0700, Jason Gunthorpe wrote:
>> >> > On Tue, Feb 19, 2019 at 08:57:43AM -0600, Shiraz Saleem wrote:
>> >> > > Call the core helpers to retrieve the HW aligned address to use
>> >> > > for the MR, within a supported i40iw page size.
>> >> > >
>> >> > > Remove code in i40iw to determine when MR is backed by 2M huge
>> >> > > pages which involves checking the umem->hugetlb flag and VMA
>> >> > > inspection.
>> >> > > The core helpers will return the 2M aligned address if the MR
>> >> > > is backed by 2M pages.
>> >> > >
>> >> >
>> >> > > -	for_each_sg_dma_page (region->sg_head.sgl, &sg_iter, region->nmap, 0) {
>> >> > > -		pg_addr = sg_page_iter_dma_address(&sg_iter);
>> >> > > -		if (first_pg)
>> >> > > -			*pbl = cpu_to_le64(pg_addr & iwmr->page_msk);
>> >> > > -		else if (!(pg_addr & ~iwmr->page_msk))
>> >> > > -			*pbl = cpu_to_le64(pg_addr);
>> >> > > -		else
>> >> > > -			continue;
>> >> > > -
>> >> > > -		first_pg = false;
>> >> > > +	for (ib_umem_start_phys_iter(region, &sg_phys_iter,
>> >> > > +				     iwmr->page_size);
>> >> >
>> >> > Maybe this should be:
>> >> >
>> >> >  for_each_sg_dma_page_sz (region->sg_head.sgl, &sg_iter, region->nmap,
>> >> >                           iwmr->page_size)
>> >> > ?
>> >> >
>> >> > Is there a reason to move away from the API we built here?
>> >> >
>> >> Yes. It's a different iterator type we need to use here and iterate
>> >> the SG list in contiguous aligned memory blocks as opposed to
>> >> PAGE_SIZE increments.
>> >
>> >I mean, why not add an option to the core api to do something other
>> >than PAGE_SIZE increments?
>> >
>>
>> Not sure it's trivial and if it's that generic yet. The iterator
>> ib_umem_next_phys_iter() is reliant on the best supported HW size passed
>> in to get the pg_bit to use when we iterate the SG list.
>>
>> And the best supported HW size is obtained via another umem API,
>> ib_umem_find_single_pg_size(), specific to RDMA MRs which runs the SG list.
>
>Yes.. but once we have this block size computed with rdma specific code the
>iteration seems fairly generic..
>
>> We also ignore alignment of start of first sgl and end of last sgl
>> which I thought was specific to RDMA HW?
>
>It is basically the same as what sg_page_iter_dma already does - that iterator
>rounds everything down/up to a page boundary.
>
>If someone doesn't want that rounding then they need to use a different
>iterator..
>
>> But maybe worth considering though for future work to extend as a
>> scatterlist API.
>
>Can we at least follow the same basic design of having a rdma_for_each_block ()
>iterator helper loop like for_each_sg_dma_page, etc?
>

Sure. I will explore it and see if we can generalize it. A look at
for_each_sg_dma_page suggests it's feasible. I'll need to work out the
details and do some further testing.

I am considering further breaking up this series. The page combining
portion (i.e. patch #1) can be sent independently, and the new API (with
the revised design) to get aligned memory blocks can be a follow-on.

Also, Selvin found a problem while testing the series. Some drivers still
use umem->nmap to size the buffers that store the page DMA addresses.
With page combining, this can cause out-of-bounds accesses when the SGEs
are unfolded using for_each_sg_dma_page. We need to size them based on
umem->npages, and this needs to be a precursor to the page combining
patch. I'll send that out too.

Shiraz
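
PS: to make the rdma_for_each_block() idea a bit more concrete, here is a
rough, untested sketch of the sort of wrapper I have in mind, reusing the
start/next helpers from this series. The macro name, the phyaddr member of
the iterator and the exact helper signatures are placeholders, not final
API:

	/*
	 * Hypothetical for_each-style wrapper: walk the umem SG list in
	 * HW-supported, block-aligned increments rather than PAGE_SIZE
	 * steps.  Assumes the start helper seeds the iterator with the
	 * block size and the next helper advances one aligned block,
	 * returning false when the list is exhausted.
	 */
	#define rdma_for_each_block(umem, iter, pgsz)			\
		for (ib_umem_start_phys_iter(umem, iter, pgsz);		\
		     ib_umem_next_phys_iter(umem, iter);)

	/* The i40iw MR loop would then collapse to roughly this
	 * (PBL chunk advancement elided):
	 */
	rdma_for_each_block(region, &sg_phys_iter, iwmr->page_size)
		*pbl++ = cpu_to_le64(sg_phys_iter.phyaddr);

For the sizing problem Selvin hit, the fix is simply to allocate those page
address arrays from umem->npages instead of umem->nmap, since nmap counts
the combined SGEs rather than the pages once page combining is in place.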