>Subject: Re: [PATCH rdma-next v1 4/6] RDMA/i40iw: Use umem API to retrieve
>aligned DMA address
>
>On Thu, Feb 28, 2019 at 09:42:30PM +0000, Saleem, Shiraz wrote:
>> >Subject: Re: [PATCH rdma-next v1 4/6] RDMA/i40iw: Use umem API to
>> >retrieve aligned DMA address
>> >
>> >On Sat, Feb 23, 2019 at 01:26:41PM -0600, Shiraz Saleem wrote:
>> >> On Tue, Feb 19, 2019 at 09:07:29PM -0700, Jason Gunthorpe wrote:
>> >> > On Tue, Feb 19, 2019 at 08:57:43AM -0600, Shiraz Saleem wrote:
>> >> > > Call the core helpers to retrieve the HW aligned address to use
>> >> > > for the MR, within a supported i40iw page size.
>> >> > >
>> >> > > Remove code in i40iw to determine when MR is backed by 2M huge
>> >> > > pages which involves checking the umem->hugetlb flag and VMA
>> >> > > inspection.
>> >> > > The core helpers will return the 2M aligned address if the MR
>> >> > > is backed by 2M pages.
>> >> > >
>> >> >
>> >> > > -	for_each_sg_dma_page (region->sg_head.sgl, &sg_iter, region->nmap, 0) {
>> >> > > -		pg_addr = sg_page_iter_dma_address(&sg_iter);
>> >> > > -		if (first_pg)
>> >> > > -			*pbl = cpu_to_le64(pg_addr & iwmr->page_msk);
>> >> > > -		else if (!(pg_addr & ~iwmr->page_msk))
>> >> > > -			*pbl = cpu_to_le64(pg_addr);
>> >> > > -		else
>> >> > > -			continue;
>> >> > > -
>> >> > > -		first_pg = false;
>> >> > > +	for (ib_umem_start_phys_iter(region, &sg_phys_iter,
>> >> > > +				     iwmr->page_size);
>> >> >
>> >> > Maybe this should be:
>> >> >
>> >> >  for_each_sg_dma_page_sz (region->sg_head.sgl, &sg_iter, region->nmap,
>> >> >                           iwmr->page_size)
>> >> > ?
>> >> >
>> >> > Is there a reason to move away from the API we built here?
>> >> >
>> >> Yes. It's a different iterator type we need to use here and iterate
>> >> the SG list in contiguous aligned memory blocks as opposed to
>> >> PAGE_SIZE increments.
>> >
>> >I mean, why not add an option to the core api to do something other
>> >than PAGE_SIZE increments?
>> >
>>
>> Not sure it's trivial and if it's that generic yet. The iterator
>> ib_umem_next_phys_iter() is reliant on the best supported HW size passed
>> in to get the pg_bit to use when we iterate the SG list.
>>
>> And the best supported HW size is obtained via another umem API,
>> ib_umem_find_single_pg_size(), specific to RDMA MRs which runs the SG list.
>
>Yes.. but once we have this block size computed with rdma specific code the
>iteration seems fairly generic..
>
>> We also ignore alignment of start of first sgl and end of last sgl
>> which I thought was specific to RDMA HW?
>
>It is basically the same as what sg_page_iter_dma already does - that iterator
>rounds everything down/up to a page boundary.
>
>If someone doesn't want that rounding then they need to use a different
>iterator..
>
>> But maybe worth considering though for future work to extend as a
>> scatterlist API.
>
>Can we at least follow the same basic design of having a rdma_for_each_block ()
>iterator helper loop like for_each_sg_dma_page, etc?
>

Sure. I will explore it and see if we can generalize it. A look at
for_each_sg_dma_page suggests it's feasible. I'll need to work out the
details and do some further testing.

I am considering further breaking up this series. The page combining
portion (i.e. patch #1) can be sent independently, and the new API (with
the revised design) to get aligned memory blocks can be a follow-on.

Also, Selvin found a problem while testing the series. Some drivers still
use umem->nmap to size the buffers that store the page DMA addresses.
With page combining, this can cause out-of-bounds accesses when the SGEs
are unfolded using for_each_sg_dma_page. We need to size them based on
umem->npages, and this needs to be a precursor to the page combining
patch. I'll send that out too.

Shiraz
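
PS: to make the rdma_for_each_block() idea a bit more concrete, here is a
rough, untested sketch of the sort of wrapper I have in mind, reusing the
start/next helpers from this series. The macro name, the phyaddr member of
the iterator and the exact helper signatures are placeholders, not final
API:

	/*
	 * Hypothetical for_each-style wrapper: walk the umem SG list in
	 * HW-supported, block-aligned increments rather than PAGE_SIZE
	 * steps.  Assumes the start helper seeds the iterator with the
	 * block size and the next helper advances one aligned block,
	 * returning false when the list is exhausted.
	 */
	#define rdma_for_each_block(umem, iter, pgsz)			\
		for (ib_umem_start_phys_iter(umem, iter, pgsz);		\
		     ib_umem_next_phys_iter(umem, iter);)

	/* The i40iw MR loop would then collapse to roughly this
	 * (PBL chunk advancement elided):
	 */
	rdma_for_each_block(region, &sg_phys_iter, iwmr->page_size)
		*pbl++ = cpu_to_le64(sg_phys_iter.phyaddr);

For the sizing problem Selvin hit, the fix is simply to allocate those page
address arrays from umem->npages instead of umem->nmap, since nmap counts
the combined SGEs rather than the pages once page combining is in place.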