On 28-Mar-19 19:27, Jason Gunthorpe wrote: > On Thu, Mar 28, 2019 at 07:20:33PM +0200, Gal Pressman wrote: >> On 28-Mar-19 15:08, Jason Gunthorpe wrote: >>> On Thu, Mar 28, 2019 at 02:39:30PM +0200, Gal Pressman wrote: >>> >>>> +static int umem_to_page_list(struct efa_dev *dev, >>>> + struct ib_umem *umem, >>>> + u64 *page_list, >>>> + u32 hp_cnt, >>>> + u8 hp_shift) >>>> +{ >>>> + u32 pages_in_hp = BIT(hp_shift - PAGE_SHIFT); >>>> + unsigned int page_idx = 0; >>>> + unsigned int hp_idx = 0; >>>> + struct scatterlist *sg; >>>> + unsigned int entry; >>>> + >>>> + if (umem->page_shift != PAGE_SHIFT) { >>>> + ibdev_dbg(&dev->ibdev, >>>> + "umem invalid page shift %d\n", umem->page_shift); >>>> + return -EINVAL; >>>> + } >>>> + >>>> + ibdev_dbg(&dev->ibdev, "hp_cnt[%u], pages_in_hp[%u]\n", >>>> + hp_cnt, pages_in_hp); >>>> + >>>> + for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) { >>>> + if (sg_dma_len(sg) != PAGE_SIZE) { >>>> + ibdev_dbg(&dev->ibdev, >>>> + "sg_dma_len[%u] != PAGE_SIZE[%lu]\n", >>>> + sg_dma_len(sg), PAGE_SIZE); >>>> + return -EINVAL; >>>> + } >>> >>> Again, this needs to work with Shiraz's work, driver's can't assume the >>> umem sgl has PAGE_SIZE chunks. Use for_each_sg_dma_page >> >> Thanks, I have a couple of questions: >> >> I'm not entirely up to date with Shiraz's work, wanna make sure I understand: >> Currently the umem sgl is built out of PAGE_SIZE elements. The reason we can't >> assume each element is PAGE_SIZE long is because the DMA mapping (iommu) might >> combine multiple elements or because Shiraz's work is going to combine the >> elements manually? > > That is the existing reason, but this patch: > > https://patchwork.kernel.org/patch/10875419/ > > Just directly makes larger SGL. Why do we want to combine pages if the iommu does that for us? Isn't the only thing we need here is Shiraz's iterator that uses the device supported page sizes? > >> Reading the documentation, sg_page_iter kinda looks like it covers >> sg_dma_page_iter functionality? Don't they both retrieve the DMA >> address page by page? Looks like the only difference is that in >> sg_page_iter I can get the struct page as well? > > No. > > A SGL contains two lists of different size, a 'dma mapped' list and a > 'CPU page list'. The act of DMA mapping creates the first list. > > These lists cannot be intermixed, if you run over the CPU page list > you can't get DMA addresses, and vice versa. The sg_page_iter documentation: /* * sg page iterator * * Iterates over sg entries page-by-page. On each successful iteration, you * can call sg_page_iter_page(@piter) to get the current page and its dma * address. @piter->sg will point to the sg holding this page and * @piter->sg_pgoffset to the page's page offset within the sg. The iteration * will stop either when a maximum number of sg entries was reached or a * terminating sg (sg_last(sg) == true) was reached. */ It says you can call sg_page_iter_page() to get the current page and its dma address, should the documentation be updated to remove this part? > > >>>> + unsigned long max_page_shift, >>>> + int *count, u8 *shift, u32 *ncont) >>>> +{ >>>> + unsigned long page_shift = umem->page_shift; >>>> + struct scatterlist *sg; >>>> + u64 base = ~0, p = 0; >>>> + unsigned long tmp; >>>> + unsigned long m; >>>> + u64 len, pfn; >>>> + int i = 0; >>>> + int entry; >>>> + >>>> + addr = addr >> page_shift; >>>> + tmp = (unsigned long)addr; >>>> + m = find_first_bit(&tmp, BITS_PER_LONG); >>>> + if (max_page_shift) >>>> + m = min_t(unsigned long, max_page_shift - page_shift, m); >>>> + >>>> + for_each_sg(umem->sg_head.sgl, sg, umem->nmap, entry) { >>>> + len = sg_dma_len(sg) >> page_shift; >>> >>> Shouldn't assume the offset is zero here. >> >> Because elements are not necessarily PAGE_SIZE? >> If page combining is already done by ib_core, this whole function can probably >> be deleted. > > If you test and ack the patch above then page combining will be done > by the ib_core tomorrow :) > >>> This whole function should really better match what Shiraz is >>> doing. It looks like it is just ib_umem_find_single_pg_size() ?? >> >> Yea, I think ib_umem_find_single_pg_size() could probably replace this whole >> function, this work isn't merged yet though right? > > Not yes, but if you test it and say it works and you need it, then it > can accelerate. Sure, I'll send my tested-by tag if it won't be merged by the time my code is ready.