On Mon, Mar 8, 2021 at 8:16 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Mon, Mar 08, 2021 at 06:13:52PM +0800, Zhu Yanjun wrote:
>
> > And I delved into the source code of __sg_alloc_table_from_pages. I
> > found that this function is related with ib_dma_max_seg_size. So
> > when ib_dma_max_seg_size is set to UINT_MAX, the sg dma address is
> > 4K (one page). When ib_dma_max_seg_size is set to SZ_2M, the sg dma
> > address is 2M now.
>
> That seems like a bug, you should fix it

Hi, Jason && Leon

I compared the function __sg_alloc_table_from_pages with
ib_umem_add_sg_table.

In __sg_alloc_table_from_pages:

"
	if (prv) {
		unsigned long paddr = (page_to_pfn(sg_page(prv)) * PAGE_SIZE +
				       prv->offset + prv->length) /
				      PAGE_SIZE;

		if (WARN_ON(offset))
			return ERR_PTR(-EINVAL);

		/* Merge contiguous pages into the last SG */
		prv_len = prv->length;
		while (n_pages && page_to_pfn(pages[0]) == paddr) {
			if (prv->length + PAGE_SIZE > max_segment)
				break;
			prv->length += PAGE_SIZE;
			paddr++;
			pages++;
			n_pages--;
		}
		if (!n_pages)
			goto out;
	}
"

If prv->length + PAGE_SIZE > max_segment, the merge stops and a new sg
entry is started.

In the commit "RDMA/umem: Move to allocate SG table from pages",
max_segment comes from dma_get_max_seg_size(), which is normally
UINT_MAX. So on my host, prv->length + PAGE_SIZE is almost always less
than max_segment, since length is an unsigned int, and the chunk loop
below rarely gets a chance to execute:

"
	for (i = 0; i < chunks; i++) {
...
		for (j = cur_page + 1; j < n_pages; j++) {
			seg_len += PAGE_SIZE;
			if (seg_len >= max_segment ||
			    page_to_pfn(pages[j]) !=
			    page_to_pfn(pages[j - 1]) + 1)
				break;
		}

		/* Pass how many chunks might be left */
		s = get_next_sg(sgt, s, chunks - i + left_pages, gfp_mask);
...
		sg_set_page(s, pages[cur_page],
			    min_t(unsigned long, size, chunk_size), offset);
		added_nents++;
...
	}
"

In the function ib_umem_add_sg_table, max_seg_sz is used like this:

"
		/* Squash N contiguous pages from page_list into current sge */
		if (update_cur_sg) {
			if ((max_seg_sz - sg->length) >= (len << PAGE_SHIFT)) {
				sg_set_page(sg, sg_page(sg),
					    sg->length + (len << PAGE_SHIFT),
					    0);
				update_cur_sg = false;
				continue;
			}
			update_cur_sg = false;
		}

		/* Squash N contiguous pages into next sge or first sge */
		if (!first)
			sg = sg_next(sg);

		(*nents)++;
		sg_set_page(sg, first_page, len << PAGE_SHIFT, 0);
"

Here max_seg_sz is only consulted when extending the current sge: if
(max_seg_sz - sg->length) >= (len << PAGE_SHIFT), the whole contiguous
run is squashed into the current sge; otherwise sg_next and sg_set_page
start a new sge that takes the entire run. This different usage of
max_seg_sz/max_segment is the root cause of the different
sg_dma_addresses produced by __sg_alloc_table_from_pages and
ib_umem_add_sg_table. A sketch of the two policies follows below.

The following patch tries to fix this problem; please comment on it.
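To make the difference concrete, here is a small userspace sketch of
the two coalescing policies. It is not the kernel code: the pfn array,
PAGE_SHIFT and the 2M limit are made-up values for illustration, and
the cross-call continuation in ib_umem_add_sg_table is left out.

"
#include <stdio.h>

#define PAGE_SHIFT	12
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define NPAGES		1024	/* one 4 MiB contiguous run */

/* __sg_alloc_table_from_pages style: every sge is capped at
 * max_segment, so a long contiguous run is split. */
static void new_style(const unsigned long *pfn, int n,
		      unsigned long max_segment)
{
	unsigned long len = PAGE_SIZE;
	int i, nents = 0;

	for (i = 1; i <= n; i++) {
		if (i < n && pfn[i] == pfn[i - 1] + 1 &&
		    len + PAGE_SIZE <= max_segment) {
			len += PAGE_SIZE;	/* merge into current sge */
			continue;
		}
		printf("new: sge %d len 0x%lx\n", nents++, len);
		len = PAGE_SIZE;		/* start the next sge */
	}
}

/* ib_umem_add_sg_table style: each contiguous run becomes one
 * uncapped sge; max_seg_sz only gates extending the sge left over
 * from a previous call, which this single-call sketch omits. */
static void old_style(const unsigned long *pfn, int n)
{
	int i = 0, nents = 0;

	while (i < n) {
		int len = 1;

		while (i + len < n && pfn[i + len] == pfn[i + len - 1] + 1)
			len++;		/* length of the contiguous run */
		printf("old: sge %d len 0x%lx\n", nents++,
		       (unsigned long)len << PAGE_SHIFT);
		i += len;
	}
}

int main(void)
{
	unsigned long pfn[NPAGES];
	int i;

	for (i = 0; i < NPAGES; i++)
		pfn[i] = 0x1000 + i;

	new_style(pfn, NPAGES, 0x200000);	/* SZ_2M */
	old_style(pfn, NPAGES);
	return 0;
}
"

With max_segment = SZ_2M this prints two 2 MiB sges for the new style
but a single 4 MiB sge for the old style; with max_segment = UINT_MAX
both collapse the run into one sge.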
From 65472ef21146b0fb72d5eb3e2fe1277380d29446 Mon Sep 17 00:00:00 2001
From: Zhu Yanjun <zyjzyj2000@xxxxxxxxx>
Date: Thu, 11 Mar 2021 12:35:40 -0500
Subject: [PATCH 1/1] RDMA/umem: fix different sg_dma_address problem

After commit 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from
pages"), the returned sg list has different dma addresses compared
with the ones built by the removed function ib_umem_add_sg_table. The
root cause is that max_seg_sz/max_segment is used differently in
ib_umem_add_sg_table and __sg_alloc_table_from_pages.

Fixes: 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from pages")
Signed-off-by: Zhu Yanjun <zyjzyj2000@xxxxxxxxx>
---
 drivers/infiniband/core/umem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 2dde99a9ba07..71188edbb45f 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -232,7 +232,9 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
 		npages -= ret;
 		sg = __sg_alloc_table_from_pages(&umem->sg_head, page_list, ret,
 				0, ret << PAGE_SHIFT,
-				ib_dma_max_seg_size(device), sg, npages,
+				min_t(unsigned int,
+				      ib_dma_max_seg_size(device), ret * PAGE_SIZE),
+				sg, npages,
 				GFP_KERNEL);
 		umem->sg_nents = umem->sg_head.nents;
 		if (IS_ERR(sg)) {
-- 
2.27.0

>
> Jason
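P.S. A quick userspace check of the clamp arithmetic in the patch; the
512-page round is an example value, not something taken from a trace:

"
#include <stdio.h>
#include <limits.h>

#define PAGE_SIZE	4096UL

int main(void)
{
	unsigned long dev_max = UINT_MAX;	/* ib_dma_max_seg_size() */
	unsigned long ret = 512;		/* pages pinned in this round */
	unsigned long chunk = ret * PAGE_SIZE;

	/* mirrors min_t(unsigned int, ib_dma_max_seg_size(device),
	 * ret * PAGE_SIZE) from the patch */
	printf("max_segment = 0x%lx\n", dev_max < chunk ? dev_max : chunk);
	return 0;
}
"

It prints max_segment = 0x200000, i.e. with the patch each call caps
the segment size at the size of the chunk that was actually pinned in
that round.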