Re: Fwd: [PATCH 1/1] RDMA/umem: add back hugepage sg list

On Mon, Mar 8, 2021 at 8:16 PM Jason Gunthorpe <jgg@xxxxxxxxxx> wrote:
>
> On Mon, Mar 08, 2021 at 06:13:52PM +0800, Zhu Yanjun wrote:
>
> > And I delved into the source code of __sg_alloc_table_from_pages. I
> > found that this function is related to ib_dma_max_seg_size: when
> > ib_dma_max_seg_size is set to UINT_MAX, the sg dma address is
> > 4K (one page), and when ib_dma_max_seg_size is set to SZ_2M, the sg
> > dma address is 2M.
>
> That seems like a bug, you should fix it

Hi, Jason && Leon

I compared the function __sg_alloc_table_from_pages with ib_umem_add_sg_table.
In __sg_alloc_table_from_pages:

"
 449         if (prv) {
 450                 unsigned long paddr = (page_to_pfn(sg_page(prv))
* PAGE_SIZE +
 451                                        prv->offset + prv->length) /
 452                                       PAGE_SIZE;
 453
 454                 if (WARN_ON(offset))
 455                         return ERR_PTR(-EINVAL);
 456
 457                 /* Merge contiguous pages into the last SG */
 458                 prv_len = prv->length;
 459                 while (n_pages && page_to_pfn(pages[0]) == paddr) {
 460                         if (prv->length + PAGE_SIZE > max_segment)
 461                                 break;
 462                         prv->length += PAGE_SIZE;
 463                         paddr++;
 464                         pages++;
 465                         n_pages--;
 466                 }
 467                 if (!n_pages)
 468                         goto out;
 469         }

"
If prv->length + PAGE_SIZE > max_segment, a new sg entry is started.
In the commit "RDMA/umem: Move to allocate SG table from pages",
max_segment comes from dma_get_max_seg_size(), which is normally
UINT_MAX. So on my host, prv->length + PAGE_SIZE is almost always less
than max_segment (length is an unsigned int), and contiguous pages keep
being merged into the previous sg entry.
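
To make the per-page merge concrete, here is a minimal userspace sketch
(not kernel code; the pfns and the small max_segment value are made up)
of the policy above. With a small max_segment it splits the pages at the
cap; with UINT_MAX it prints one big sge:
"
/* Userspace sketch of the per-page merge in __sg_alloc_table_from_pages:
 * a contiguous page is folded into the current segment only while one
 * more PAGE_SIZE still fits under max_segment.  Values are hypothetical.
 */
#include <stdio.h>

#define PAGE_SIZE 4096UL

int main(void)
{
	unsigned long pfns[] = { 100, 101, 102, 103, 104, 105, 106, 107 };
	unsigned long n = sizeof(pfns) / sizeof(pfns[0]);
	unsigned long max_segment = 3 * PAGE_SIZE; /* try UINT_MAX as well */
	unsigned long start = pfns[0], len = PAGE_SIZE;
	unsigned long i;

	for (i = 1; i < n; i++) {
		/* same test as the kernel loop: contiguous and still fits */
		if (pfns[i] == pfns[i - 1] + 1 &&
		    len + PAGE_SIZE <= max_segment) {
			len += PAGE_SIZE;
			continue;
		}
		printf("sge: start pfn %lu, length %lu\n", start, len);
		start = pfns[i];
		len = PAGE_SIZE;
	}
	printf("sge: start pfn %lu, length %lu\n", start, len);
	return 0;
}
"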

So the following code rarely executes:
"
 485         for (i = 0; i < chunks; i++) {
 ...
 490                 for (j = cur_page + 1; j < n_pages; j++) {
 491                         seg_len += PAGE_SIZE;
 492                         if (seg_len >= max_segment ||
 493                             page_to_pfn(pages[j]) !=
 494                             page_to_pfn(pages[j - 1]) + 1)
 495                                 break;
 496                 }
 497
 498                 /* Pass how many chunks might be left */
 499                 s = get_next_sg(sgt, s, chunks - i + left_pages, gfp_mask);
...
 510                 sg_set_page(s, pages[cur_page],
 511                             min_t(unsigned long, size,
chunk_size), offset);
 512                 added_nents++;
...
516         }
"

In the function ib_umem_add_sg_table, the maximum segment size
(max_seg_sz) is used as below:
"
               /* Squash N contiguous pages from page_list into current sge */
               if (update_cur_sg) {
                       if ((max_seg_sz - sg->length) >= (len << PAGE_SHIFT)) {
                               sg_set_page(sg, sg_page(sg),
                                           sg->length + (len << PAGE_SHIFT),
                                           0);
                               update_cur_sg = false;
                               continue;
                       }
                       update_cur_sg = false;
               }

               /* Squash N contiguous pages into next sge or first sge */
               if (!first)
                       sg = sg_next(sg);

               (*nents)++;
               sg_set_page(sg, first_page, len << PAGE_SHIFT, 0);
"
Only when (max_seg_sz - sg->length) >= (len << PAGE_SHIFT) is a whole
run of contiguous pages squashed into the current sge at once;
otherwise sg_next and sg_set_page are called to start a new sge.

This different usage of max_seg_sz/max_segment is the root cause of the
different sg_dma_addresses produced by __sg_alloc_table_from_pages and
ib_umem_add_sg_table.
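
A minimal sketch (hypothetical sizes, not kernel code) of how the two
policies can split the same run of contiguous pages differently, which
shifts the sge boundaries and hence the resulting sg_dma_addresses:
"
/* Contrast the two merge policies for one incoming run of contiguous
 * pages, given a current sge that already holds cur_len bytes.
 * max_seg, cur_len and run are made-up example values.
 */
#include <stdio.h>

#define PAGE_SIZE 4096UL

int main(void)
{
	unsigned long max_seg = 4 * PAGE_SIZE;
	unsigned long cur_len = 3 * PAGE_SIZE;	/* current sge length */
	unsigned long run = 3;			/* contiguous pages arriving */
	unsigned long merged = 0;

	/* __sg_alloc_table_from_pages: merge page by page until full */
	while (merged < run && cur_len + PAGE_SIZE <= max_seg) {
		cur_len += PAGE_SIZE;
		merged++;
	}
	printf("new: cur sge %lu bytes, spills %lu page(s) to next sge\n",
	       cur_len, run - merged);

	/* ib_umem_add_sg_table: squash the run only if all of it fits */
	cur_len = 3 * PAGE_SIZE;
	if (max_seg - cur_len >= run * PAGE_SIZE)
		printf("old: whole run merged, cur sge %lu bytes\n",
		       cur_len + run * PAGE_SIZE);
	else
		printf("old: run not merged, new sge of %lu bytes\n",
		       run * PAGE_SIZE);
	return 0;
}
"
With these numbers, the new code fills the current sge to the cap and
spills the rest, while the old code leaves the current sge alone and
puts the whole run into a new sge, so the segment boundaries differ.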

The following patch tries to fix this problem; please comment on it.

From 65472ef21146b0fb72d5eb3e2fe1277380d29446 Mon Sep 17 00:00:00 2001
From: Zhu Yanjun <zyjzyj2000@xxxxxxxxx>
Date: Thu, 11 Mar 2021 12:35:40 -0500
Subject: [PATCH 1/1] RDMA/umem: fix different sg_dma_address problem

After commit 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from
pages"), the returned sg list has different dma addresses compared with
those produced by the removed function ib_umem_add_sg_table.

The root cause is that max_seg_sz/max_segment is used differently in
ib_umem_add_sg_table and __sg_alloc_table_from_pages.

Fixes: 0c16d9635e3a ("RDMA/umem: Move to allocate SG table from pages")
Signed-off-by: Zhu Yanjun <zyjzyj2000@xxxxxxxxx>
---
 drivers/infiniband/core/umem.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 2dde99a9ba07..71188edbb45f 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -232,7 +232,9 @@ struct ib_umem *ib_umem_get(struct ib_device *device, unsigned long addr,
                npages -= ret;
                sg = __sg_alloc_table_from_pages(&umem->sg_head, page_list, ret,
                                0, ret << PAGE_SHIFT,
-                               ib_dma_max_seg_size(device), sg, npages,
+                               min_t(unsigned int,
+                               ib_dma_max_seg_size(device), ret * PAGE_SIZE),
+                               sg, npages,
                                GFP_KERNEL);
                umem->sg_nents = umem->sg_head.nents;
                if (IS_ERR(sg)) {
--
2.27.0
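
For illustration, a tiny userspace sketch (the ret value is
hypothetical) of the clamp the patch adds:
"
/* The patch passes min(ib_dma_max_seg_size(), ret * PAGE_SIZE) as
 * max_segment; ret is the number of pages pinned in this round.
 */
#include <stdio.h>
#include <limits.h>

#define PAGE_SIZE 4096UL
#define MIN(a, b) ((a) < (b) ? (a) : (b))

int main(void)
{
	unsigned long dma_max = UINT_MAX; /* ib_dma_max_seg_size() here */
	unsigned long ret = 512;          /* pages pinned in this round */
	unsigned long max_segment = MIN(dma_max, ret * PAGE_SIZE);

	printf("max_segment clamped to %lu bytes (%lu pages)\n",
	       max_segment, max_segment / PAGE_SIZE);
	return 0;
}
"
The idea is that even when ib_dma_max_seg_size() returns UINT_MAX,
merging cannot run past the amount of memory pinned in the current
round.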

>
> Jason


