[PATCH rdma-rc] RDMA/core: fix sg_to_page mapping for boundary condition

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This issue frequently hits if AMD IOMMU is enabled.

In case of 1MB data transfer, ib core is supposed to set 256 entries of
4K page size in MR page table. Because of the defect in ib_sg_to_pages,
it breaks just after setting one entry.
Memory region page table entries may find stale entries (or NULL if
address is memset). Something like this -

crash> x/32a 0xffff9cd9f7f84000
<<< -This looks like stale entries. Only first entry is valid ->>>
0xffff9cd9f7f84000:     0xfffffffffff00000      0x68d31000
0xffff9cd9f7f84010:     0x68d32000      0x68d33000
0xffff9cd9f7f84020:     0x68d34000      0x975c5000
0xffff9cd9f7f84030:     0x975c6000      0x975c7000
0xffff9cd9f7f84040:     0x975c8000      0x975c9000
0xffff9cd9f7f84050:     0x975ca000      0x975cb000
0xffff9cd9f7f84060:     0x975cc000      0x975cd000
0xffff9cd9f7f84070:     0x975ce000      0x975cf000
0xffff9cd9f7f84080:     0x0     0x0
0xffff9cd9f7f84090:     0x0     0x0
0xffff9cd9f7f840a0:     0x0     0x0
0xffff9cd9f7f840b0:     0x0     0x0
0xffff9cd9f7f840c0:     0x0     0x0
0xffff9cd9f7f840d0:     0x0     0x0
0xffff9cd9f7f840e0:     0x0     0x0
0xffff9cd9f7f840f0:     0x0     0x0

All addresses other than 0xfffffffffff00000 are stale entries.
Once this kind of incorrect page entries are passed to the RDMA h/w,
AMD IOMMU module detects the page fault whenever h/w tries to access
addresses which are not actually populated by the ib stack correctly.
Below prints are logged whenever this issue hits.

bnxt_en 0000:21:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001e address=0x68d31000 flags=0x0050]

ib_sg_to_pages function populates the correct page address in most of the cases,
but there is one boundary condition which is not handled correctly.

Boundary condition explained -
Page addresses are not populated correctly if the dma buffer is mapped to the
very last region of address space.

One of the example -
Whenever page_add is  0xfffffffffff00000  (Last 1MB section of the address space)
and dma length is 1MB, end of the dma address = 0
(Derived from 0xfffffffffff00000 + 0x100000).

use dma buffer length instead of end_dma_addr to fill page addresses.

Fixes: 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Signed-off-by: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx>
---
 drivers/infiniband/core/verbs.c | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index e54b3f1b730e..36137735cd04 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2676,15 +2676,19 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
 		u64 dma_addr = sg_dma_address(sg) + sg_offset;
 		u64 prev_addr = dma_addr;
 		unsigned int dma_len = sg_dma_len(sg) - sg_offset;
+		unsigned int curr_dma_len = 0;
+		unsigned int page_off = 0;
 		u64 end_dma_addr = dma_addr + dma_len;
 		u64 page_addr = dma_addr & page_mask;
 
+		if (i == 0)
+			page_off = dma_addr - page_addr;
 		/*
 		 * For the second and later elements, check whether either the
 		 * end of element i-1 or the start of element i is not aligned
 		 * on a page boundary.
 		 */
-		if (i && (last_page_off != 0 || page_addr != dma_addr)) {
+		else if (last_page_off != 0 || page_addr != dma_addr) {
 			/* Stop mapping if there is a gap. */
 			if (last_end_dma_addr != dma_addr)
 				break;
@@ -2708,8 +2712,9 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
 			}
 			prev_addr = page_addr;
 next_page:
+			curr_dma_len += mr->page_size - page_off;
 			page_addr += mr->page_size;
-		} while (page_addr < end_dma_addr);
+		} while (curr_dma_len < dma_len);
 
 		mr->length += dma_len;
 		last_end_dma_addr = end_dma_addr;
-- 
2.27.0

Attachment: smime.p7s
Description: S/MIME Cryptographic Signature


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux