This issue is frequently hit when the AMD IOMMU is enabled. For a 1MB
data transfer, the ib core is expected to set 256 entries of 4K page
size in the MR page table. Because of a defect in ib_sg_to_pages(), the
loop breaks just after setting one entry, so the remaining memory
region page table entries are left with stale values (or NULL if the
table memory was memset). Something like this:

crash> x/32a 0xffff9cd9f7f84000
   <<< - These look like stale entries. Only the first entry is valid - >>>
0xffff9cd9f7f84000:     0xfffffffffff00000      0x68d31000
0xffff9cd9f7f84010:     0x68d32000      0x68d33000
0xffff9cd9f7f84020:     0x68d34000      0x975c5000
0xffff9cd9f7f84030:     0x975c6000      0x975c7000
0xffff9cd9f7f84040:     0x975c8000      0x975c9000
0xffff9cd9f7f84050:     0x975ca000      0x975cb000
0xffff9cd9f7f84060:     0x975cc000      0x975cd000
0xffff9cd9f7f84070:     0x975ce000      0x975cf000
0xffff9cd9f7f84080:     0x0     0x0
0xffff9cd9f7f84090:     0x0     0x0
0xffff9cd9f7f840a0:     0x0     0x0
0xffff9cd9f7f840b0:     0x0     0x0
0xffff9cd9f7f840c0:     0x0     0x0
0xffff9cd9f7f840d0:     0x0     0x0
0xffff9cd9f7f840e0:     0x0     0x0
0xffff9cd9f7f840f0:     0x0     0x0

All addresses other than 0xfffffffffff00000 are stale entries. Once such
incorrect page table entries are passed to the RDMA hardware, the AMD
IOMMU module reports an IO page fault whenever the hardware tries to
access an address that was never correctly populated by the ib stack.
The following message is logged whenever this issue hits:

bnxt_en 0000:21:00.0: AMD-Vi: Event logged [IO_PAGE_FAULT domain=0x001e address=0x68d31000 flags=0x0050]

ib_sg_to_pages() populates the correct page addresses in most cases, but
there is one boundary condition that is not handled correctly:

Page addresses are not populated correctly when the DMA buffer is mapped
to the very last region of the address space. For example, when
page_addr is 0xfffffffffff00000 (the last 1MB section of the address
space) and the DMA length is 1MB, the end of the DMA address computes to
0 (0xfffffffffff00000 + 0x100000 wraps around in u64), so the loop
condition "page_addr < end_dma_addr" is false after the first page.

Use the DMA buffer length instead of end_dma_addr to fill the page
addresses.

v0->v1:
  Use first_page_off instead of page_off for readability.
  Fix functional issue of not resetting first_page_off.

Fixes: 4c67e2bfc8b7 ("IB/core: Introduce new fast registration API")
Signed-off-by: Kashyap Desai <kashyap.desai@xxxxxxxxxxxx>
---
 drivers/infiniband/core/verbs.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c
index e54b3f1b730e..5e72c44bac3a 100644
--- a/drivers/infiniband/core/verbs.c
+++ b/drivers/infiniband/core/verbs.c
@@ -2676,15 +2676,19 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
                 u64 dma_addr = sg_dma_address(sg) + sg_offset;
                 u64 prev_addr = dma_addr;
                 unsigned int dma_len = sg_dma_len(sg) - sg_offset;
+                unsigned int curr_dma_len = 0;
+                unsigned int first_page_off = 0;
                 u64 end_dma_addr = dma_addr + dma_len;
                 u64 page_addr = dma_addr & page_mask;

+                if (i == 0)
+                        first_page_off = dma_addr - page_addr;
                 /*
                  * For the second and later elements, check whether either the
                  * end of element i-1 or the start of element i is not aligned
                  * on a page boundary.
                  */
-                if (i && (last_page_off != 0 || page_addr != dma_addr)) {
+                else if (last_page_off != 0 || page_addr != dma_addr) {
                         /* Stop mapping if there is a gap. */
                         if (last_end_dma_addr != dma_addr)
                                 break;
@@ -2708,8 +2712,10 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents,
                         }
                         prev_addr = page_addr;
 next_page:
+                        curr_dma_len += mr->page_size - first_page_off;
                         page_addr += mr->page_size;
-                } while (page_addr < end_dma_addr);
+                        first_page_off = 0;
+                } while (curr_dma_len < dma_len);

                 mr->length += dma_len;
                 last_end_dma_addr = end_dma_addr;
--
2.27.0
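
[Editor's illustration, not part of the patch: a minimal user-space
sketch of the boundary condition described in the commit message. It
assumes a page-aligned 1MB buffer at 0xfffffffffff00000 and a 4K page
size, so the first_page_off handling for an unaligned first page is
omitted. It compares the old loop condition, which relies on the
wrapped-around end_dma_addr, with the new one, which tracks how much of
the DMA length has been consumed.]

#include <stdio.h>
#include <stdint.h>

int main(void)
{
        const uint64_t page_size = 0x1000;                /* 4K pages */
        const uint64_t dma_addr  = 0xfffffffffff00000ULL; /* last 1MB */
        const unsigned int dma_len = 0x100000;            /* 1MB transfer */

        /* 0xfffffffffff00000 + 0x100000 wraps around to 0 in u64 math. */
        uint64_t end_dma_addr = dma_addr + dma_len;

        uint64_t page_addr;
        unsigned int old_pages = 0, new_pages = 0, curr_dma_len = 0;

        /* Old condition: end_dma_addr is 0, so the loop stops after one page. */
        page_addr = dma_addr;
        do {
                old_pages++;
                page_addr += page_size;
        } while (page_addr < end_dma_addr);

        /* New condition: stop once the whole DMA length has been covered. */
        page_addr = dma_addr;
        do {
                new_pages++;
                curr_dma_len += page_size;
                page_addr += page_size;
        } while (curr_dma_len < dma_len);

        printf("end_dma_addr = 0x%llx\n", (unsigned long long)end_dma_addr);
        printf("old condition maps %u page(s), new condition maps %u page(s)\n",
               old_pages, new_pages);
        return 0;
}

[Expected output: end_dma_addr = 0x0; the old condition maps 1 page,
while the new condition maps the expected 256 pages.]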