> > All addresses other than 0xfffffffffff00000 are stale entries. > > Once this kind of incorrect page entries are passed to the RDMA h/w, > > AMD IOMMU module detects the page fault whenever h/w tries to access > > addresses which are not actually populated by the ib stack correctly. > > Below prints are logged whenever this issue hits. > > I don't understand this. AFAIK on AMD platforms you can't create an IOVA > mapping at -1 like you are saying above, so how is > 0xfffffffffff00000 a valid DMA address? Hi Jason - Let me simplify - Consider a case if 1 SGE has 8K dma_len and starting dma address is <0xffffffffffffe000>. It is expected to have two page table entry - <0xffffffffffffe000 > and <0xfffffffffffff000 >. Both the DMA address not mapped to -1. Device expose dma_mask_bits = 64, so above two addresses are valid mapping from IOMMU perspective. Since end_dma_addr will be zero (in current code) which is actually not end_dma_addr but potential next_dma_addr, we will only endup set_page() call one time. I think this is a valid mapping request and don't see any issue with IOMMU mapping is incorrect. > > Or, if the AMD IOMMU HW can actually do this, then I would say it is a bug > in > the IOMM DMA API to allow the aperture used for DMA mapping to get to the > end of ULONG_MAX, it is just asking for overflow bugs. > > And if we have to tolerate these addreses then the code should be designed > to > avoid the overflow in the first place ie 'end_dma_addr' > should be changed to 'last_dma_addr = dma_addr + (dma_len - 1)' which does > not overflow, and all the logics carefully organized so none of the math > overflows. Making 'last_dma_addr = dma_addr + (dma_len - 1)' will have another side effect. Driver may get call of set_page() more than max_pg_ptrs. I have not debug how it can happen, but just wanted to share result with you. I can check if that is a preferred path. 'end_dma_addr' is used in code for other arithmetic. How about just doing below ? This was my initial thought of fixing but I am not sure which approach Is best. diff --git a/drivers/infiniband/core/verbs.c b/drivers/infiniband/core/verbs.c index e54b3f1b730e..56d1f3b20e98 100644 --- a/drivers/infiniband/core/verbs.c +++ b/drivers/infiniband/core/verbs.c @@ -2709,7 +2709,7 @@ int ib_sg_to_pages(struct ib_mr *mr, struct scatterlist *sgl, int sg_nents, prev_addr = page_addr; next_page: page_addr += mr->page_size; - } while (page_addr < end_dma_addr); + } while (page_addr < (end_dma_addr - 1)); mr->length += dma_len; last_end_dma_addr = end_dma_addr; > > Jason
Attachment:
smime.p7s
Description: S/MIME Cryptographic Signature