On Fri, Aug 19, 2022 at 03:13:47PM +0530, Kashyap Desai wrote: > > > All addresses other than 0xfffffffffff00000 are stale entries. > > > Once this kind of incorrect page entries are passed to the RDMA h/w, > > > AMD IOMMU module detects the page fault whenever h/w tries to access > > > addresses which are not actually populated by the ib stack correctly. > > > Below prints are logged whenever this issue hits. > > > > I don't understand this. AFAIK on AMD platforms you can't create an IOVA > > mapping at -1 like you are saying above, so how is > > 0xfffffffffff00000 a valid DMA address? > > Hi Jason - > > Let me simplify - Consider a case if 1 SGE has 8K dma_len and starting dma > address is <0xffffffffffffe000>. > It is expected to have two page table entry - <0xffffffffffffe000 > and > <0xfffffffffffff000 >. > Both the DMA address not mapped to -1. Device expose dma_mask_bits = 64, > so above two addresses are valid mapping from IOMMU perspective. That is not my point. My point is that 0xFFFFFFFFFFFFFFF should never be used as a DMA address because it invites overflow on any maths, and we are not careful about this in the kernel in general. > Since end_dma_addr will be zero (in current code) which is actually not > end_dma_addr but potential next_dma_addr, we will only endup set_page() call > one time. Which is the math overflow. > I think this is a valid mapping request and don't see any issue with IOMMU > mapping is incorrect. It should not create mappings that are so dangerous. There is really no reason to use the last page of IOVA space that includes -1. > > And if we have to tolerate these addreses then the code should be > > designed to avoid the overflow in the first place ie > > 'end_dma_addr' should be changed to 'last_dma_addr = dma_addr + > > (dma_len - 1)' which does not overflow, and all the logics > > carefully organized so none of the math overflows. > > Making 'last_dma_addr = dma_addr + (dma_len - 1)' will have another side > effect. Yes, the patch would have to fix everything about the logic to work with a last and avoid overflowing maths > How about just doing below ? This was my initial thought of fixing but I am > not sure which approach Is best. > > diff --git a/drivers/infiniband/core/verbs.c > b/drivers/infiniband/core/verbs.c > index e54b3f1b730e..56d1f3b20e98 100644 > --- a/drivers/infiniband/core/verbs.c > +++ b/drivers/infiniband/core/verbs.c > @@ -2709,7 +2709,7 @@ int ib_sg_to_pages(struct ib_mr *mr, struct > scatterlist *sgl, int sg_nents, > prev_addr = page_addr; > next_page: > page_addr += mr->page_size; > - } while (page_addr < end_dma_addr); > + } while (page_addr < (end_dma_addr - 1)); This is now overflowing twice :( Does this bug even still exist? eg does this revert "fix" it? https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=af3e9579ecfb It makes me wonder if the use of -1 is why drivers started failing with this mode. Jason