On Thu, Feb 08, 2024 at 06:53:05PM +0000, Konstantin Taranov wrote:
> > From: Long Li <longli@xxxxxxxxxxxxx>
> > Sent: Thursday, 8 February 2024 19:43
> > To: Konstantin Taranov <kotaranov@xxxxxxxxxxxxxxxxxxx>; Konstantin
> > Taranov <kotaranov@xxxxxxxxxxxxx>; sharmaajay@xxxxxxxxxxxxx;
> > jgg@xxxxxxxx; leon@xxxxxxxxxx
> > Cc: linux-rdma@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> > Subject: RE: [PATCH rdma-next v1 1/1] RDMA/mana_ib: Fix bug in creation of
> > dma regions
> >
> > >
> > >      /* Hardware requires dma region to align to chosen page size */
> > > -    page_sz = ib_umem_find_best_pgsz(umem, PAGE_SZ_BM, 0);
> > > +    page_sz = ib_umem_find_best_pgsz(umem, PAGE_SZ_BM, virt);
> > >      if (!page_sz) {
> > >          ibdev_dbg(&dev->ib_dev, "failed to find page size.\n");
> > >          return -ENOMEM;
> > >      }
> >
> > How about doing:
> > page_sz = ib_umem_find_best_pgsz(umem, PAGE_SZ_BM, force_zero_offset
> > ? 0 : virt);
> >
> > Will this work? This can get rid of the following while loop.
> >
>
> I do not think so. I mentioned once, that it was failing for me with existing code
> with the 4K-aligned addresses and 8K pages. In this case, we miscalculate the
> number of pages. So, we think that it is one 8K page, but it is in fact two.

That is a confusing statement.. What is "we" here?

ib_umem_dma_offset() is not always guaranteed to be zero, with a 0
iova. With higher order pages the offset can be within the page, it
generates

  offset = IOVA % pgsz

There are a couple places that do want the offset to be fixed to zero
and have the loop, at this point it would be good to consolidate them
into some common ib_umem_find_best_pgsz_zero_offset() or something.
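
To make the arithmetic concrete, here is a plain userspace C sketch, not
kernel code. The helper names page_offset() and pages_spanned() are made
up for illustration and are not part of the ib_umem API. It assumes the
page size is a power of two and models only a single contiguous address
range, which is a simplification of what a real umem looks like.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: offset of addr inside a pgsz-sized page.
 * pgsz must be a non-zero power of two. */
static uint64_t page_offset(uint64_t addr, uint64_t pgsz)
{
	assert(pgsz && (pgsz & (pgsz - 1)) == 0);
	return addr & (pgsz - 1);	/* same as addr % pgsz */
}

/* Hypothetical helper: number of pgsz-sized pages touched by the
 * range [addr, addr + len). */
static uint64_t pages_spanned(uint64_t addr, uint64_t len, uint64_t pgsz)
{
	uint64_t first, last;

	assert(pgsz && (pgsz & (pgsz - 1)) == 0);
	first = addr & ~(pgsz - 1);			/* round start down */
	last = (addr + len + pgsz - 1) & ~(pgsz - 1);	/* round end up */

	return (last - first) / pgsz;
}
```

With a 4K-aligned start of 0x1000, a length of 8K and an 8K page size,
page_offset() gives 0x1000 and pages_spanned() gives 2, while a naive
len / pgsz gives 1. That appears to be the one-page-versus-two case
described in the quoted text.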

> > > +
> > > +    if (force_zero_offset) {
> > > +        while (ib_umem_dma_offset(umem, page_sz) && page_sz > PAGE_SIZE)
> > > +            page_sz /= 2;
> > > +        if (ib_umem_dma_offset(umem, page_sz) != 0) {
> > > +            ibdev_dbg(&dev->ib_dev, "failed to find page size to force zero offset.\n");
> > > +            return -ENOMEM;
> > > +        }
> > > +    }
> > > +

Yes this doesn't look quite right..

It should flow from the HW capability, the helper you call should be
tightly linked to what the HW can do.

ib_umem_find_best_pgsz() is used for MRs that have the usual

  offset = IOVA % pgsz

We've always created other helpers for other restrictions.

So you should move your "force_zero_offset" into another helper and
describe exactly how the HW works to support the calculation

It is odd to have the offset loop and be using
ib_umem_find_best_pgsz() with some iova, usually you'd use
ib_umem_find_best_pgoff() in those cases, see the other callers.

Jason
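
For reference, the halving loop from the quoted hunk can be modelled in
userspace C roughly as below. This is only a sketch under stated
assumptions: shrink_to_zero_offset() and MODEL_MIN_PAGE_SIZE are
hypothetical names, the start address is passed in directly instead of
being read from a umem, and every power-of-two size down to 4K is
assumed to be supported by the HW, which the real page size bitmap may
not guarantee.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_MIN_PAGE_SIZE 4096ULL	/* assumption: 4K base page */

/*
 * Hypothetical model of the zero-offset loop: starting from a candidate
 * page size, halve it until the start address has no offset within the
 * page. Returns 0 if even the minimum size leaves an offset (the quoted
 * patch returns -ENOMEM in that case).
 */
static uint64_t shrink_to_zero_offset(uint64_t dma_addr, uint64_t page_sz)
{
	/* page_sz must be a non-zero power of two */
	assert(page_sz && (page_sz & (page_sz - 1)) == 0);

	while ((dma_addr & (page_sz - 1)) && page_sz > MODEL_MIN_PAGE_SIZE)
		page_sz /= 2;

	if (dma_addr & (page_sz - 1))
		return 0;

	return page_sz;
}
```

For a start address of 0x1000 and a candidate of 8K this settles on 4K,
and for a start address that is not 4K-aligned it reports failure. A
consolidated helper along the lines suggested above would presumably
also have to honour the HW page size bitmap at each step.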