On Wed, Sep 02, 2020 at 03:05:40PM +0300, Leon Romanovsky wrote:
> On Wed, Sep 02, 2020 at 08:59:12AM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 02, 2020 at 02:51:19PM +0300, Leon Romanovsky wrote:
> > > On Tue, Sep 01, 2020 at 09:43:30PM -0300, Jason Gunthorpe wrote:
> > > > rdma_for_each_block() makes assumptions about how the SGL is constructed
> > > > that don't work if the block size is below the page size used to build
> > > > the SGL.
> > > >
> > > > The rules for umem SGL construction require that the SGs all be PAGE_SIZE
> > > > aligned and we don't encode the actual byte offset of the VA range inside
> > > > the SGL using offset and length. So rdma_for_each_block() has no idea
> > > > where the actual starting/ending point is to compute the first/last block
> > > > boundary if the starting address should be within an SGL.
> > > >
> > > > Fixing the SGL construction turns out to be really hard, and will be the
> > > > subject of other patches. For now, block smaller page sizes.
> > > >
> > > > Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
> > > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxx>
> > > > ---
> > > >  drivers/infiniband/core/umem.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> > > > index 120e98403c345d..7b5bc969e55630 100644
> > > > --- a/drivers/infiniband/core/umem.c
> > > > +++ b/drivers/infiniband/core/umem.c
> > > > @@ -151,6 +151,12 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
> > > >  	dma_addr_t mask;
> > > >  	int i;
> > > >  
> > > > +	/* rdma_for_each_block() has a bug if the page size is smaller than the
> > > > +	 * page size used to build the umem. For now prevent smaller page sizes
> > > > +	 * from being returned.
> > > > +	 */
> > > > +	pgsz_bitmap &= GENMASK(BITS_PER_LONG - 1, PAGE_SHIFT);
> > > > +
> > >
> > > Why do we care about such a case? Why can't we leave this check forever?
> >
> > If HW supports only, say, a 4k page size, and runs on a 64k page size
> > architecture, it should be able to fragment into the native HW page
> > size.
> >
> > The whole point of these APIs is to decouple the system and HW page
> > sizes.
>
> Right now you are preventing such combinations, but is this a real concern
> for existing drivers?

No, I didn't prevent anything; I've left those drivers just hardwired to
use PAGE_SHIFT/PAGE_SIZE. Maybe they are broken and malfunction on 64k
page size systems; maybe the HW supports other page sizes and they should
call ib_umem_find_best_pgsz(). I don't really know.

The fix is fairly trivial, but it can't be done until the drivers stop
touching umem->sgl, as it requires changing how the sgl is constructed to
match standard kernel expectations, which also breaks all the drivers.

Jason
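A minimal userspace sketch of the two effects discussed in this thread, assuming a hypothetical 64k-page system (MODEL_PAGE_SHIFT = 16 stands in for the kernel's PAGE_SHIFT) and a device advertising 4k and 2M page sizes. filter_pgsz() applies the same GENMASK() filter as the patch; blocks_needed() and blocks_from_page_aligned_sgl() are hypothetical helpers, and the latter is a simplified model of walking a PAGE_SIZE-aligned SGL in fixed-size steps, not the real rdma_for_each_block() code. BITS_PER_LONG and GENMASK() are local re-definitions of the kernel macros.

```c
#include <assert.h>
#include <limits.h>

/* Local stand-ins for kernel definitions: GENMASK(h, l) sets bits h..l
 * inclusive, like the kernel macro of the same name. */
#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define GENMASK(h, l) \
	((~0UL << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))

/* Hypothetical: model a 64k-page architecture. */
#define MODEL_PAGE_SHIFT 16
#define MODEL_PAGE_SIZE  (1UL << MODEL_PAGE_SHIFT)

/* Same filter as the patch: clear every page size below the system page. */
static unsigned long filter_pgsz(unsigned long pgsz_bitmap)
{
	return pgsz_bitmap & GENMASK(BITS_PER_LONG - 1, MODEL_PAGE_SHIFT);
}

static unsigned long align_down(unsigned long x, unsigned long a)
{
	return x & ~(a - 1);
}

static unsigned long align_up(unsigned long x, unsigned long a)
{
	return (x + a - 1) & ~(a - 1);
}

/* Number of blk-sized HW blocks the VA range [va, va + len) really touches. */
static unsigned long blocks_needed(unsigned long va, unsigned long len,
				   unsigned long blk)
{
	assert(blk && !(blk & (blk - 1)));	/* blk must be a power of two */
	return (align_up(va + len, blk) - align_down(va, blk)) / blk;
}

/* Simplified model: the SGL only records whole system pages and no byte
 * offset, so walking it in blk-sized steps covers the page-rounded range
 * rather than just the bytes the VA range uses. */
static unsigned long blocks_from_page_aligned_sgl(unsigned long va,
						  unsigned long len,
						  unsigned long blk)
{
	unsigned long start = align_down(va, MODEL_PAGE_SIZE);
	unsigned long end = align_up(va + len, MODEL_PAGE_SIZE);

	assert(blk && !(blk & (blk - 1)));	/* blk must be a power of two */
	return (end - start) / blk;
}
```

For a range that starts 12k into a 64k page and is 8k long (va = 0x13000, len = 0x2000), blocks_needed() with 4k blocks gives 2, while the page-rounded walk gives 16 starting at 0x10000. With 64k blocks both give 1. That is the gap the GENMASK() clamp papers over: filter_pgsz((1UL << 12) | (1UL << 21)) keeps only the 2M bit, and a device that supports only 4k pages is left with an empty bitmap.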