On 21/01/2020 18:24, Leon Romanovsky wrote:
> On Tue, Jan 21, 2020 at 11:07:21AM +0200, Gal Pressman wrote:
>> On 20/01/2020 16:10, Gal Pressman wrote:
>>> The cited commit leads to register MR failures and random hangs when
>>> running different MPI applications. The exact root cause for the issue
>>> is still not clear; this revert brings us back to a stable state.
>>>
>>> This reverts commit 40ddb3f020834f9afb7aab31385994811f4db259.
>>>
>>> Fixes: 40ddb3f02083 ("RDMA/efa: Use API to get contiguous memory blocks aligned to device supported page size")
>>> Cc: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
>>> Cc: stable@xxxxxxxxxxxxxxx # 5.3
>>> Signed-off-by: Gal Pressman <galpress@xxxxxxxxxx>
>>
>> Shiraz, I think I found the root cause here.
>> I'm noticing a register MR of size 32k, which is constructed from two
>> sges: the first of size 12k and the second of 20k.
>>
>> ib_umem_find_best_pgsz returns page shift 13 as follows:
>>
>> 0x103dcb2000  0x103dcb5000    0x103dd5d000        0x103dd62000
>> +----------+                  +------------------+
>> |          |                  |                  |
>> |   12k    |                  |       20k        |
>> +----------+                  +------------------+
>>
>> +------+------+               +------+------+------+
>> |      |      |               |      |      |      |
>> |  8k  |  8k  |               |  8k  |  8k  |  8k  |
>> +------+------+               +------+------+------+
>> 0x103dcb2000  0x103dcb6000    0x103dd5c000        0x103dd62000
>>
>> The top row is the original umem sgl, and the bottom is the sgl
>> constructed by rdma_for_each_block with a page size of 8k.
>>
>> Is this the expected output? The 8k pages cover addresses which aren't
>> part of the MR. This breaks some of the assumptions in the driver (for
>> example, the way we calculate the number of pages in the MR), and I'm
>> not sure our device can handle such an sgl.
>
> Artemy wrote this fix, which should help you:
>
> commit 60c9fe2d18b657df950a5f4d5a7955694bd08e63
> Author: Artemy Kovalyov <artemyko@xxxxxxxxxxxx>
> Date:   Sun Dec 15 12:43:13 2019 +0200
>
>     RDMA/umem: Fix ib_umem_find_best_pgsz()
>
>     Except for the last entry, the ending iova alignment sets the maximum
>     possible page size as the low bits of the iova must be zero when
>     starting the next chunk.
>
>     Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
>     Signed-off-by: Artemy Kovalyov <artemyko@xxxxxxxxxxxx>
>     Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
>
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index c3769a5f096d..06b6125b5ae1 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -166,10 +166,13 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
>                   * for any address.
>                   */
>                  mask |= (sg_dma_address(sg) + pgoff) ^ va;
> -                if (i && i != (umem->nmap - 1))
> -                        /* restrict by length as well for interior SGEs */
> -                        mask |= sg_dma_len(sg);
>                  va += sg_dma_len(sg) - pgoff;
> +                /* Except for the last entry, the ending iova alignment sets
> +                 * the maximum possible page size as the low bits of the iova
> +                 * must be zero when starting the next chunk.
> +                 */
> +                if (i != (umem->nmap - 1))
> +                        mask |= va;
>                  pgoff = 0;
>          }
>          best_pg_bit = rdma_find_pg_bit(mask, pgsz_bitmap);

Thanks Leon, I'll test this and let you know if it fixes the issue.
When are you planning to submit it?
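
P.S. For completeness, below is a small userspace model I used to convince
myself of what's going on. It is NOT the kernel code: struct sge,
max_page_size() and round_up_pow2() are made-up helpers, the iova is assumed
to equal the first sge's dma address, pgoff is ignored, and the final clamp
to the device's supported page sizes (rdma_find_pg_bit) is skipped. It only
models the mask computation in ib_umem_find_best_pgsz, before and after
Artemy's fix, using the two sges from the diagram above.

/*
 * Userspace model of the mask computation in ib_umem_find_best_pgsz,
 * before and after the fix.  NOT the kernel code -- see the caveats
 * above.  Build with: cc -Wall -o pgsz_model pgsz_model.c
 */
#include <stdint.h>
#include <stdio.h>

struct sge {
        uint64_t addr;
        uint64_t len;
};

/* The lowest set bit of the mask is the best alignment we can claim. */
static uint64_t max_page_size(uint64_t mask)
{
        return mask & -mask;
}

static uint64_t round_up_pow2(uint64_t x)
{
        uint64_t p = 1;

        while (p < x)
                p <<= 1;
        return p;
}

int main(void)
{
        /* The two sges of the 32k MR from the diagram. */
        struct sge sg[] = {
                { 0x103dcb2000, 12 * 1024 },
                { 0x103dd5d000, 20 * 1024 },
        };
        int nmap = 2, i;
        uint64_t length = 32 * 1024;
        uint64_t va = sg[0].addr;       /* assume iova == first dma address */
        uint64_t pgsz = 8 * 1024;
        uint64_t mask_old, mask_new, start, end;

        /* Both versions first bound the page size by the MR length. */
        mask_old = mask_new = round_up_pow2(length);

        for (i = 0; i < nmap; i++) {
                /* Both versions fold in the VA/PA bit differences.  (The
                 * old interior-sge length restriction never fires with
                 * only two sges, so it is omitted here.)
                 */
                mask_old |= sg[i].addr ^ va;
                mask_new |= sg[i].addr ^ va;
                va += sg[i].len;
                /* Artemy's fix: for all but the last sge, the ending iova
                 * must be aligned too, otherwise the next chunk starts in
                 * the middle of a page.
                 */
                if (i != nmap - 1)
                        mask_new |= va;
        }

        printf("old page size bound: 0x%llx, fixed: 0x%llx\n",
               (unsigned long long)max_page_size(mask_old),
               (unsigned long long)max_page_size(mask_new));

        /* Show how 8k blocks overrun the sges, as in the diagram. */
        for (i = 0; i < nmap; i++) {
                start = sg[i].addr & ~(pgsz - 1);
                end = (sg[i].addr + sg[i].len + pgsz - 1) & ~(pgsz - 1);
                printf("sge %d [0x%llx-0x%llx) -> 8k blocks [0x%llx-0x%llx)\n",
                       i, (unsigned long long)sg[i].addr,
                       (unsigned long long)(sg[i].addr + sg[i].len),
                       (unsigned long long)start, (unsigned long long)end);
        }
        return 0;
}

Under these simplified assumptions the pre-fix mask allows anything up to
32k (in the real run, other address bits and the device's supported-size
bitmap brought that down to 8k, i.e. page shift 13), while the fixed code
drops to 4k because the first chunk's ending iova, 0x103dcb5000, is only
4k-aligned. The second loop reproduces the overrun from the diagram:
[0x103dcb2000-0x103dcb6000) and [0x103dd5c000-0x103dd62000).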