On 21/01/2020 18:39, Saleem, Shiraz wrote:
>> Subject: Re: [PATCH for-rc] Revert "RDMA/efa: Use API to get contiguous
>> memory blocks aligned to device supported page size"
>>
>> On 20/01/2020 16:10, Gal Pressman wrote:
>>> The cited commit leads to register MR failures and random hangs when
>>> running different MPI applications. The exact root cause for the issue
>>> is still not clear, this revert brings us back to a stable state.
>>>
>>> This reverts commit 40ddb3f020834f9afb7aab31385994811f4db259.
>>>
>>> Fixes: 40ddb3f02083 ("RDMA/efa: Use API to get contiguous memory
>>> blocks aligned to device supported page size")
>>> Cc: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
>>> Cc: stable@xxxxxxxxxxxxxxx # 5.3
>>> Signed-off-by: Gal Pressman <galpress@xxxxxxxxxx>
>>
>> Shiraz, I think I found the root cause here.
>> I'm noticing a register MR of size 32k, which is constructed from two sges, the first
>> sge of size 12k and the second of 20k.
>>
>> ib_umem_find_best_pgsz returns page shift 13 in the following way:
>>
>> 0x103dcb2000  0x103dcb5000    0x103dd5d000        0x103dd62000
>>       +----------+                  +------------------+
>>       |          |                  |                  |
>>       |   12k    |                  |       20k        |
>>       +----------+                  +------------------+
>>
>>       +------+------+             +------+------+------+
>>       |      |      |             |      |      |      |
>>       |  8k  |  8k  |             |  8k  |  8k  |  8k  |
>>       +------+------+             +------+------+------+
>> 0x103dcb2000   0x103dcb6000  0x103dd5c000         0x103dd62000
>>
>>
>
> Gal - would be useful to know the IOVA (virt) and umem->addr also for this MR in ib_umem_find_best_pgsz

I'll update my debug prints to include the iova and rerun the tests.
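
For anyone who wants to double-check the numbers in the diagram quoted above, here is a small userspace sketch that redoes the arithmetic for page shift 13. It is not the kernel's ib_umem_find_best_pgsz() and does not model how that function picks a page size; the helper names (covering_blocks, report) are made up for illustration, and the only inputs are the four sge addresses from the diagram.

```python
# Illustrative sketch only: NOT the kernel's ib_umem_find_best_pgsz().
# It recomputes which page-size-aligned blocks cover each sge from the
# diagram for a given page shift. Helper names are hypothetical.

def covering_blocks(start, end, page_shift):
    """Return (aligned_start, aligned_end, n_blocks) covering [start, end)."""
    size = 1 << page_shift
    mask = size - 1
    aligned_start = start & ~mask          # round start down to a block boundary
    aligned_end = (end + mask) & ~mask     # round end up to a block boundary
    n_blocks = (aligned_end - aligned_start) >> page_shift
    return aligned_start, aligned_end, n_blocks


# The two sges from the report: 12k and 20k, addresses taken from the diagram.
SGES = [
    (0x103dcb2000, 0x103dcb5000),
    (0x103dd5d000, 0x103dd62000),
]


def report(sges, page_shift):
    """For each sge, list the covering blocks and the bytes outside the sge."""
    rows = []
    for start, end in sges:
        a_start, a_end, n = covering_blocks(start, end, page_shift)
        rows.append({
            "start": a_start,
            "end": a_end,
            "blocks": n,
            "pad_before": start - a_start,  # bytes covered before the sge starts
            "pad_after": a_end - end,       # bytes covered after the sge ends
        })
    return rows


if __name__ == "__main__":
    for row in report(SGES, 13):
        print(f"{row['start']:#x}..{row['end']:#x} blocks={row['blocks']} "
              f"pad_before={row['pad_before']} pad_after={row['pad_after']}")
```

With shift 13 this gives two 8k blocks for the first sge and three for the second, matching the lower half of the diagram: five 8k blocks (40k) for a 32k MR, with 4k covered past the end of the first sge and 4k covered before the start of the second.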