Re: [PATCH for-rc] Revert "RDMA/efa: Use API to get contiguous memory blocks aligned to device supported page size"

On 21/01/2020 18:24, Leon Romanovsky wrote:
> On Tue, Jan 21, 2020 at 11:07:21AM +0200, Gal Pressman wrote:
>> On 20/01/2020 16:10, Gal Pressman wrote:
>>> The cited commit leads to register MR failures and random hangs when
>>> running different MPI applications. The exact root cause for the issue
>>> is still not clear; this revert brings us back to a stable state.
>>>
>>> This reverts commit 40ddb3f020834f9afb7aab31385994811f4db259.
>>>
>>> Fixes: 40ddb3f02083 ("RDMA/efa: Use API to get contiguous memory blocks aligned to device supported page size")
>>> Cc: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
>>> Cc: stable@xxxxxxxxxxxxxxx # 5.3
>>> Signed-off-by: Gal Pressman <galpress@xxxxxxxxxx>
>>
>> Shiraz, I think I found the root cause here.
>> I'm noticing a register MR of size 32k, which is constructed from two sges, the
>> first sge of size 12k and the second of 20k.
>>
>> ib_umem_find_best_pgsz returns page shift 13 in the following way:
>>
>> 0x103dcb2000      0x103dcb5000       0x103dd5d000           0x103dd62000
>>           +----------+                      +------------------+
>>           |          |                      |                  |
>>           |  12k     |                      |     20k          |
>>           +----------+                      +------------------+
>>
>>           +------+------+                 +------+------+------+
>>           |      |      |                 |      |      |      |
>>           | 8k   | 8k   |                 | 8k   | 8k   | 8k   |
>>           +------+------+                 +------+------+------+
>> 0x103dcb2000       0x103dcb6000   0x103dd5c000              0x103dd62000
>>
>>
>> The top row is the original umem sgl, and the bottom is the sgl constructed by
>> rdma_for_each_block with page size of 8k.
>>
>> Is this the expected output? The 8k pages cover addresses which aren't part of
>> the MR. This breaks some of the assumptions in the driver (for example, the way
>> we calculate the number of pages in the MR) and I'm not sure our device can
>> handle such an sgl.
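
To make the picture above concrete, here is a minimal userspace sketch (not kernel code) of walking each SGE in page-size-aligned blocks, rounding the first block address down and the end up. That is how the bottom row of the diagram is drawn. The struct sge type and the count_blocks()/total_blocks() helpers are hypothetical and only model the iteration; they are not the rdma_for_each_block implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped SGE (not a kernel type). */
struct sge {
	uint64_t dma; /* DMA address of the entry */
	uint64_t len; /* length in bytes */
};

/*
 * Number of pgsz-aligned blocks touched by one SGE. The first block
 * address is rounded down to pgsz and the end is rounded up, so the
 * blocks may cover bytes outside the SGE. pgsz must be a power of two.
 */
static unsigned int count_blocks(const struct sge *s, uint64_t pgsz,
				 uint64_t *first, uint64_t *end)
{
	assert(pgsz && !(pgsz & (pgsz - 1)));

	*first = s->dma & ~(pgsz - 1);                      /* round down */
	*end = (s->dma + s->len + pgsz - 1) & ~(pgsz - 1);  /* round up */
	return (unsigned int)((*end - *first) / pgsz);
}

/* Total number of blocks needed for the whole SGL at one page size. */
static unsigned int total_blocks(const struct sge *sgl, int n, uint64_t pgsz)
{
	unsigned int total = 0;
	uint64_t first, end;

	for (int i = 0; i < n; i++)
		total += count_blocks(&sgl[i], pgsz, &first, &end);
	return total;
}
```

With the two SGEs from the report and an 8k page size, count_blocks() gives 2 blocks (0x103dcb2000-0x103dcb6000) and 3 blocks (0x103dd5c000-0x103dd62000), so 5 blocks (40k) describe a 32k MR. A driver that assumed, purely for illustration, MR length / page size = 4 pages would undercount; at 4k the same walk gives 8 blocks, which exactly cover the MR.
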
> 
> Artemy wrote this fix that can help you.
> 
> commit 60c9fe2d18b657df950a5f4d5a7955694bd08e63
> Author: Artemy Kovalyov <artemyko@xxxxxxxxxxxx>
> Date:   Sun Dec 15 12:43:13 2019 +0200
> 
>     RDMA/umem: Fix ib_umem_find_best_pgsz()
> 
>     Except for the last entry, the ending iova alignment sets the maximum
>     possible page size as the low bits of the iova must be zero when
>     starting the next chunk.
> 
>     Fixes: 4a35339958f1 ("RDMA/umem: Add API to find best driver supported page size in an MR")
>     Signed-off-by: Artemy Kovalyov <artemyko@xxxxxxxxxxxx>
>     Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index c3769a5f096d..06b6125b5ae1 100644
> --- a/drivers/infiniband/core/umem.c
> +++ b/drivers/infiniband/core/umem.c
> @@ -166,10 +166,13 @@ unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
>                  * for any address.
>                  */
>                 mask |= (sg_dma_address(sg) + pgoff) ^ va;
> -               if (i && i != (umem->nmap - 1))
> -                       /* restrict by length as well for interior SGEs */
> -                       mask |= sg_dma_len(sg);
>                 va += sg_dma_len(sg) - pgoff;
> +               /* Except for the last entry, the ending iova alignment sets
> +                * the maximum possible page size as the low bits of the iova
> +                * must be zero when starting the next chunk.
> +                */
> +               if (i != (umem->nmap - 1))
> +                       mask |= va;
>                 pgoff = 0;
>         }
>         best_pg_bit = rdma_find_pg_bit(mask, pgsz_bitmap);
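
The difference between the old and the fixed loop can be checked with a small userspace model. This is a simplified sketch, not the kernel function. It assumes the MR starts page-aligned (pgoff == 0). The helper names, the VA (taken equal to the first DMA address) and the device page-size bitmap (4k | 8k) are hypothetical values, picked so that the old rule reproduces the page shift 13 reported above. The real EFA bitmap and VA are not given in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for one DMA-mapped SGE (not a kernel type). */
struct sge {
	uint64_t dma;
	uint64_t len;
};

static uint64_t roundup_pow2(uint64_t x)
{
	uint64_t p = 1;

	while (p < x)
		p <<= 1;
	return p;
}

/*
 * Modeled on rdma_find_pg_bit(): pick the largest page size in bitmap that
 * does not exceed the lowest set bit of mask, or fall back to the smallest
 * supported size if none qualifies.
 */
static unsigned int find_pg_bit(uint64_t mask, uint64_t bitmap)
{
	uint64_t align = mask & -mask;              /* lowest set bit */
	uint64_t allowed = bitmap & ~(-align << 1); /* sizes <= align */
	unsigned int bit = 0;

	assert(bitmap != 0);
	if (!allowed) {
		while (!((bitmap >> bit) & 1))
			bit++;
		return bit;
	}
	for (unsigned int b = 0; b < 64; b++)
		if ((allowed >> b) & 1)
			bit = b;
	return bit;
}

/*
 * Simplified model of ib_umem_find_best_pgsz() with pgoff == 0.
 * fixed == 0 follows the pre-fix loop; fixed != 0 follows the loop in
 * the patch quoted above.
 */
static unsigned int best_pg_shift(const struct sge *sgl, int n, uint64_t va,
				  uint64_t bitmap, int fixed)
{
	uint64_t len = 0, mask;

	assert(n > 0);
	for (int i = 0; i < n; i++)
		len += sgl[i].len;
	mask = roundup_pow2(len);	/* page size may not exceed MR length */

	for (int i = 0; i < n; i++) {
		mask |= sgl[i].dma ^ va;	/* VA/DMA low bits must match */
		if (!fixed && i && i != n - 1)
			mask |= sgl[i].len;	/* old: interior lengths only */
		va += sgl[i].len;
		if (fixed && i != n - 1)
			mask |= va;	/* new: each non-last SGE must end aligned */
	}
	return find_pg_bit(mask, bitmap);
}
```

Under these assumptions best_pg_shift(..., 0) returns 13 and best_pg_shift(..., 1) returns 12. After the first SGE the iova is 0x103dcb5000, which is only 4k aligned, so OR-ing it into the mask caps the page size at 4k, and the 8k blocks that spilled outside the MR go away.
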

Thanks Leon, I'll test this and let you know if it fixes the issue.
When are you planning to submit this?
