Re: [PATCH v2 rdma-next 1/5] RDMA/umem: Add API to find best driver supported page size in an MR

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 19, 2019 at 08:43:49AM -0500, Shiraz Saleem wrote:
> This helper iterates through the SG list to find the best page size to use
> from a bitmap of HW supported page sizes. Drivers that support multiple
> page sizes, but not mixed sizes in an MR can use this API.
> 
> Suggested-by: Jason Gunthorpe <jgg@xxxxxxxx>
> Cc: Gal Pressman <galpress@xxxxxxxxxx>
> Reviewed-by: Michael J. Ruhl <michael.j.ruhl@xxxxxxxxx>
> Signed-off-by: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
>  drivers/infiniband/core/umem.c | 57 ++++++++++++++++++++++++++++++++++++++++++
>  include/rdma/ib_umem.h         | 19 ++++++++++++++
>  include/rdma/ib_verbs.h        | 24 ++++++++++++++++++
>  3 files changed, 100 insertions(+)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 7e912a9..8624ba1 100644
> +++ b/drivers/infiniband/core/umem.c
> @@ -127,6 +127,63 @@ static struct scatterlist *ib_umem_add_sg_table(struct scatterlist *sg,
>  }
>  
>  /**
> + * ib_umem_find_best_pgsz - Find best HW page size to use for this MR
> + *
> + * @umem: umem struct
> + * @pgsz_bitmap: bitmap of HW supported page sizes
> + * @flags: see enum ib_umem_find_best_pgsz_flags
> + * @uvirt_addr: user-space virtual MR base address (provided if
> + * IB_UMEM_VA_BASED_OFFSET flag is set)
> + *
> + * This helper is intended for HW that support multiple page
> + * sizes but can do only a single page size in an MR.
> + *
> + * Returns 0 if the umem requires page sizes not supported by
> + * the driver to be mapped. Drivers always supporting PAGE_SIZE
> + * or smaller will never see a 0 result.
> + */
> +unsigned long ib_umem_find_best_pgsz(struct ib_umem *umem,
> +				     unsigned long pgsz_bitmap,
> +				     unsigned int flags,
> +				     unsigned long uvirt_addr)
> +{
> +	struct scatterlist *sg;
> +	dma_addr_t mask = 0;
> +	unsigned int best_pg_bit;
> +	int i;
> +
> +	if (WARN_ON(!(pgsz_bitmap & GENMASK(PAGE_SHIFT, 0))))
> +		return 0;
> +
> +	for_each_sg(umem->sg_head.sgl, sg, umem->nmap, i) {
> +		dma_addr_t dma_addr_start, dma_addr_end;
> +
> +		dma_addr_start = sg_dma_address(sg);
> +		dma_addr_end = sg_dma_address(sg) + sg_dma_len(sg);
> +		if (!i)
> +			/* first SGE: can start on partial page size.
> +			 * Ignore alignment of start addr.
> +			 * For HW that uses VA_BASED_OFFSET, minimal alignment
> +			 * restricted by end of the first page in virtual space.
> +			 */
> +			mask |= (flags & IB_UMEM_VA_BASED_OFFSET ?
> +					((uvirt_addr + sg_dma_len(sg)) | dma_addr_end) :
> +					dma_addr_end);

This doesn't look quite right to me..

Lets say we have 2M page with physical address 0x2600000 mapped into
virtual address 0x6400000

If we make a MR with VA 0x6400FFF and length 2M-4095 I expect the 2M
page size with the NIC.

We will have dma_addr_start = 0x2600000 and sg_dma_len = 0x200000 as
the SGL is always rounded to pages

So the above mask math will compute:

 0x2600000 + (0x6400FFF + 0x200000) = 0x8c00fff

I have a feeling the uvirt_addr should be computed more like

  if (flags & IB_UMEM_VA_BASED_OFFSET)
        mask |= uvirt_addr ^ umem->addr;

As umem->addr & hw_page_mask implicitly includes the offset into the
page, and if the uvirt_addr has a different offset bit then we can't
map 1:1 and thus have to reduce the page size.

For the above example if uvirt_addr = 0x6400FFF and umem->addr =
0x6400FFF because they are the same then the xor is 0.

If uvirt_addr = 0 (ie ZERO_BASED) and umem->addr is 0x6400FFF then the
function fails because this HW cannot create a zero based MR that is
not page aligned

If uvirt_addr = 0xFFF (ie offset'd ZERO_BASED) then (0xFFF ^
0x6400FFF) = 0x6400000 setting the max HW page size to 4M (as
expected)

.. and a more resonable case of say, uvirt_addr = 0 and umem->addr =
0x271000, then 0 ^ 0x271000 -> means the biggest page size is 4k,
which is correct.

Yes?

Jason



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux