Re: [PATCH RFC 2/4] RDMA/umem: Add API to find best driver supported page size in an MR

On Mon, Oct 22, 2018 at 03:32:23PM -0600, Jason Gunthorpe wrote:
> On Fri, Oct 19, 2018 at 06:34:07PM -0500, Shiraz Saleem wrote:
> > This helper iterates through the SG list to find the best page size to use
> > from a bitmap of HW supported page sizes. Drivers that support multiple
> > page sizes, but not mixed pages in an MR can call this API.
> > 
> > Suggested-by: Jason Gunthorpe <jgg@xxxxxxxx>
> > Reviewed-by: Michael J. Ruhl <michael.j.ruhl@xxxxxxxxx>
> > Signed-off-by: Shiraz Saleem <shiraz.saleem@xxxxxxxxx>
> >  drivers/infiniband/core/umem.c | 95 ++++++++++++++++++++++++++++++++++++++++++
> >  include/rdma/ib_umem.h         |  7 ++++
> >  2 files changed, 102 insertions(+)
> > 
> > +static int ib_umem_find_pg_bit(unsigned long phyaddr,
> > +			       unsigned long supported_pgsz)
> 
> This returns an unsigned value
> 
> > +{
> > +	unsigned long num_zeroes;
> > +	int pg_bit;
> 
> unsigned

Right.

[..] 
> > +
> > +	for_each_sg(umem->sg_head.sgl, sg, umem->sg_head.orig_nents, i) {
> > +		dma_addr_start = sg_dma_address(sg);
> > +		dma_addr_end = sg_dma_address(sg) + sg_dma_len(sg);
> > +		pg_bit_start = ib_umem_find_pg_bit(dma_addr_start, supported_pgsz);
> > +		pg_bit_end = ib_umem_find_pg_bit(dma_addr_end, supported_pgsz);
> 
> This seems convoluted.. 
> 
> Shouldn't this just be another ffs on sg_dma_len() to compute the
> maximum possible page size?
> 
> 	pg_bit_start = ib_umem_find_pg_bit(dma_addr_start,
>    	   supported_pgsz & (BIT_ULL(__ffs(sg_dma_len(sg) + 1)) - 1));
> 
> Or so? then bit_end isn't needed and nor is the while loop below

Looks simpler. But would it work correctly for the last SGE?
For HW that supports 2M and 4K, for example, if the last SGE starts at a 2M-aligned addr
and has a length of 200 4K pages, the pg_bit per the above equation would compute to 12
when it should be 21. The same applies to the first SGE, I think.
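For what it's worth, here is a user-space sketch of the alignment-based computation (find_pg_bit is a hypothetical stand-in for ib_umem_find_pg_bit, using GCC builtins in place of the kernel's __ffs/fls), which also shows how masking the bitmap by the chunk length drops the last SGE to 4K:

```c
#include <assert.h>

#define BITS_PER_LONG 64

/* Hypothetical stand-in for ib_umem_find_pg_bit(): pick the highest
 * supported page shift that the physical address is aligned to. */
static unsigned int find_pg_bit(unsigned long phyaddr,
				unsigned long supported_pgsz)
{
	unsigned long mask = supported_pgsz;
	unsigned int align;

	/* number of trailing zero bits == address alignment in bits */
	align = phyaddr ? (unsigned int)__builtin_ctzl(phyaddr)
			: BITS_PER_LONG - 1;
	if (align < BITS_PER_LONG - 1)
		mask &= (1UL << (align + 1)) - 1;

	/* highest remaining supported bit wins */
	return mask ? (unsigned int)(BITS_PER_LONG - 1 - __builtin_clzl(mask))
		    : 0;
}
```

With supported_pgsz = BIT(12) | BIT(21), a 2M-aligned address yields 21. But a length of 200 4K pages (819200 bytes) has __ffs == 15, so length-masking the bitmap first strips BIT(21) and the result drops to 12, even though the tail chunk of the MR could legally live in a 2M page.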
 
> > +		if (!i) {
> > +			pg_bit_sg_chunk = max_t(int, pg_bit_start, pg_bit_end);
> > +			bitmap = supported_pgsz;
> > +			/* The start offset of the MR into a first _large_ page
> > +			 * should line up exactly for the user-space virtual buf
> > +			 * and physical buffer, in order to upgrade the page bit
> > +			 */
> > +			while (pg_bit_sg_chunk > PAGE_SHIFT) {
> 
> Hurm.. This doesn't look quite right.
> 
> This is a HW restriction where we must have RDMA_ADDR & PAGE_SIZE-1 ==
> 0, for all the interior pages, right? ie the HW is computing the
> offset into the DMA list naively?

Yes. For VA-based MRs, the least significant bits of the VA (12 bits for a 4K page size,
21 bits for 2M, etc.) indicate the offset.
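So the restriction the loop tries to enforce boils down to: the VA and the DMA address must agree in the low pg_bit bits, since the HW derives the offset into the page naively from the VA. A hypothetical user-space sketch of that check:

```c
#include <assert.h>

/* For a VA-based MR the HW computes the offset into a page as
 * (va & (page_size - 1)), so the VA and the DMA address must agree
 * in those low bits for pg_bit to be usable.  Hypothetical helper. */
static int offsets_line_up(unsigned long va, unsigned long dma_addr,
			   unsigned int pg_bit)
{
	unsigned long pg_mask = (1UL << pg_bit) - 1;

	return (va & pg_mask) == (dma_addr & pg_mask);
}
```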

> First of all, umem->address is the wrong thing to use, the RDMA
> virtual MR base address could be zero (ie IB_ZERO_BASED was used)
> there should be an accessor helper for this.

Yes. Zero-based VA was not considered.

> Second this should only be checked once, not every loop. If we know
> the RDMA MR base address, the length of the first sgl, and the page
> size, we can do a single computation to reduce the page size to make
> it sufficiently aligned to the MR base.

I am not on this page yet (no pun intended :)). But I will spend some
time figuring out how this can be done.
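If I follow, the single computation might amount to XOR-ing the MR base VA with the first DMA address and letting the lowest mismatching bit cap the page size. A sketch only (best_pg_bit is hypothetical, not proposed code):

```c
#include <assert.h>

#define BITS_PER_LONG 64

/* Hypothetical one-shot reduction: any bit where the MR base VA and
 * the first DMA address differ rules out page sizes larger than that
 * bit, so mask the supported bitmap once instead of looping. */
static unsigned int best_pg_bit(unsigned long va, unsigned long dma_addr,
				unsigned long supported_pgsz)
{
	unsigned long diff = va ^ dma_addr;
	unsigned long mask = supported_pgsz;
	unsigned int low;

	if (diff) {
		low = (unsigned int)__builtin_ctzl(diff);
		if (low < BITS_PER_LONG - 1)
			mask &= (1UL << (low + 1)) - 1;
	}
	return mask ? (unsigned int)(BITS_PER_LONG - 1 - __builtin_clzl(mask))
		    : 0;
}
```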

Shiraz


