RE: [PATCH v2 rdma-next 1/5] RDMA/umem: Add API to find best driver supported page size in an MR

>Subject: Re: [PATCH v2 rdma-next 1/5] RDMA/umem: Add API to find best driver
>supported page size in an MR
>
>On Tue, Apr 30, 2019 at 07:54:24PM +0000, Saleem, Shiraz wrote:
>> >Subject: Re: [PATCH v2 rdma-next 1/5] RDMA/umem: Add API to find best
>> >driver supported page size in an MR
>> >
>> >On Tue, Apr 30, 2019 at 01:36:14PM +0000, Saleem, Shiraz wrote:
>> >
>> >> >If we make a MR with VA 0x6400FFF and length 2M-4095 I expect the
>> >> >2M page size with the NIC.
>> >> >
>> >> >We will have dma_addr_start = 0x2600000 and sg_dma_len = 0x200000
>> >> >as the SGL is always rounded to pages
>> >>
>> >> why isn't it the sg_dma_len 2M-4095? Is it because we compute
>> >> npages in ib_umem_get() to build the SGL? Could using (addr &
>> >> PAGE_MASK) and then adding dma_len help take care of this?
>> >
>> >We always round to page sizes when building the SGL, so the start is
>> >rounded down and the end remains the same.
>> >
>> >> >I have a feeling the uvirt_addr should be computed more like
>> >> >
>> >> >  if (flags & IB_UMEM_VA_BASED_OFFSET)
>> >> >        mask |= uvirt_addr ^ umem->addr;
>> >>
>> >> I am not following.
>> >>
>> >> For a case like below where uvirt_addr = umem_addr, mask = 0 and we
>> >> run rdma_find_pgbit on it, we'll pick 4K instead of 2M, which is wrong.
>> >
>> >> uvirt_addr [0x7f2b98200000]
>> >> best_pgsz [0x201000]
>> >> umem->address [0x7f2b98200000]
>> >> dma_addr_start [0x130e200000]
>> >> dma_addr_end [0x130e400000]
>> >> sg_dma_len [2097152]
>> >
>> >??
>> >
>> >0x7f2b98200000 ^ 0x7f2b98200000 = 0
>> >
>> >So mask isn't altered by the requested VA and you get 2M pages.
>
>> I am still missing the point. mask was initialized to 0 and if we just
>> do "mask |= uvirt_addr ^ umem->addr" for the first SGE, then you'll
>> always get 0 for mask (and one page size) when uvirt_addr = umem->addr,
>> irrespective of their values.
>
>This is because mask shouldn't start as zero - the highest possible mask is
>something like log2_ru(umem length)
>
>ie absent other considerations the page size at the NIC should be the size of the
>umem.
>
>Then we scan the sgl and reduce that value based on the physical address
>layout
>
>Then we reduce it again based on the uvirt vs address difference
>
>Oring a '0' into the mask means that step contributes no restriction.
>
>..
>
>So I think the algorithm is just off as is, it should be more like
>
> // Page size can't be larger than the length of the MR
> mask = log2_ru(umem length);
>
> // offset into the first SGL for umem->addr
> pgoff = umem->address & PAGE_MASK;
> va = uvirt_addr;
>

Did you mean pgoff = umem->address & ~PAGE_MASK?

> for_each_sgl()
>   mask |= (sgl->dma_addr + pgoff) ^ va
>   if (not first or last)
>       // Interior SGLs limit by the length as well
>       mask |= sgl->length;
>   va += sgl->length - pgoff;
>   pgoff = 0;
>
>The key is the xor of the VA to the PA at every step because that is really the
>restriction we are looking at. If any VA/PA bits differ for any address in the MR
>that immediately reduces the max page size.
>
>Remember what the HW does is
>  PA = MR_PAGE[VA] | (VA & PAGE_MASK)
>
>So any situation where PA & PAGE_MASK != VA is a violation - this is the
>calculation the XOR is doing.
>
>Jason


