>Subject: Re: [PATCH v2 rdma-next 1/5] RDMA/umem: Add API to find best driver
>supported page size in an MR
>
>On Tue, Apr 30, 2019 at 07:54:24PM +0000, Saleem, Shiraz wrote:
>> >Subject: Re: [PATCH v2 rdma-next 1/5] RDMA/umem: Add API to find best
>> >driver supported page size in an MR
>> >
>> >On Tue, Apr 30, 2019 at 01:36:14PM +0000, Saleem, Shiraz wrote:
>> >
>> >> >If we make a MR with VA 0x6400FFF and length 2M-4095 I expect the
>> >> >2M page size with the NIC.
>> >> >
>> >> >We will have dma_addr_start = 0x2600000 and sg_dma_len = 0x200000
>> >> >as the SGL is always rounded to pages
>> >>
>> >> why isn't the sg_dma_len 2M-4095? Is it because we compute
>> >> npages in ib_umem_get() to build the SGL? Could using (addr &
>> >> PAGE_MASK) and then adding dma_len help take care of this?
>> >
>> >We always round to page sizes when building the SGL, so the start is
>> >rounded down and the end remains the same.
>> >
>> >> >I have a feeling the uvirt_addr should be computed more like
>> >> >
>> >> >   if (flags & IB_UMEM_VA_BASED_OFFSET)
>> >> >       mask |= uvirt_addr ^ umem->addr;
>> >>
>> >> I am not following.
>> >>
>> >> For a case like below where uvirt_addr = umem_addr, mask = 0 and we
>> >> run rdma_find_pgbit on it we'll pick 4K instead of 2M, which is wrong.
>> >
>> >> uvirt_addr [0x7f2b98200000]
>> >> best_pgsz [0x201000]
>> >> umem->address [0x7f2b98200000]
>> >> dma_addr_start [0x130e200000]
>> >> dma_addr_end [0x130e400000]
>> >> sg_dma_len [2097152]
>> >
>> >??
>> >
>> >0x7f2b98200000 ^ 0x7f2b98200000 = 0
>> >
>> >So mask isn't altered by the requested VA and you get 2M pages.
>
>> I am still missing the point. mask was initialized to 0 and if we just
>> do "mask |= uvirt_addr ^ umem->addr" for the first SGE, then you'll
>> always get 0 for mask (and one page size) when uvirt_addr = umem->addr
>> irrespective of their values.
>This is because mask shouldn't start as zero - the highest possible mask is
>something like log2_ru(umem length)
>
>ie absent other considerations the page size at the NIC should be the size
>of the umem.
>
>Then we scan the sgl and reduce that value based on the physical address
>layout
>
>Then we reduce it again based on the uvirt vs address difference
>
>Oring a '0' into the mask means that step contributes no restriction.
>
>..
>
>So I think the algorithm is just off as is, it should be more like
>
>   // Page size can't be larger than the length of the MR
>   mask = log2_ru(umem length);
>
>   // offset into the first SGL for umem->addr
>   pgoff = umem->address & PAGE_MASK;
>   va = uvirt_addr;

Did you mean pgoff = umem->address & ~PAGE_MASK?

>   for_each_sgl()
>      mask |= (sgl->dma_addr + pgoff) ^ va
>      if (not first or last)
>          // Interior SGLs limit by the length as well
>          mask |= sgl->length;
>      va += sgl->length - pgoff;
>      pgoff = 0;
>
>The key is the xor of the VA to the PA at every step because that is really
>the restriction we are looking at. If any VA/PA bits differ for any address
>in the MR that immediately reduces the max page size.
>
>Remember what the HW does is
>
>   PA = MR_PAGE[VA] | (VA & PAGE_MASK)
>
>So any situation where PA & PAGE_MASK != VA is a violation - this is the
>calculation the XOR is doing.
>
>Jason