On Thu, Oct 26, 2023 at 06:28:37AM -0700, Bart Van Assche wrote: > On 10/26/23 02:05, Zhijian Li (Fujitsu) wrote: > > The root cause is that > > > > rxe:rxe_set_page() gets wrong when mr.page_size != PAGE_SIZE where it only stores the *page to xarray. > > So the offset will get lost. > > > > For example, > > store process: > > page_size = 0x1000; > > PAGE_SIZE = 0x10000; > > va0 = 0xffff000020651000; > > page_offset = 0 = va & (page_size - 1); > > page = va_to_page(va); > > xa_store(&mr->page_list, mr->nbuf, page, GFP_KERNEL); > > > > load_process: > > page = xa_load(&mr->page_list, index); > > page_va = kmap_local_page(page) --> it must be a PAGE_SIZE align value, assume it as 0xffff000020650000 > > va1 = page_va + page_offset = 0xffff000020650000 + 0 = 0xffff000020650000; > > > > Obviously, *va0 != va1*, page_offset get lost. > > > > > > How to fix: > > - revert 325a7eb85199 ("RDMA/rxe: Cleanup page variables in rxe_mr.c") > > - don't allow ulp registering mr.page_size != PAGE_SIZE ? > > Thank you Zhijian for this root-cause analysis. > > Hardware RDMA adapters may not support the host page size (PAGE_SIZE) > so disallowing mr.page_size != PAGE_SIZE would be wrong. PAGE_SIZE must always be a valid way to construct a MR. We have code in all the DMA drivers to break PAGE_SIZE chunks to something smaller when building MRs. rxe is missing this now with the xarray stuff since we can't encode the struct page offset and a struct page in a single xarray entry. It would have to go back to encoding pfns and doing pfn to page. > If the rxe driver only supports mr.page_size == PAGE_SIZE, does this > mean that RXE_PAGE_SIZE_CAP should be changed from > 0xfffff000 into PAGE_SHIFT? Yes Jason