On Mon, Nov 15, 2021 at 01:11:32PM +0100, Joao Martins wrote: > On 11/12/21 16:34, Jason Gunthorpe wrote: > > On Fri, Nov 12, 2021 at 04:08:24PM +0100, Joao Martins wrote: > > > >> diff --git a/drivers/dax/device.c b/drivers/dax/device.c > >> index a65c67ab5ee0..0c2ac97d397d 100644 > >> +++ b/drivers/dax/device.c > >> @@ -192,6 +192,42 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax, > >> } > >> #endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */ > >> > >> +static void set_page_mapping(struct vm_fault *vmf, pfn_t pfn, > >> + unsigned long fault_size, > >> + struct address_space *f_mapping) > >> +{ > >> + unsigned long i; > >> + pgoff_t pgoff; > >> + > >> + pgoff = linear_page_index(vmf->vma, ALIGN(vmf->address, fault_size)); > >> + > >> + for (i = 0; i < fault_size / PAGE_SIZE; i++) { > >> + struct page *page; > >> + > >> + page = pfn_to_page(pfn_t_to_pfn(pfn) + i); > >> + if (page->mapping) > >> + continue; > >> + page->mapping = f_mapping; > >> + page->index = pgoff + i; > >> + } > >> +} > >> + > >> +static void set_compound_mapping(struct vm_fault *vmf, pfn_t pfn, > >> + unsigned long fault_size, > >> + struct address_space *f_mapping) > >> +{ > >> + struct page *head; > >> + > >> + head = pfn_to_page(pfn_t_to_pfn(pfn)); > >> + head = compound_head(head); > >> + if (head->mapping) > >> + return; > >> + > >> + head->mapping = f_mapping; > >> + head->index = linear_page_index(vmf->vma, > >> + ALIGN(vmf->address, fault_size)); > >> +} > > > > Should this stuff be setup before doing vmf_insert_pfn_XX? > > > > Interestingly filesystem-dax does this, but not device-dax. I think it may be a bug ? > set_page_mapping/set_compound_mapping() could be moved to before and > then torn down on @rc != VM_FAULT_NOPAGE (failure). I am not sure > what's the benefit in this series.. besides the ordering (that you > hinted below) ? Well, it should probably be fixed in a precursor patch. I think the general idea is that page->mapping/index are stable once the page is published in a PTE? > > In normal cases the page should be returned in the vmf and populated > > to the page tables by the core code after all this is done. > > So I suppose by call sites examples as 'core code' is either hugetlbfs call to > __filemap_add_folio() (on hugetlbfs fault handler), shmem_add_to_page_cache() or > anon-equivalent. I was talking more about the normal page insertion flow which is done by setting vmf->page and then returning. finish_fault() will install the PTE If this is the best way then I would expect some future where maybe there is a vmf->folio and finish_fault() will install a PUD/PMD/PTE and we don't call vmf_insert_pfnxx in DAX. Jason