On 11/17/21 10:43, Christoph Hellwig wrote:
> On Fri, Nov 12, 2021 at 04:08:24PM +0100, Joao Martins wrote:
>> Use the newly added compound devmap facility which maps the assigned dax
>> ranges as compound pages at a page size of @align.
>>
>> dax devices are created with a fixed @align (huge page size) which is
>> enforced at mmap() of the device as well. Faults consequently also
>> happen at the @align specified at creation time, and that doesn't
>> change throughout the dax device's lifetime. MCEs unmap a whole dax
>> huge page, and splits likewise occur at the configured page size.
>>
>> Performance measured by gup_test improves considerably for
>> unpin_user_pages() and altmap with NVDIMMs:
>>
>> $ gup_test -f /dev/dax1.0 -m 16384 -r 10 -S -a -n 512 -w
>> (pin_user_pages_fast 2M pages) put:~71 ms -> put:~22 ms
>> [altmap]
>> (pin_user_pages_fast 2M pages) get:~524ms put:~525 ms -> get: ~127ms put:~71ms
>>
>> $ gup_test -f /dev/dax1.0 -m 129022 -r 10 -S -a -n 512 -w
>> (pin_user_pages_fast 2M pages) put:~513 ms -> put:~188 ms
>> [altmap with -m 127004]
>> (pin_user_pages_fast 2M pages) get:~4.1 secs put:~4.12 secs -> get:~1sec put:~563ms
>>
>> ... as well as making unpin_user_page_range_dirty_lock() just as
>> effective as it is for THP/hugetlb[0] pages.
>>
>> [0] https://lore.kernel.org/linux-mm/20210212130843.13865-5-joao.m.martins@xxxxxxxxxx/
>>
>> Signed-off-by: Joao Martins <joao.m.martins@xxxxxxxxxx>
>> Reviewed-by: Dan Williams <dan.j.williams@xxxxxxxxx>
>> ---
>>  drivers/dax/device.c | 57 ++++++++++++++++++++++++++++++++++----------
>>  1 file changed, 44 insertions(+), 13 deletions(-)
>>
>> diff --git a/drivers/dax/device.c b/drivers/dax/device.c
>> index a65c67ab5ee0..0c2ac97d397d 100644
>> --- a/drivers/dax/device.c
>> +++ b/drivers/dax/device.c
>> @@ -192,6 +192,42 @@ static vm_fault_t __dev_dax_pud_fault(struct dev_dax *dev_dax,
>>  }
>>  #endif /* !CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD */
>>
>> +static void set_page_mapping(struct vm_fault *vmf, pfn_t pfn,
>> +			     unsigned long fault_size,
>> +			     struct address_space *f_mapping)
>> +{
>> +	unsigned long i;
>> +	pgoff_t pgoff;
>> +
>> +	pgoff = linear_page_index(vmf->vma, ALIGN(vmf->address, fault_size));
>> +
>> +	for (i = 0; i < fault_size / PAGE_SIZE; i++) {
>> +		struct page *page;
>> +
>> +		page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
>> +		if (page->mapping)
>> +			continue;
>> +		page->mapping = f_mapping;
>> +		page->index = pgoff + i;
>> +	}
>> +}
>
> No need to pass f_mapping here, it must be vmf->vma->vm_file->f_mapping.
>

Hmmm, good point. If I keep this structure, then yeah, I will nuke
@f_mapping. If I instead move the @mapping setting to before
vmf_insert_pfn*() (as Jason suggests), then the @f_mapping argument might
still be useful for clearing it when @rc != VM_FAULT_NOPAGE.
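Something along these lines, maybe (totally untested sketch; the
unset_page_mapping() counterpart for the error path is a made-up name,
not something in the tree):

static void set_page_mapping(struct vm_fault *vmf, pfn_t pfn,
			     unsigned long fault_size)
{
	/* Per Christoph: the mapping is always the file's f_mapping */
	struct address_space *f_mapping = vmf->vma->vm_file->f_mapping;
	unsigned long i;
	pgoff_t pgoff;

	pgoff = linear_page_index(vmf->vma, ALIGN(vmf->address, fault_size));

	for (i = 0; i < fault_size / PAGE_SIZE; i++) {
		struct page *page;

		page = pfn_to_page(pfn_t_to_pfn(pfn) + i);
		if (page->mapping)
			continue;
		page->mapping = f_mapping;
		page->index = pgoff + i;
	}
}

/*
 * Hypothetical error-path helper for when @mapping is set up before
 * vmf_insert_pfn*() and @rc comes back != VM_FAULT_NOPAGE. As written
 * it would also clear entries that a previous fault had established,
 * so a real version would need to track which entries this fault set.
 */
static void unset_page_mapping(struct vm_fault *vmf, pfn_t pfn,
			       unsigned long fault_size)
{
	unsigned long i;

	for (i = 0; i < fault_size / PAGE_SIZE; i++)
		pfn_to_page(pfn_t_to_pfn(pfn) + i)->mapping = NULL;
}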
>> +static void set_compound_mapping(struct vm_fault *vmf, pfn_t pfn,
>> +				 unsigned long fault_size,
>> +				 struct address_space *f_mapping)
>> +{
>> +	struct page *head;
>> +
>> +	head = pfn_to_page(pfn_t_to_pfn(pfn));
>> +	head = compound_head(head);
>> +	if (head->mapping)
>> +		return;
>> +
>> +	head->mapping = f_mapping;
>> +	head->index = linear_page_index(vmf->vma,
>> +			ALIGN(vmf->address, fault_size));
>> +}
>
> Same here.
>

/me nods

>>  	if (rc == VM_FAULT_NOPAGE) {
>> -		unsigned long i;
>> -		pgoff_t pgoff;
>> +		struct dev_pagemap *pgmap = dev_dax->pgmap;
>
> If you're touching this anyway: why not do an early return here for
> the error case to simplify the flow?
>

Yeah, I was thinking of doing that, i.e. calling set_compound_mapping()
earlier, or even inside set_page_mapping(), albeit that would need a new
argument (the dev_dax).
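Roughly like this, perhaps (untested sketch; assumes the compound case
can be detected via pgmap->vmemmap_shift, and that passing @dev_dax lets
the two helpers collapse into one):

static void set_mapping(struct dev_dax *dev_dax, struct vm_fault *vmf,
			pfn_t pfn, unsigned long fault_size)
{
	struct address_space *f_mapping = vmf->vma->vm_file->f_mapping;
	unsigned long i, nr_pages = fault_size / PAGE_SIZE;
	pgoff_t pgoff;

	/* For a compound devmap only the head page gets mapping/index */
	if (dev_dax->pgmap->vmemmap_shift)
		nr_pages = 1;

	pgoff = linear_page_index(vmf->vma, ALIGN(vmf->address, fault_size));

	for (i = 0; i < nr_pages; i++) {
		struct page *page = pfn_to_page(pfn_t_to_pfn(pfn) + i);

		page = compound_head(page);
		if (page->mapping)
			continue;
		page->mapping = f_mapping;
		page->index = pgoff + i;
	}
}

With the early return Christoph suggests, the tail of
dev_dax_huge_fault() would then read something like:

	if (rc != VM_FAULT_NOPAGE)
		goto out;

	set_mapping(dev_dax, vmf, pfn, fault_size);
out:
	dax_read_unlock(id);
	return rc;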