On 9/28/21 19:01, Jason Gunthorpe wrote: > On Thu, Sep 23, 2021 at 05:51:04PM +0100, Joao Martins wrote: >> So ... if pgmap accounting was removed from gup-fast then this patch >> would be a lot simpler and we could perhaps just fallback to the regular >> hugepage case (THP, HugeTLB) like your suggestion at the top. See at the >> end below scissors mark as the ballpark of changes. >> >> So far my options seem to be: 1) this patch which leverages the existing >> iteration logic or 2) switching to for_each_compound_range() -- see my previous >> reply 3) waiting for Dan to remove @pgmap accounting in gup-fast and use >> something similar to below scissors mark. >> >> What do you think would be the best course of action? > > I still think the basic algorithm should be to accumulate physicaly > contiguous addresses when walking the page table and then flush them > back to struct pages once we can't accumulate any more. > > That works for both the walkers and all the page types? > The logic already handles all page types -- I was trying to avoid the extra complexity in regular hugetlb/THP path by not merging the handling of the oddball case that is devmap (or fundamentally devmap non-compound case in the future). In the context of this patch I am think your suggestion that you wrote above to ... instead of changing __gup_device_huge() we uplevel/merge it all in gup_huge_{pud,pmd}() to cover the devmap? static int __gup_huge_range(orig_head, ...) { ... page = orig_head + ((addr & ~mask) >> PAGE_SHIFT); refs = record_subpages(page, addr, end, pages + *nr); head = try_grab_compound_head(orig_head, refs, flags); if (!head) return 0; if (unlikely(pud_val(orig) != pud_val(*pudp))) { put_compound_head(head, refs, flags); return 0; } SetPageReferenced(head); return 1; } static int gup_huge_pmd(...) { ... for_each_compound_range(index, page, npages, head, refs) { if (pud_devmap(orig)) pgmap = get_dev_pagemap(pmd_pfn(orig), pgmap); if (!__gup_huge_page_range(pmd_page(orig), refs)) { undo_dev_pagemap(...); return 0; } } put_dev_pagemap(pgmap); } But all this gup_huge_{pmd,pud}() rework is all just for the trouble of trying to merge the basepage-on-PMD/PUD case of devmap. It feels more complex (and affecting other page types) compared to leave the devmap odity siloed like option 1. If the pgmap refcount wasn't there and there was no users of basepages-on-PMD/PUD but just compound pages on PMDs/PUDs ... then we would be talking about code removal rather than added complexity. But I don't know how realistic that is for other devmap users (beside device-dax). > If the get_dev_pagemap has to remain then it just means we have to > flush before changing pagemap pointers Right -- I don't think we should need it as that discussion on the other thread goes. OTOH, using @pgmap might be useful to unblock gup-fast FOLL_LONGTERM for certain devmap types[0] (like MEMORY_DEVICE_GENERIC [device-dax] can support it but not MEMORY_DEVICE_FSDAX [fsdax]). [0] https://lore.kernel.org/linux-mm/6a18179e-65f7-367d-89a9-d5162f10fef0@xxxxxxxxxx/