On Wed, Sep 29, 2021 at 12:50:15PM +0100, Joao Martins wrote:
> On 9/28/21 19:01, Jason Gunthorpe wrote:
> > On Thu, Sep 23, 2021 at 05:51:04PM +0100, Joao Martins wrote:
> >> So ... if pgmap accounting was removed from gup-fast then this patch
> >> would be a lot simpler and we could perhaps just fall back to the regular
> >> hugepage case (THP, HugeTLB) like your suggestion at the top. See below
> >> the scissors mark at the end for the ballpark of changes.
> >>
> >> So far my options seem to be: 1) this patch, which leverages the existing
> >> iteration logic, 2) switching to for_each_compound_range() -- see my
> >> previous reply, or 3) waiting for Dan to remove @pgmap accounting in
> >> gup-fast and using something similar to what is below the scissors mark.
> >>
> >> What do you think would be the best course of action?
> >
> > I still think the basic algorithm should be to accumulate physically
> > contiguous addresses when walking the page table and then flush them
> > back to struct pages once we can't accumulate any more.
> >
> > That works for both the walkers and all the page types?
> >
> The logic already handles all page types -- I was trying to avoid the
> extra complexity in the regular hugetlb/THP path by not merging in the
> handling of the oddball case that is devmap (or, fundamentally, the
> devmap non-compound case in the future).

FYI, this untested thing is what I came to when I tried to make
something like this:

/*
 * A large page entry such as PUD/PMD can point to a struct page. In cases like
 * THP this struct page will be a compound page of the same order as the page
 * table level. However, in cases like DAX or more generally pgmap ZONE_DEVICE,
 * the PUD/PMD may point at the first pfn in a string of pages.
 *
 * This helper iterates over all head pages or all the non-compound base pages.
 */
struct pt_entry_iter_state {
	struct page *head;
	unsigned long compound_nr;
	unsigned long pfn;
	unsigned long end_pfn;
};

static inline struct page *__pt_start_iter(struct pt_entry_iter_state *state,
					   struct page *page, unsigned long pfn,
					   unsigned int entry_size)
{
	state->head = compound_head(page);
	state->compound_nr = compound_nr(page);
	state->pfn = pfn & ~(state->compound_nr - 1);
	state->end_pfn = pfn + entry_size / PAGE_SIZE;
	return state->head;
}

static inline struct page *__pt_next_page(struct pt_entry_iter_state *state)
{
	state->pfn += state->compound_nr;
	if (state->end_pfn <= state->pfn)
		return NULL;
	state->head = pfn_to_page(state->pfn);
	state->compound_nr = compound_nr(state->head);
	return state->head;
}

#define for_each_page_in_pt_entry(state, page, pfn, entry_size) \
	for (page = __pt_start_iter(state, page, pfn, entry_size); page; \
	     page = __pt_next_page(state))

static void remove_pages_from_page_table(struct vm_area_struct *vma,
					 struct page *page, unsigned long pfn,
					 unsigned int entry_size,
					 bool is_dirty, bool is_young)
{
	struct pt_entry_iter_state state;

	for_each_page_in_pt_entry(&state, page, pfn, entry_size)
		remove_page_from_page_table(vma, page, is_dirty, is_young);
}

Jason
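
For comparison, here is a minimal sketch of the "accumulate physically
contiguous pfns, then flush" approach quoted above; it is not from this
thread, and the names pfn_range_acc, acc_add(), acc_flush() and the
flush_one_page() callback are hypothetical placeholders for whatever the
real walker would do per page:

/*
 * Hypothetical sketch: grow a pending physically contiguous pfn range
 * while walking page table entries, and only convert it back to struct
 * pages once the next entry is no longer adjacent.
 */
struct pfn_range_acc {
	unsigned long start_pfn;
	unsigned long nr;
};

static void acc_flush(struct pfn_range_acc *acc,
		      void (*flush_one_page)(struct page *))
{
	unsigned long pfn = acc->start_pfn;
	unsigned long end = acc->start_pfn + acc->nr;

	/* Step by compound_nr() so head pages and base pages both work. */
	while (pfn < end) {
		struct page *page = pfn_to_page(pfn);

		flush_one_page(page);
		pfn += compound_nr(page);
	}
	acc->nr = 0;
}

static void acc_add(struct pfn_range_acc *acc, unsigned long pfn,
		    unsigned long nr_pages,
		    void (*flush_one_page)(struct page *))
{
	/* Extend the pending range while the new pfns are adjacent to it. */
	if (acc->nr && acc->start_pfn + acc->nr == pfn) {
		acc->nr += nr_pages;
		return;
	}
	/* Otherwise flush what was accumulated and start a new range. */
	if (acc->nr)
		acc_flush(acc, flush_one_page);
	acc->start_pfn = pfn;
	acc->nr = nr_pages;
}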