On Fri, Sep 30, 2022 at 04:19:30PM +0200, David Hildenbrand wrote: > FOLL_MIGRATION exists only for the purpose of break_ksm(), and > actually, there is not even the need to wait for the migration to > finish, we only want to know if we're dealing with a KSM page. > > Using follow_page() just to identify a KSM page overcomplicates GUP > code. Let's use walk_page_range_vma() instead, because we don't actually > care about the page itself, we only need to know a single property -- > no need to even grab a reference on the page. > > In my setup (AMD Ryzen 9 3900X), running the KSM selftest to test unmerge > performance on 2 GiB (taskset 0x8 ./ksm_tests -D -s 2048), this results in > a performance degradation of ~4% (old: ~5010 MiB/s, new: ~4800 MiB/s). > I don't think we particularly care for now. > > Signed-off-by: David Hildenbrand <david@xxxxxxxxxx> [...] > +int break_ksm_pud_entry(pud_t *pud, unsigned long addr, unsigned long next, > + struct mm_walk *walk) > +{ > + /* We only care about page tables to walk to a single base page. */ > + if (pud_leaf(*pud) || !pud_present(*pud)) > + return 1; > + return 0; > +} Is this needed? I thought the pgtable walker handlers this already. [...] > static int break_ksm(struct vm_area_struct *vma, unsigned long addr) > { > - struct page *page; > vm_fault_t ret = 0; > > + if (WARN_ON_ONCE(!IS_ALIGNED(addr, PAGE_SIZE))) > + return -EINVAL; > + > do { > bool ksm_page = false; > > cond_resched(); > - page = follow_page(vma, addr, > - FOLL_GET | FOLL_MIGRATION | FOLL_REMOTE); > - if (IS_ERR_OR_NULL(page)) > - break; > - if (PageKsm(page)) > - ksm_page = true; > - put_page(page); > + ret = walk_page_range_vma(vma, addr, addr + PAGE_SIZE, > + &break_ksm_ops, &ksm_page); > + if (WARN_ON_ONCE(ret < 0)) > + return ret; I'm not sure this would be worth it, especially with a 4% degrade. The next patch will be able to bring 50- LOC, but this patch does 60+ anyway, based on another new helper just introduced... I just don't see whether there's strong enough reason to do so to drop FOLL_MIGRATE. It's different to the previous VM_FAULT_WRITE refactor because of the unshare approach was much of a good reasoning to me. Perhaps I missed something? -- Peter Xu