Hello Linus, On Sat, Jan 09, 2021 at 05:19:51PM -0800, Linus Torvalds wrote: > +#define is_cow_mapping(flags) (((flags) & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE) > + > +static inline bool pte_is_pinned(struct vm_area_struct *vma, unsigned long addr, pte_t pte) > +{ > + struct page *page; > + > + if (!is_cow_mapping(vma->vm_flags)) > + return false; > + if (likely(!atomic_read(&vma->vm_mm->has_pinned))) > + return false; > + page = vm_normal_page(vma, addr, pte); > + if (!page) > + return false; > + if (page_mapcount(page) != 1) > + return false; > + return page_maybe_dma_pinned(page); > +} I just don't see the simplification coming from 09854ba94c6aad7886996bfbee2530b3d8a7f4f4. Instead of checking page_mapcount above as an optimization, to me it looks much simpler to check it in a single place, in do_wp_page, that in addition of optimizing away the superfluous copy, would optimize away the above complexity as well. And I won't comment if it's actually safe to skip random pages or not. All I know is for mprotect and uffd-wp, definitely the above approach wouldn't work. In addition I dislike has_pinned and FOLL_PIN. has_pinned 450 include/linux/mm_types.h * for instance during page table copying for fork(). has_pinned 1021 kernel/fork.c atomic64_set(&mm->pinned_vm, 0); has_pinned 1239 mm/gup.c atomic_set(&mm->has_pinned, 1); has_pinned 2579 mm/gup.c atomic_set(¤t->mm->has_pinned, 1); has_pinned 1099 mm/huge_memory.c atomic_read(&src_mm->has_pinned) && has_pinned 1213 mm/huge_memory.c atomic_read(&src_mm->has_pinned) && has_pinned 819 mm/memory.c if (likely(!atomic_read(&src_mm->has_pinned))) Are we spending 32bit in mm_struct atomic_t just to call atomic_set(1) on it? Why isn't it a MMF_HAS_PINNED that already can be set atomically under mmap_read_lock too? There's bit left free there, we didn't run out yet to justify wasting another 31 bits. I hope I'm overlooking something. The existence of FOLL_LONGTERM is good and makes a difference at times for writeback if it's on a MAP_SHARED, or it makes difference during GUP to do a page_migrate before taking the pin, but for the whole rest of the VM it's irrelevant if it's long or short term, so I'm also concerned from what Jason mentioned about long term pins being treated differently within the VM. That to me looks fundamentally as flawed as introducing inaccuracies in do_wp_page, that even ignoring the performance implications caused by the inaccuracy, simply prevent to do some useful things. I obviously agree all common workloads with no GUP pins and no clear_refs and no uffd, are way more important to be optimal, but I haven't seen a single benchmark showing the benefit of not taking the PG_lock before a page copy or any other runtime benefit coming from page_count in do_wp_page. To the contrary now I see additional branches in fork and in various other paths. The only thing again where I see page_count provides a tangible benefit, is the vmsplice attack from child to parent, but that has not been fully fixed and if page_count is added to fix it in all COW faults, it'll introduce extra inefficiency to the the very common important workloads, not only to the special GUP/clear_refs/uffd-wp workloads as your patch above shows. Thanks, Andrea