On 14.04.22 19:15, Vlastimil Babka wrote:
> On 3/29/22 18:04, David Hildenbrand wrote:
>> Whenever GUP currently ends up taking a R/O pin on an anonymous page that
>> might be shared -- mapped R/O and !PageAnonExclusive() -- any write fault
>> on the page table entry will end up replacing the mapped anonymous page
>> due to COW, resulting in the GUP pin no longer being consistent with the
>> page actually mapped into the page table.
>>
>> The possible ways to deal with this situation are:
>>  (1) Ignore and pin -- what we do right now.
>>  (2) Fail to pin -- which would be rather surprising to callers and
>>      could break user space.
>>  (3) Trigger unsharing and pin the now exclusive page -- reliable R/O
>>      pins.
>>
>> We want to implement 3) because it provides the clearest semantics and
>> allows for checking in unpin_user_pages() and friends for possible BUGs:
>> when trying to unpin a page that's no longer exclusive, clearly
>> something went very wrong and might result in memory corruptions that
>> might be hard to debug. So we better have a nice way to spot such
>> issues.
>>
>> To implement 3), we need a way for GUP to trigger unsharing:
>> FAULT_FLAG_UNSHARE. FAULT_FLAG_UNSHARE is only applicable to R/O mapped
>> anonymous pages and resembles COW logic during a write fault. However, in
>> contrast to a write fault, GUP-triggered unsharing will, for example, still
>> maintain the write protection.
>>
>> Let's implement FAULT_FLAG_UNSHARE by hooking into the existing write fault
>> handlers for all applicable anonymous page types: ordinary pages, THP and
>> hugetlb.
>>
>> * If FAULT_FLAG_UNSHARE finds a R/O-mapped anonymous page that has been
>>   marked exclusive in the meantime by someone else, there is nothing to do.
>> * If FAULT_FLAG_UNSHARE finds a R/O-mapped anonymous page that's not
>>   marked exclusive, it will try detecting if the process is the exclusive
>>   owner. If exclusive, it can be set exclusive similar to reuse logic
>>   during write faults via page_move_anon_rmap() and there is nothing
>>   else to do; otherwise, we either have to copy and map a fresh,
>>   anonymous exclusive page R/O (ordinary pages, hugetlb), or split the
>>   THP.
>>
>> This commit is heavily based on patches by Andrea.
>>
>> Co-developed-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
>> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
>> Signed-off-by: David Hildenbrand <david@xxxxxxxxxx>
> 
> Acked-by: Vlastimil Babka <vbabka@xxxxxxx>
> 
> Modulo a nit and suspected logical bug below.

Thanks!

>> @@ -4515,8 +4550,11 @@ static inline vm_fault_t create_huge_pmd(struct vm_fault *vmf)
>>  /* `inline' is required to avoid gcc 4.1.2 build error */
>>  static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
>>  {
>> +	const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE;
>> +
>>  	if (vma_is_anonymous(vmf->vma)) {
>> -		if (userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
>> +		if (unlikely(unshare) &&
> 
> Is this condition flipped, should it be "likely(!unshare)"? As the similar
> code in do_wp_page() does.

Good catch, this should affect uffd-wp on THP -- it wouldn't trigger as
expected. Thanks a lot for finding that!
> 
>> +		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
>>  			return handle_userfault(vmf, VM_UFFD_WP);
>>  		return do_huge_pmd_wp_page(vmf);
>>  	}
>> @@ -4651,10 +4689,11 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
>>  		update_mmu_tlb(vmf->vma, vmf->address, vmf->pte);
>>  		goto unlock;
>>  	}
>> -	if (vmf->flags & FAULT_FLAG_WRITE) {
>> +	if (vmf->flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
>>  		if (!pte_write(entry))
>>  			return do_wp_page(vmf);
>> -		entry = pte_mkdirty(entry);
>> +		else if (likely(vmf->flags & FAULT_FLAG_WRITE))
>> +			entry = pte_mkdirty(entry);
>>  	}
>>  	entry = pte_mkyoung(entry);
>>  	if (ptep_set_access_flags(vmf->vma, vmf->address, vmf->pte, entry,
> 

So the following on top, right?

diff --git a/mm/memory.c b/mm/memory.c
index 8b3cb73f5e44..4584c7e87a70 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3137,7 +3137,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf)
 			free_swap_cache(old_page);
 		put_page(old_page);
 	}
-	return page_copied && !unshare ? VM_FAULT_WRITE : 0;
+	return (page_copied && !unshare) ? VM_FAULT_WRITE : 0;
 oom_free_new:
 	put_page(new_page);
 oom:
@@ -4604,7 +4604,7 @@ static inline vm_fault_t wp_huge_pmd(struct vm_fault *vmf)
 	const bool unshare = vmf->flags & FAULT_FLAG_UNSHARE;
 
 	if (vma_is_anonymous(vmf->vma)) {
-		if (unlikely(unshare) &&
+		if (likely(!unshare) &&
 		    userfaultfd_huge_pmd_wp(vmf->vma, vmf->orig_pmd))
 			return handle_userfault(vmf, VM_UFFD_WP);
 		return do_huge_pmd_wp_page(vmf);

-- 
Thanks,

David / dhildenb
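
As an aside for readers following the thread: below is a minimal sketch of
the unshare decision the quoted commit message describes, for the
ordinary-page case. It is not the actual kernel code --
unshare_anon_page_sketch() is a made-up name, the page_count() test stands
in for the real, more careful exclusive-owner check, and it assumes the
mm/memory.c context where wp_page_copy() and friends are visible.

/*
 * Hypothetical helper, loosely following the commit message above:
 * handle FAULT_FLAG_UNSHARE for an ordinary R/O-mapped anonymous page.
 */
static vm_fault_t unshare_anon_page_sketch(struct vm_fault *vmf,
					   struct page *page)
{
	/* Someone already marked the page exclusive: nothing to do. */
	if (PageAnonExclusive(page))
		return 0;

	/*
	 * Sole owner: mark the page exclusive via the same rmap helper
	 * the write-fault reuse logic uses, and keep the write protection.
	 * (The real ownership check is more involved than a raw refcount
	 * test.)
	 */
	if (page_count(page) == 1) {
		page_move_anon_rmap(page, vmf->vma);
		return 0;
	}

	/* Actually shared: copy into a fresh exclusive page, mapped R/O. */
	return wp_page_copy(vmf);
}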