On Wed, Feb 10, 2021 at 04:03:19PM -0800, Mike Kravetz wrote: > hugetlb fault processing code would COW all write faults where the > pte was not writable. Soft dirty will write protect ptes as part > of it's tracking mechanism. The existing hugetlb_cow code will do > the right thing for PRIVATE mappings as it checks map_count. However, > for SHARED mappings it would actually allocate and install a COW page. > Modify the code to not call hugetlb_cow for SHARED mappings and just > update the pte. > > Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> > --- > mm/hugetlb.c | 23 ++++++++++++++++------- > 1 file changed, 16 insertions(+), 7 deletions(-) > > diff --git a/mm/hugetlb.c b/mm/hugetlb.c > index 47f3123afd1a..b561b6867ec1 100644 > --- a/mm/hugetlb.c > +++ b/mm/hugetlb.c > @@ -4584,8 +4584,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > * spinlock. For private mappings, we also lookup the pagecache > * page now as it is used to determine if a reservation has been > * consumed. > + * Only non-shared mappings are sent to hugetlb_cow. > */ > - if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) { > + if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry) && > + !(vma->vm_flags & VM_SHARED)) { > if (vma_needs_reservation(h, vma, haddr) < 0) { > ret = VM_FAULT_OOM; > goto out_mutex; > @@ -4593,9 +4595,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > /* Just decrements count, does not deallocate */ > vma_end_reservation(h, vma, haddr); > > - if (!(vma->vm_flags & VM_MAYSHARE)) > - pagecache_page = hugetlbfs_pagecache_page(h, > - vma, haddr); > + pagecache_page = hugetlbfs_pagecache_page(h, vma, haddr); Pure question: I see that the check actually changed from VM_MAYSHARE into VM_SHARE, then I noticed I'm actually unclear on the difference.. Say, when VM_MAYSHARE is set, could VM_SHARED be cleared in any case? Or say, is this change intended? I see that vma_set_page_prot() tried to remove VM_SHARED if soft dirty enabled (which should cause vma_wants_writenotify() to return true, iiuc), however that's temporary just to calculate vm_page_prot, and it's not applied to the vma->vm_flags. I failed to find a place where VM_SHARED of the vma is cleared while VM_MAYSHARE is set.. > } > > ptl = huge_pte_lock(h, mm, ptep); > @@ -4620,9 +4620,18 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, > > if (flags & FAULT_FLAG_WRITE) { > if (!huge_pte_write(entry)) { > - ret = hugetlb_cow(mm, vma, address, ptep, > - pagecache_page, ptl); > - goto out_put_page; > + if (!(vma->vm_flags & VM_SHARED)) { > + ret = hugetlb_cow(mm, vma, address, ptep, > + pagecache_page, ptl); > + goto out_put_page; > + } > + > + /* write protected for soft dirty processing */ > + if ((vma->vm_flags & VM_WRITE) && This VM_WRITE check seems to be redundant. As example, do_user_addr_fault() of x86 code will check this right after vma lookup by access_error(). So when reach here if "flags & FAULT_FLAG_WRITE", then VM_WRITE must be set, imho. > + (vma->vm_flags & VM_SHARED)) > + entry = huge_pte_mkwrite(entry); Same question to VM_SHARED, since "(vma->vm_flags & VM_SHARED)" is just checked above and we'll go hugetlb_cow() otherwise. > + > + entry = huge_pte_mkdirty(entry); There's another huge_pte_mkdirty() right below; likely we could merge them somehow? Thanks, > } > entry = huge_pte_mkdirty(entry); > } > -- > 2.29.2 > -- Peter Xu