On 2/17/21 11:32 AM, Peter Xu wrote: > On Wed, Feb 10, 2021 at 04:03:19PM -0800, Mike Kravetz wrote: >> hugetlb fault processing code would COW all write faults where the >> pte was not writable. Soft dirty will write protect ptes as part >> of it's tracking mechanism. The existing hugetlb_cow code will do >> the right thing for PRIVATE mappings as it checks map_count. However, >> for SHARED mappings it would actually allocate and install a COW page. >> Modify the code to not call hugetlb_cow for SHARED mappings and just >> update the pte. >> >> Signed-off-by: Mike Kravetz <mike.kravetz@xxxxxxxxxx> >> --- >> mm/hugetlb.c | 23 ++++++++++++++++------- >> 1 file changed, 16 insertions(+), 7 deletions(-) >> >> diff --git a/mm/hugetlb.c b/mm/hugetlb.c >> index 47f3123afd1a..b561b6867ec1 100644 >> --- a/mm/hugetlb.c >> +++ b/mm/hugetlb.c >> @@ -4584,8 +4584,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, >> * spinlock. For private mappings, we also lookup the pagecache >> * page now as it is used to determine if a reservation has been >> * consumed. >> + * Only non-shared mappings are sent to hugetlb_cow. >> */ >> - if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) { >> + if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry) && >> + !(vma->vm_flags & VM_SHARED)) { >> if (vma_needs_reservation(h, vma, haddr) < 0) { >> ret = VM_FAULT_OOM; >> goto out_mutex; >> @@ -4593,9 +4595,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, >> /* Just decrements count, does not deallocate */ >> vma_end_reservation(h, vma, haddr); >> >> - if (!(vma->vm_flags & VM_MAYSHARE)) >> - pagecache_page = hugetlbfs_pagecache_page(h, >> - vma, haddr); >> + pagecache_page = hugetlbfs_pagecache_page(h, vma, haddr); > > Pure question: I see that the check actually changed from VM_MAYSHARE into > VM_SHARE, then I noticed I'm actually unclear on the difference.. Say, when > VM_MAYSHARE is set, could VM_SHARED be cleared in any case? Or say, is this > change intended? The change was not intended. I will use VM_MAYSHARE. > > I see that vma_set_page_prot() tried to remove VM_SHARED if soft dirty enabled > (which should cause vma_wants_writenotify() to return true, iiuc), however > that's temporary just to calculate vm_page_prot, and it's not applied to the > vma->vm_flags. I failed to find a place where VM_SHARED of the vma is cleared > while VM_MAYSHARE is set.. I am not 100% sure about differences. Here is a snippet from do_mmap() where you can have VM_MAYSHARE and not VM_SHARED vm_flags |= VM_SHARED | VM_MAYSHARE; if (!(file->f_mode & FMODE_WRITE)) vm_flags &= ~(VM_MAYWRITE | VM_SHARED); fallthrough; > >> } >> >> ptl = huge_pte_lock(h, mm, ptep); >> @@ -4620,9 +4620,18 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, >> >> if (flags & FAULT_FLAG_WRITE) { >> if (!huge_pte_write(entry)) { >> - ret = hugetlb_cow(mm, vma, address, ptep, >> - pagecache_page, ptl); >> - goto out_put_page; >> + if (!(vma->vm_flags & VM_SHARED)) { >> + ret = hugetlb_cow(mm, vma, address, ptep, >> + pagecache_page, ptl); >> + goto out_put_page; >> + } >> + >> + /* write protected for soft dirty processing */ >> + if ((vma->vm_flags & VM_WRITE) && > > This VM_WRITE check seems to be redundant. As example, do_user_addr_fault() of > x86 code will check this right after vma lookup by access_error(). So when > reach here if "flags & FAULT_FLAG_WRITE", then VM_WRITE must be set, imho. Thanks, that sounds reasonable. I will check to make sure and drop the redundant check. > >> + (vma->vm_flags & VM_SHARED)) >> + entry = huge_pte_mkwrite(entry); > > Same question to VM_SHARED, since "(vma->vm_flags & VM_SHARED)" is just checked > above and we'll go hugetlb_cow() otherwise. Yes, certainly redundant here. > >> + >> + entry = huge_pte_mkdirty(entry); > > There's another huge_pte_mkdirty() right below; likely we could merge them somehow? > Yes, Thanks for taking a look! -- Mike Kravetz > Thanks, > >> } >> entry = huge_pte_mkdirty(entry); >> } >> -- >> 2.29.2 >> >