On 6/2/23 1:11 AM, Peter Xu wrote:
> On Thu, Jun 01, 2023 at 01:16:14PM +0500, Muhammad Usama Anjum wrote:
>> On 6/1/23 2:46 AM, Peter Xu wrote:
>>> Muhammad,
>>>
>>> Sorry, I probably can only review the non-interface part, and leave the
>>> interface/buffer handling, etc. review for others and real potential users
>>> of it..
>> Thank you so much for the review. I think we should mostly be okay with
>> the interface, as everybody has been making suggestions over the past
>> revisions.
>>
>>>
>>> On Thu, May 25, 2023 at 01:55:14PM +0500, Muhammad Usama Anjum wrote:
>>>> +static inline void make_uffd_wp_huge_pte(struct vm_area_struct *vma,
>>>> +					 unsigned long addr, pte_t *ptep,
>>>> +					 pte_t ptent)
>>>> +{
>>>> +	pte_t old_pte;
>>>> +
>>>> +	if (!huge_pte_none(ptent)) {
>>>> +		old_pte = huge_ptep_modify_prot_start(vma, addr, ptep);
>>>> +		ptent = huge_pte_mkuffd_wp(old_pte);
>>>> +		ptep_modify_prot_commit(vma, addr, ptep, old_pte, ptent);
>>>
>>> huge_ptep_modify_prot_start()?
>> Sorry, I didn't realize that huge_ptep_modify_prot_start() is different
>> from its pte version.
>
> Here I meant huge_ptep_modify_prot_commit()..

I'll update.

>
>>
>>>
>>> The other thing is what if it's a pte marker already? What if a hugetlb
>>> migration entry? Please check hugetlb_change_protection().
>> I've updated it in a better way.
>> Please let me know what you think
>> about the following:
>>
>> static inline void make_uffd_wp_huge_pte(struct vm_area_struct *vma,
>> 					 unsigned long addr, pte_t *ptep,
>> 					 pte_t ptent)
>> {
>> 	if (is_hugetlb_entry_hwpoisoned(ptent) || is_pte_marker(ptent))
>> 		return;
>>
>> 	if (is_hugetlb_entry_migration(ptent))
>> 		set_huge_pte_at(vma->vm_mm, addr, ptep,
>> 				pte_swp_mkuffd_wp(ptent));
>> 	else if (!huge_pte_none(ptent))
>> 		huge_ptep_modify_prot_commit(vma, addr, ptep, ptent,
>> 					     huge_pte_mkuffd_wp(ptent));
>> 	else
>> 		set_huge_pte_at(vma->vm_mm, addr, ptep,
>> 				make_pte_marker(PTE_MARKER_UFFD_WP));
>> }
>
> The is_pte_marker() check can be extended to double check the
> pte_marker_uffd_wp() bit, but it shouldn't matter a lot since besides the
> uffd-wp bit currently we only support swapin error, which should sigbus
> when accessed, so no point in tracking anyway.

Yeah, we are good with what we have: even if more bits are supported in pte
markers, this function is only reached when UNPOPULATED + async WP are
enabled, so no other bit would be set on the marker.

>
>>
>> As we always set UNPOPULATED, markers are always set on none ptes
>> initially. Is it possible that a none pte becomes present, then swapped,
>> and finally none again? So I'll do the following addition for
>> make_uffd_wp_pte():
>>
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -1800,6 +1800,9 @@ static inline void make_uffd_wp_pte(struct
>> vm_area_struct *vma,
>>  	} else if (is_swap_pte(ptent)) {
>>  		ptent = pte_swp_mkuffd_wp(ptent);
>>  		set_pte_at(vma->vm_mm, addr, pte, ptent);
>> +	} else {
>> +		set_pte_at(vma->vm_mm, addr, pte,
>> +			   make_pte_marker(PTE_MARKER_UFFD_WP));
>>  	}
>>  }
>
> Makes sense. You can leverage userfaultfd_wp_use_markers() here, and you
> should probably keep the protocol (only set the marker when WP_UNPOPULATED
> for anon).

This function is only reachable when UNPOPULATED + async WP are set, so we
don't need to use userfaultfd_wp_use_markers().
>
>>
>>
>>>
>>>> +	} else {
>>>> +		set_huge_pte_at(vma->vm_mm, addr, ptep,
>>>> +				make_pte_marker(PTE_MARKER_UFFD_WP));
>>>> +	}
>>>> +}
>>>> +#endif
>>>
>>> [...]
>>>
>>>> +static int pagemap_scan_pmd_entry(pmd_t *pmd, unsigned long start,
>>>> +				  unsigned long end, struct mm_walk *walk)
>>>> +{
>>>> +	struct pagemap_scan_private *p = walk->private;
>>>> +	struct vm_area_struct *vma = walk->vma;
>>>> +	unsigned long addr = end;
>>>> +	pte_t *pte, *orig_pte;
>>>> +	spinlock_t *ptl;
>>>> +	bool is_written;
>>>> +	int ret = 0;
>>>> +
>>>> +	arch_enter_lazy_mmu_mode();
>>>> +
>>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>> +	ptl = pmd_trans_huge_lock(pmd, vma);
>>>> +	if (ptl) {
>>>> +		unsigned long n_pages = (end - start)/PAGE_SIZE;
>>>> +
>>>> +		if (p->max_pages && n_pages > p->max_pages - p->found_pages)
>>>> +			n_pages = p->max_pages - p->found_pages;
>>>> +
>>>> +		is_written = !is_pmd_uffd_wp(*pmd);
>>>> +
>>>> +		/*
>>>> +		 * Break huge page into small pages if the WP operation
>>>> +		 * needs to be performed on a portion of the huge page.
>>>> +		 */
>>>> +		if (is_written && IS_PM_SCAN_WP(p->flags) &&
>>>> +		    n_pages < HPAGE_SIZE/PAGE_SIZE) {
>>>> +			spin_unlock(ptl);
>>>> +
>>>> +			split_huge_pmd(vma, pmd, start);
>>>> +			goto process_smaller_pages;
>>>> +		}
>>>> +
>>>> +		if (IS_PM_SCAN_GET(p->flags))
>>>> +			ret = pagemap_scan_output(is_written, vma->vm_file,
>>>> +						  pmd_present(*pmd),
>>>> +						  is_swap_pmd(*pmd),
>>>> +						  p, start, n_pages);
>>>> +
>>>> +		if (ret >= 0 && is_written && IS_PM_SCAN_WP(p->flags))
>>>> +			make_uffd_wp_pmd(vma, addr, pmd);
>>>> +
>>>> +		if (IS_PM_SCAN_WP(p->flags))
>>>> +			flush_tlb_range(vma, start, end);
>>>> +
>>>> +		spin_unlock(ptl);
>>>> +
>>>> +		arch_leave_lazy_mmu_mode();
>>>> +		return ret;
>>>> +	}
>>>> +
>>>> +process_smaller_pages:
>>>> +	if (pmd_trans_unstable(pmd)) {
>>>> +		arch_leave_lazy_mmu_mode();
>>>> +		return 0;
>>>
>>> I'm not sure whether this is right.. Shouldn't you return with -EAGAIN and
>>> let the user retry?
>>> Returning 0 means you'll move on with the next pmd
>>> afaict and ignoring this one.
>> This has come up before. We are just replicating pagemap_pmd_range() here,
>> as we are doing almost the same thing through the ioctl. It doesn't return
>> any error in this case and just skips the pmd, so we are doing the same.
>
> Hmm, is it a bug for pagemap? pagemapread.buffer should be linear to the
> address range to be scanned to me. If it skips some unstable pmd without
> filling in anything, it seems everything later will be shifted by
> PMD_SIZE.. I had a feeling that it should set walk->action == ACTION_AGAIN
> before return.

I don't think this is a bug, given this is how it was implemented in the
first place. In this task_mmu.c file, we can find several examples of the
same pattern where no error is returned when pmd_trans_unstable() succeeds.

-- 
BR,
Muhammad Usama Anjum