On Wed, Jan 04, 2023 at 07:10:11PM +0000, James Houghton wrote: > > > I'll see if I can confirm that this is indeed possible and send a > > > repro if it is. > > > > I think your analysis above is correct. The key being the failure to unshare > > in the non-PUD_SIZE vma after the split. > > I do indeed hit the WARN_ON_ONCE (repro attached), and the MADV wasn't > even needed (the UFFDIO_REGISTER does the VMA split before "unsharing > all PMDs"). With the fix, we avoid the WARN_ON_ONCE, but the behavior > is still incorrect: I expect the address range to be write-protected, > but it isn't. > > The reason why is that hugetlb_change_protection uses huge_pte_offset, > even if it's being called for a UFFDIO_WRITEPROTECT with > UFFDIO_WRITEPROTECT_MODE_WP. In that particular case, I'm pretty sure > we should be using huge_pte_alloc, but even so, it's not trivial to > get an allocation failure back up to userspace. The non-hugetlb > implementation of UFFDIO_WRITEPROTECT seems to also have this problem. > > Peter, what do you think? Indeed. Thanks for spotting that, James. Non-hugetlb should be fine with having empty pgtable entries. Anon doesn't need to care about no-pgtable-populated ranges so far. Shmem does it with a few change_prepare() calls to populate the entries so the markers can be installed later on. However I think the fault handling is still not well handled as you pointed out even for shmem: that's the path I probably never triggered myself yet before and the code stayed there since a very early version: #define change_pmd_prepare(vma, pmd, cp_flags) \ do { \ if (unlikely(uffd_wp_protect_file(vma, cp_flags))) { \ if (WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd))) \ break; \ } \ } while (0) I think a better thing we can do here (instead of warning and stop the UFFDIO_WRITEPROTECT at the current stage) is returning with -ENOMEM properly so the user can know the error. We'll need to touch the stacks up to uffd_wp_range() as it's the only one that can trigger the -ENOMEM so far, so as to not ignore retval from change_protection(). Meanwhile, I'd also wonder whether we should call pagefault_out_of_memory() because it should be the same as when pgtable allocation failure happens in page faults, we may want to OOM already. I can take care of hugetlb part too along the way. Man page of UFFDIO_WRITEPROTECT may need a fixup too to introduce -ENOMEM. I can quickly prepare some patches for this, and hopefully it doesn't need to block the current fix on split. Any thoughts? > > > > > To me, the fact it was somewhat difficult to come up with this scenario is an > > argument what we should just unshare at split time as you propose. Who > > knows what other issues may exist. > > > > > 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes") is the > > > commit that introduced the WARN_ON_ONCE; perhaps it's a good choice > > > for a Fixes: tag (if above is indeed true). > > > > If the key issue in your above scenario is indeed the failure of > > hugetlb_unshare_all_pmds in the non-PUD_SIZE vma, then perhaps we tag? > > > > 6dfeaff93be1 ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when > > register wp") > > SGTM. Thanks Mike. Looks good here too. Thanks, -- Peter Xu