Re: [PATCH] hugetlb: unshare some PMDs when splitting VMAs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jan 4, 2023 at 8:03 PM Peter Xu <peterx@xxxxxxxxxx> wrote:
>
> On Wed, Jan 04, 2023 at 07:10:11PM +0000, James Houghton wrote:
> > > > I'll see if I can confirm that this is indeed possible and send a
> > > > repro if it is.
> > >
> > > I think your analysis above is correct.  The key being the failure to unshare
> > > in the non-PUD_SIZE vma after the split.
> >
> > I do indeed hit the WARN_ON_ONCE (repro attached), and the MADV wasn't
> > even needed (the UFFDIO_REGISTER does the VMA split before "unsharing
> > all PMDs"). With the fix, we avoid the WARN_ON_ONCE, but the behavior
> > is still incorrect: I expect the address range to be write-protected,
> > but it isn't.
> >
> > The reason why is that hugetlb_change_protection uses huge_pte_offset,
> > even if it's being called for a UFFDIO_WRITEPROTECT with
> > UFFDIO_WRITEPROTECT_MODE_WP. In that particular case, I'm pretty sure
> > we should be using huge_pte_alloc, but even so, it's not trivial to
> > get an allocation failure back up to userspace. The non-hugetlb
> > implementation of UFFDIO_WRITEPROTECT seems to also have this problem.
> >
> > Peter, what do you think?
>
> Indeed.  Thanks for spotting that, James.
>
> Non-hugetlb should be fine with having empty pgtable entries. Anon doesn't
> need to care about no-pgtable-populated ranges so far. Shmem does it with a
> few change_prepare() calls to populate the entries so the markers can be
> installed later on.

Ah ok! :)

>
> However I think the fault handling is still not well handled as you pointed
> out even for shmem: that's the path I probably never triggered myself yet
> before and the code stayed there since a very early version:
>
> #define  change_pmd_prepare(vma, pmd, cp_flags)                         \
>         do {                                                            \
>                 if (unlikely(uffd_wp_protect_file(vma, cp_flags))) {    \
>                         if (WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd)))   \
>                                 break;                                  \
>                 }                                                       \
>         } while (0)
>
> I think a better thing we can do here (instead of warning and stop the
> UFFDIO_WRITEPROTECT at the current stage) is returning with -ENOMEM
> properly so the user can know the error.  We'll need to touch the stacks up
> to uffd_wp_range() as it's the only one that can trigger the -ENOMEM so
> far, so as to not ignore retval from change_protection().
>
> Meanwhile, I'd also wonder whether we should call pagefault_out_of_memory()
> because it should be the same as when pgtable allocation failure happens in
> page faults, we may want to OOM already.  I can take care of hugetlb part
> too along the way.

I might be misunderstanding, the only case where
hugetlb_change_protection() would *need* to allocate is when it is
called from UFFDIO_WRITEPROTECT, not while handling a #pf. So I don't
think any calls to pagefault_out_of_memory() need to be added.

>
> Man page of UFFDIO_WRITEPROTECT may need a fixup too to introduce -ENOMEM.
>
> I can quickly prepare some patches for this, and hopefully it doesn't need
> to block the current fix on split.

I don't think it should block this splitting fix. I'll send another
version of this fix soon.

>
> Any thoughts?
>
> >
> > >
> > > To me, the fact it was somewhat difficult to come up with this scenario is an
> > > argument what we should just unshare at split time as you propose.  Who
> > > knows what other issues may exist.
> > >
> > > > 60dfaad65a ("mm/hugetlb: allow uffd wr-protect none ptes") is the
> > > > commit that introduced the WARN_ON_ONCE; perhaps it's a good choice
> > > > for a Fixes: tag (if above is indeed true).
> > >
> > > If the key issue in your above scenario is indeed the failure of
> > > hugetlb_unshare_all_pmds in the non-PUD_SIZE vma, then perhaps we tag?
> > >
> > > 6dfeaff93be1 ("hugetlb/userfaultfd: unshare all pmds for hugetlbfs when
> > > register wp")
> >
> > SGTM. Thanks Mike.
>
> Looks good here too.

Thanks, Peter!




[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux