Re: [PATCH v3 4/9] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 01, 2021 at 02:33:20PM -0800, Mike Kravetz wrote:
> On 1/28/21 2:48 PM, Axel Rasmussen wrote:
> > From: Peter Xu <peterx@xxxxxxxxxx>
> > 
> > Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because
> > userfaultfd-wp is always based on pgtable entries, so they cannot be shared.
> > 
> > Walk the hugetlb range and unshare all such mappings if there is, right before
> > UFFDIO_REGISTER will succeed and return to userspace.
> > 
> > This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing
> > is completely disabled for userfaultfd-wp registered range.
> > 
> > Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> > Signed-off-by: Axel Rasmussen <axelrasmussen@xxxxxxxxxx>
> > ---
> >  fs/userfaultfd.c             | 45 ++++++++++++++++++++++++++++++++++++
> >  include/linux/mmu_notifier.h |  1 +
> >  2 files changed, 46 insertions(+)
> > 
> > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> > index 894cc28142e7..2c6706ac2504 100644
> > --- a/fs/userfaultfd.c
> > +++ b/fs/userfaultfd.c
> > @@ -15,6 +15,7 @@
> >  #include <linux/sched/signal.h>
> >  #include <linux/sched/mm.h>
> >  #include <linux/mm.h>
> > +#include <linux/mmu_notifier.h>
> >  #include <linux/poll.h>
> >  #include <linux/slab.h>
> >  #include <linux/seq_file.h>
> > @@ -1190,6 +1191,47 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
> >  	}
> >  }
> >  
> > +/*
> > + * This function will unconditionally remove all the shared pmd pgtable entries
> > + * within the specific vma for a hugetlbfs memory range.
> > + */
> > +static void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
> > +{
> > +#ifdef CONFIG_HUGETLB_PAGE
> > +	struct hstate *h = hstate_vma(vma);
> > +	unsigned long sz = huge_page_size(h);
> > +	struct mm_struct *mm = vma->vm_mm;
> > +	struct mmu_notifier_range range;
> > +	unsigned long address;
> > +	spinlock_t *ptl;
> > +	pte_t *ptep;
> > +
> 
> Perhaps we should add a quick to see if vma is sharable.  Might be as
> simple as !(vma->vm_flags & VM_MAYSHARE).  I see a comment/question in
> a later patch about only doing minor fault processing on shared mappings.

Yes, that comment was majorly about shmem though - I believe shared case should
still be the major one, especially for hugetlbfs.

So what I was thinking is something like: one non-uffd process use shared
mapping of the file, meanwhile the other uffd process used private mapping on
the same file.  When the uffd process access page it could fault in the page
cache and continued by UFFDIO_CONTINUE, however when it writes it'll COW into
private pages.  Something like that.  Not sure whether it's useful, but I just
don't see why we should block that case.

> 
> Code below looks fine, but it would be a wast to do all that for a vma
> that could not be shared.

Right, still better to check it.

Mike, I agree with all your comments on the initial 4 patches, thanks for the
input!  To make Axel's life easier, I've modified them locally and pushed since
after all I'll do it in my series too (I also picked Mike's r-b on patch 3):

https://github.com/xzpeter/linux/commits/uffd-wp-shmem-hugetlbfs

Axel, feel free to fetch from it directly.

Thanks,

-- 
Peter Xu





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux