Re: [PATCH v1 2/5] userfaultfd: introduce access-likely mode for common operations

Peter Xu <peterx@xxxxxxxxxx> · Tue, 28 Jun 2022 16:56:15 -0400

On Tue, Jun 28, 2022 at 08:30:46PM +0000, Nadav Amit wrote:
> 
> 
> > On Jun 28, 2022, at 12:15 PM, Peter Xu <peterx@xxxxxxxxxx> wrote:
> > 
> > ⚠ External Email
> > 
> > On Mon, Jun 27, 2022 at 11:37:32PM +0000, Nadav Amit wrote:
> >> 
> >> Besides the benefit of avoiding a TLB flush, there is also the benefit
> >> of having more precise dirty tracking. You assume UFFDIO_CONTINUE will be
> >> preceded by memory write to the shared memory, but that does not have to
> >> be the case. Similarly, if in the future userfaultfd would also support
> >> memory-backed private mappings, that does not have to be the case either.
> > 
> > Could I ask what's the not supported private mapping you're talking about
> > (and I think you meant "file-backed")? I thought all kinds of private
> > mappings are supported on all modes already?
> 
> Yes, I meant file-backed. See vma_can_userfault() for the supported
> types of memory:
> 
>   static inline bool vma_can_userfault(struct vm_area_struct *vma,
>                                      unsigned long vm_flags)
>   {       
>         if (vm_flags & VM_UFFD_MINOR)
>                 return is_vm_hugetlb_page(vma) || vma_is_shmem(vma);
>                 
> #ifndef CONFIG_PTE_MARKER_UFFD_WP
>         /*
>          * If user requested uffd-wp but not enabled pte markers for
>          * uffd-wp, then shmem & hugetlbfs are not supported but only
>          * anonymous.
>          */             
>         if ((vm_flags & VM_UFFD_WP) && !vma_is_anonymous(vma))
>                 return false;
> #endif
>         return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
>             vma_is_shmem(vma);
>   }

I'm confused.  Let me ask in another way: do you mean !VM_SHARED when you
say "private"?

Both shmem & hugetlb support private mappings for all three modes, afaict.

> > Well I never thought anyone would be using soft-dirty with uffd because
> > logically uffd-wp was somehow trying to replace it with a better interface,
> > hopefully.
> 
> I have heard about some who does (not me). So I do not make it up,
> unfortunately.

If in any form you can get the reason of using soft-dirty in that use case
please kindly share more.  I think it could be that uffd-wp is not working
as good as soft-dirty in some way but we can think about it when there's a
valid use case, so ultimately it's more possible for a full replacement.

Sync messages can be a problem and they're indeed slow, soft-dirty is by
nature async. But still I'd like to check in case you know.

> 
> > 
> > If to talk about it, IMHO the most important thing is really not the dirty
> > bit but the write bit: even if you leave both the dirty bits cleared (so
> > the user specified !WRITE_HINT then we grant write bit only), it means when
> > the write happens for sure dirty bit will be set on demand but not
> > soft-dirty since we won't generate a page fault at all. AFAICT it'll be a
> > false negative.
> > 
> > The old code is safe in that respect on that we always set dirty so we can
> > only have false positive (which is acceptable in this case) but not false
> > negative (which is not).
> 
> I agree that the PTE should be left RO in this case. I was just pointing
> that the user might not be aware of soft-dirty being used, and then it
> makes sense to treat the (lack of) write-hint appropriately.

Yes please.  Thanks for raising this topic btw, I never thought about that
side effect.

[...]

> > Another option is we call can_change_pte_writable() somehow in the
> > unprotect branch.
> 
> I don’t think that I got this one.

No worry, feel free to ignore it.  It's probably not ideal anyway because
we need an extra pte_mkwrite() there too when can_change_pte_writable()
returns true.

> tl;dr: I’ll run some numbers (access when PTE-dirty/clear), address the
> aforementioned issues and send v2. Based on the results we can decide
> whether it makes sense.

That'll be very appreciated.  Thanks a lot!

-- 
Peter Xu