Re: [PATCH v2 05/24] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler

Peter Xu <peterx@xxxxxxxxxx> · Tue, 27 Apr 2021 14:54:10 -0400

On Tue, Apr 27, 2021 at 12:12:58PM -0400, Peter Xu wrote:
> File-backed memory is prone to unmap/swap, so its ptes are always unstable.
> This could cause userfaultfd-wp information to be lost when such memory (for
> example, shmem) is unmapped or swapped out.  To keep this information
> persistent, we will start to use the newly introduced swap-like special ptes to
> replace a null pte when those ptes are removed.
> 
> Prepare this by handling such a special pte first before it is applied.  Here
> a new fault flag FAULT_FLAG_UFFD_WP is introduced.  When this flag is set, it

FAULT_FLAG_UFFD_WP does not exist any more.  Obviously I should have touched up
the commit message when touching up the code...

> means the current fault is to resolve a page access (either read or write) to
> the uffd-wp special pte.
> 
> The handling of this special pte page fault is similar to missing fault, but it
> should happen after the pte missing logic since the special pte is designed to
> be a swap-like pte.  Meanwhile it should be handled before do_swap_page() so
> that the swap core logic won't be confused to see such an illegal swap pte.
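
To make the intended ordering concrete, here is a minimal userspace sketch of
the dispatch described above.  It is not the code from this patch: the enum
names and classify_fault() are hypothetical, and the pte states are a
simplified model rather than the kernel's real pte encoding.  Note that it
keys off the pte itself, not off any fault flag.

```c
/*
 * Hypothetical, simplified pte states for illustration only; this is
 * not how the kernel encodes a pte_t.
 */
enum pte_kind {
	PTE_NONE,            /* empty pte, nothing mapped */
	PTE_PRESENT,         /* present pte */
	PTE_UFFD_WP_SPECIAL, /* non-present, swap-like marker keeping uffd-wp */
	PTE_SWAP,            /* non-present, genuine swap entry */
};

/* Hypothetical labels for the path a fault would take. */
enum fault_path {
	PATH_MISSING,
	PATH_PRESENT,
	PATH_UFFD_WP_SPECIAL,
	PATH_SWAP,
};

static enum fault_path classify_fault(enum pte_kind pte)
{
	/* The pte missing logic runs first. */
	if (pte == PTE_NONE)
		return PATH_MISSING;

	if (pte == PTE_PRESENT)
		return PATH_PRESENT;

	/*
	 * Non-present pte: test for the special marker before falling
	 * through to swap, so the swap path never sees the marker.
	 */
	if (pte == PTE_UFFD_WP_SPECIAL)
		return PATH_UFFD_WP_SPECIAL;

	return PATH_SWAP;
}
```

The only point of the sketch is the order of the checks: missing first, then
the special pte, and only then the swap path.
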
> 
> This is a slow path of uffd-wp handling, because unmap of wr-protected shmem
> ptes should be rare.  So far it should only trigger in two conditions:
> 
>   (1) When trying to punch holes in shmem_fallocate(), there will be a
>       pre-unmap optimization before evicting the page.  That will create
>       unmapped shmem ptes with wr-protected pages covered.
> 
>   (2) Swapping out of shmem pages
> 
> Because of this, the page fault handling is simplified too by not sending the
> wr-protect message in the 1st page fault; instead the page will be installed
> read-only, so the message will not be generated until the next do_wp_page()
> call.
> 
> Disable fault-around for such a special page fault, because the introduced new
> flag (FAULT_FLAG_UFFD_WP) only applies to current pte rather than all the pages

Same here.

> around it.  Doing fault-around with the new flag could confuse all the rest of
> pages when installing ptes from page cache when there's a cache hit.
> 
> Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
> ---
>  include/linux/userfaultfd_k.h | 11 +++++
>  mm/memory.c                   | 80 ++++++++++++++++++++++++++++++++---
>  2 files changed, 86 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index bc733512c6905..fefebe6e96560 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -89,6 +89,17 @@ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
>  	return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
>  }
>  
> +/*
> + * Don't do fault around for FAULT_FLAG_UFFD_WP because it means we want to

Same here...

> + * recover a previously wr-protected pte.  This flag is a per-pte information,
> + * so it could confuse all the pages around the current page when faulted in.
> + * Similar reason for MINOR mode faults.
> + */
> +static inline bool uffd_disable_fault_around(struct vm_area_struct *vma)
> +{
> +	return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
> +}
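
For illustration, a caller could gate fault-around on this helper roughly as
in the userspace sketch below.  Everything prefixed with sketch_/SKETCH_ is
hypothetical: the flag bit values are made up, struct sketch_vma only models
the one field used here, and sketch_should_fault_around() is an assumed shape
for the check, not the hunk from mm/memory.c.

```c
#include <stdbool.h>

/* Made-up bit values for illustration; not the kernel's VM_* values. */
#define SKETCH_VM_UFFD_WP	(1UL << 0)
#define SKETCH_VM_UFFD_MINOR	(1UL << 1)

/* Minimal stand-in for struct vm_area_struct. */
struct sketch_vma {
	unsigned long vm_flags;
};

/* Mirrors the helper quoted above: true if either uffd mode is enabled. */
static bool sketch_disable_fault_around(const struct sketch_vma *vma)
{
	return (vma->vm_flags & (SKETCH_VM_UFFD_WP | SKETCH_VM_UFFD_MINOR)) != 0;
}

/*
 * Assumed caller-side check: only fault around when the mapping can map
 * pages from the page cache, the window covers more than one page, and
 * the vma does not carry per-pte uffd state.
 */
static bool sketch_should_fault_around(const struct sketch_vma *vma,
				       bool has_map_pages,
				       unsigned long fault_around_pages)
{
	return has_map_pages && fault_around_pages > 1 &&
	       !sketch_disable_fault_around(vma);
}
```

With this shape, a uffd-wp or minor-mode vma always takes the single-page
path, so neighbouring ptes are never installed from the page cache behind
userfaultfd's back.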

-- 
Peter Xu