Re: [PATCH v4 1/2] mm/uffd: UFFD_FEATURE_WP_UNPOPULATED

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




   (1) With huge page disabled
   echo madvise > /sys/kernel/mm/transparent_hugepage/enabled
   ./uffd_wp_perf
   Test DEFAULT: 4
   Test PRE-READ: 1111453 (pre-fault 1101011)
   Test MADVISE: 278276 (pre-fault 266378)

Thinking about it, I guess the biggest slowdown here is the "one fake pagefault at a time" handling.

   Test WP-UNPOPULATE: 11712

   (2) With Huge page enabled
   echo always > /sys/kernel/mm/transparent_hugepage/enabled
   ./uffd_wp_perf
   Test DEFAULT: 4
   Test PRE-READ: 22521 (pre-fault 22348)
   Test MADVISE: 4909 (pre-fault 4743)
   Test WP-UNPOPULATE: 14448

There'll be a great perf boost for no-thp case, while for thp enabled with
extreme case of all-thp-zero WP_UNPOPULATED can be slower than MADVISE, but
that's low possibility in reality, also the overhead was not reduced but
postponed until a follow up write on any huge zero thp, so potentially it
is faster by making the follow up writes slower.

[1] https://lore.kernel.org/all/20210401092226.102804-4-andrey.gruzdev@xxxxxxxxxxxxx/
[2] https://lore.kernel.org/all/Y+v2HJ8+3i%2FKzDBu@x1n/
[3] https://lore.kernel.org/all/d0eb0a13-16dc-1ac1-653a-78b7273781e3@xxxxxxxxxxxxx/
[4] https://github.com/xzpeter/clibs/blob/master/uffd-test/uffd-wp-perf.c

Signed-off-by: Peter Xu <peterx@xxxxxxxxxx>
---
  Documentation/admin-guide/mm/userfaultfd.rst | 17 ++++++
  fs/userfaultfd.c                             | 16 ++++++
  include/linux/mm_inline.h                    |  6 +++
  include/linux/userfaultfd_k.h                | 23 ++++++++
  include/uapi/linux/userfaultfd.h             | 10 +++-
  mm/memory.c                                  | 56 +++++++++++++++-----
  mm/mprotect.c                                | 51 ++++++++++++++----
  7 files changed, 154 insertions(+), 25 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 7dc823b56ca4..c86b56c95ea6 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -219,6 +219,23 @@ former will have ``UFFD_PAGEFAULT_FLAG_WP`` set, the latter
  you still need to supply a page when ``UFFDIO_REGISTER_MODE_MISSING`` was
  used.
+Userfaultfd write-protect mode currently behave differently on none ptes
+(when e.g. page is missing) over different types of memories.
+
+For anonymous memory, ``ioctl(UFFDIO_WRITEPROTECT)`` will ignore none ptes
+(e.g. when pages are missing and not populated).  For file-backed memories
+like shmem and hugetlbfs, none ptes will be write protected just like a
+present pte.  In other words, there will be a userfaultfd write fault
+message generated when writting to a missing page on file typed memories,

s/writting/writing/

+as long as the page range was write-protected before.  Such a message will
+not be generated on anonymous memories by default.
+
+If the application wants to be able to write protect none ptes on anonymous
+memory, one can pre-populate the memory with e.g. MADV_POPULATE_READ.  On
+newer kernels, one can also detect the feature UFFD_FEATURE_WP_UNPOPULATED
+and set the feature bit in advance to make sure none ptes will also be
+write protected even upon anonymous memory.
+

[...]

  /*
   * A number of key systems in x86 including ioremap() rely on the assumption
@@ -1350,6 +1364,10 @@ zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
  			      unsigned long addr, pte_t *pte,
  			      struct zap_details *details, pte_t pteval)
  {
+	/* Zap on anonymous always means dropping everything */
+	if (vma_is_anonymous(vma))
+		return;
+
  	if (zap_drop_file_uffd_wp(details))
  		return;
@@ -1456,8 +1474,12 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
  				continue;
  			rss[mm_counter(page)]--;
  		} else if (pte_marker_entry_uffd_wp(entry)) {
-			/* Only drop the uffd-wp marker if explicitly requested */
-			if (!zap_drop_file_uffd_wp(details))
+			/*
+			 * For anon: always drop the marker; for file: only
+			 * drop the marker if explicitly requested.
+			 */

So MADV_DONTNEED a pte marker in an anonymous VMA will always remove that marker. Is that the same handling as for MADV_DONTNEED on shmem or on fallocate(PUNCHHOLE) on shmem?

+			if (!vma_is_anonymous(vma) &&
+			    !zap_drop_file_uffd_wp(details))
  				continue;

Maybe it would be nicer to have a zap_drop_uffd_wp_marker(vma, details) and have the comment in there. Especially because of the other hunk above.

So zap_drop_file_uffd_wp(details) -> zap_drop_uffd_wp_marker(vma, details) and move the anon handling + comment in there.


  		} else if (is_hwpoison_entry(entry) ||
  			   is_swapin_error_entry(entry)) {
@@ -3624,6 +3646,14 @@ static vm_fault_t pte_marker_clear(struct vm_fault *vmf)
  	return 0;
  }
+static vm_fault_t do_pte_missing(struct vm_fault *vmf)
+{
+	if (vma_is_anonymous(vmf->vma))
+		return do_anonymous_page(vmf);
+	else
+		return do_fault(vmf);

No need for the "else" statement.

+}
+
  /*
   * This is actually a page-missing access, but with uffd-wp special pte
   * installed.  It means this pte was wr-protected before being unmapped.
@@ -3634,11 +3664,10 @@ static vm_fault_t pte_marker_handle_uffd_wp(struct vm_fault *vmf)
  	 * Just in case there're leftover special ptes even after the region
  	 * got unregistered - we can simply clear them.
  	 */
-	if (unlikely(!userfaultfd_wp(vmf->vma) || vma_is_anonymous(vmf->vma)))
+	if (unlikely(!userfaultfd_wp(vmf->vma)))
  		return pte_marker_clear(vmf);
- /* do_fault() can handle pte markers too like none pte */
-	return do_fault(vmf);
+	return do_pte_missing(vmf);
  }

[...]

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 231929f119d9..455f7051098f 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -276,7 +276,15 @@ static long change_pte_range(struct mmu_gather *tlb,
  		} else {
  			/* It must be an none page, or what else?.. */
  			WARN_ON_ONCE(!pte_none(oldpte));
-			if (unlikely(uffd_wp && !vma_is_anonymous(vma))) {
+
+			/*
+			 * Nobody plays with any none ptes besides
+			 * userfaultfd when applying the protections.
+			 */
+			if (likely(!uffd_wp))
+				continue;
+
+			if (userfaultfd_wp_use_markers(vma)) {
  				/*
  				 * For file-backed mem, we need to be able to
  				 * wr-protect a none pte, because even if the
@@ -320,23 +328,46 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
  	return 0;
  }
-/* Return true if we're uffd wr-protecting file-backed memory, or false */
+/*
+ * Return true if we want to split huge thps in change protection

"huge thps" sounds redundant. "if we want to PTE-map a huge PMD" ?

+ * procedure, false otherwise.


In general,

Acked-by: David Hildenbrand <david@xxxxxxxxxx>

--
Thanks,

David / dhildenb





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux