Re: [PATCH 1/1] mm: disable CONFIG_PER_VMA_LOCK by default until its fixed

David Hildenbrand <david@xxxxxxxxxx> · Mon, 3 Jul 2023 22:30:00 +0200

On 03.07.23 20:21, Suren Baghdasaryan wrote:
A memory corruption was reported in [1] with bisection pointing to the
patch [2] enabling per-VMA locks for x86.
Disable per-VMA locks config to prevent this issue while the problem is
being investigated. This is expected to be a temporary measure.

[1] https://bugzilla.kernel.org/show_bug.cgi?id=217624
[2] https://lore.kernel.org/all/20230227173632.3292573-30-surenb@xxxxxxxxxx

Reported-by: Jiri Slaby <jirislaby@xxxxxxxxxx>
Reported-by: Jacob Young <jacobly.alt@xxxxxxxxx>
Fixes: 0bff0aaea03e ("x86/mm: try VMA lock-based page fault handling first")
Signed-off-by: Suren Baghdasaryan <surenb@xxxxxxxxxx>
---
  mm/Kconfig | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index 09130434e30d..de94b2497600 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1224,7 +1224,7 @@ config ARCH_SUPPORTS_PER_VMA_LOCK
         def_bool n
  
  config PER_VMA_LOCK
-	def_bool y
+	bool "Enable per-vma locking during page fault handling."
  	depends on ARCH_SUPPORTS_PER_VMA_LOCK && MMU && SMP
  	help
  	  Allow per-vma locking during page fault handling.

As raised at LSF/MM, I was "surprised" that we can now handle page faults
concurrent to fork() and was expecting something to be broken already.

What probably happens is that we wr-protected the page in the parent process and
COW-shared an anon page with the child using copy_present_pte().

But we only flush the parent MM tlb before we drop the parent MM lock in
dup_mmap().


If we get a write-fault before that TLB flush in the parent, and we end up
replacing that anon page in the parent process in do_wp_page() [because, COW-shared with the child],
this might be problematic: some stale writable TLB entries can target the wrong (old) page.


We had similar issues in the past with userfaultfd, see the comment at the beginning of do_wp_page():


	if (likely(!unshare)) {
		if (userfaultfd_pte_wp(vma, *vmf->pte)) {
			pte_unmap_unlock(vmf->pte, vmf->ptl);
			return handle_userfault(vmf, VM_UFFD_WP);
		}

		/*
		 * Userfaultfd write-protect can defer flushes. Ensure the TLB
		 * is flushed in this case before copying.
		 */
		if (unlikely(userfaultfd_wp(vmf->vma) &&
			     mm_tlb_flush_pending(vmf->vma->vm_mm)))
			flush_tlb_page(vmf->vma, vmf->address);
	}


We really should not allow page faults concurrent to fork() without further investigation.

--
Cheers,

David / dhildenb