Re: [PATCH 25/33] userfaultfd: shmem: add userfaultfd hook for shared memory faults

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

I found a minor issue with the non cooperative testcase, sometime an
userfault would trigger in between UFFD_EVENT_MADVDONTNEED and
UFFDIO_UNREGISTER:

		case UFFD_EVENT_MADVDONTNEED:
			uffd_reg.range.start = msg.arg.madv_dn.start;
			uffd_reg.range.len = msg.arg.madv_dn.end -
				msg.arg.madv_dn.start;
			if (ioctl(uffd, UFFDIO_UNREGISTER, &uffd_reg.range))

It always triggered at the nr == 0:

	for (nr = 0; nr < nr_pages; nr++) {
		if (my_bcmp(area_dst + nr * page_size, zeropage, page_size))

The userfault still pending after UFFDIO_UNREGISTER returned, lead to
poll() getting a UFFD_EVENT_PAGEFAULT and trying to do a UFFDIO_COPY
into the unregistered range, which gracefully results in -EINVAL.

So this could be all handled in userland, by storing the MADV_DONTNEED
range and calling UFFDIO_WAKE instead of UFFDIO_COPY... but I think
it's more reliable to fix it into the kernel.

If a pending userfault happens before UFFDIO_UNREGISTER it'll just
behave like if it happened after.

I also noticed the order of uffd notification of MADV_DONTNEED and the
pagetable zap was wrong, we've to notify userland first so it won't
risk to call UFFDIO_COPY while the process runs zap_page_range.

With the two patches appended below the -EINVAL error out of
UFFDIO_COPY is gone.

>From fc27d209e566d95e8ae0eb83a703aa4e02316b4c Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Date: Thu, 17 Nov 2016 20:15:50 +0100
Subject: [PATCH 1/2] userfaultfd: non-cooperative: avoid MADV_DONTNEED race
 condition

MADV_DONTNEED must be notified to userland before the pages are
zapped. This allows userland to immediately stop adding pages to the
userfaultfd ranges before the pages are actually zapped or there could
be non-zeropage leftovers as result of concurrent UFFDIO_COPY run in
between zap_page_range and madvise_userfault_dontneed (both
MADV_DONTNEED and UFFDIO_COPY runs under the mmap_sem for reading, so
they can run concurrently).

Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
---
 mm/madvise.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 7168bc6..4d4c7f8 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -476,8 +476,8 @@ static long madvise_dontneed(struct vm_area_struct *vma,
 	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
 		return -EINVAL;
 
-	zap_page_range(vma, start, end - start, NULL);
 	madvise_userfault_dontneed(vma, prev, start, end);
+	zap_page_range(vma, start, end - start, NULL);
 	return 0;
 }
 


>From 18e7b30cf82c927af4c0323a6caac20184a03ff4 Mon Sep 17 00:00:00 2001
From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
Date: Thu, 17 Nov 2016 20:20:40 +0100
Subject: [PATCH 2/2] userfaultfd: non-cooperative: wake userfaults after
 UFFDIO_UNREGISTER

Userfaults may still happen after the userfaultfd monitor thread
received a UFFD_EVENT_MADVDONTNEED until UFFDIO_UNREGISTER is run.

Wake any pending userfault within UFFDIO_UNREGISTER protected by the
mmap_sem for writing, so they will not be reported to userland leading
to UFFDIO_COPY returning -EINVAL (as the range was already
unregistered) and they will not hang permanently either.

Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>
---
 fs/userfaultfd.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 2b75fab..42168d3 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1267,6 +1267,19 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
 			start = vma->vm_start;
 		vma_end = min(end, vma->vm_end);
 
+		if (userfaultfd_missing(vma)) {
+			/*
+			 * Wake any concurrent pending userfault while
+			 * we unregister, so they will not hang
+			 * permanently and it avoids userland to call
+			 * UFFDIO_WAKE explicitly.
+			 */
+			struct userfaultfd_wake_range range;
+			range.start = start;
+			range.len = vma_end - start;
+			wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range);
+		}
+
 		new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
 		prev = vma_merge(mm, prev, start, vma_end, new_flags,
 				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]