On Fri, Nov 18, 2016 at 01:37:34AM +0100, Andrea Arcangeli wrote: > Hello, > > I found a minor issue with the non cooperative testcase, sometime an > userfault would trigger in between UFFD_EVENT_MADVDONTNEED and > UFFDIO_UNREGISTER: > > case UFFD_EVENT_MADVDONTNEED: > uffd_reg.range.start = msg.arg.madv_dn.start; > uffd_reg.range.len = msg.arg.madv_dn.end - > msg.arg.madv_dn.start; > if (ioctl(uffd, UFFDIO_UNREGISTER, &uffd_reg.range)) > > It always triggered at the nr == 0: > > for (nr = 0; nr < nr_pages; nr++) { > if (my_bcmp(area_dst + nr * page_size, zeropage, page_size)) > > The userfault still pending after UFFDIO_UNREGISTER returned, lead to > poll() getting a UFFD_EVENT_PAGEFAULT and trying to do a UFFDIO_COPY > into the unregistered range, which gracefully results in -EINVAL. > > So this could be all handled in userland, by storing the MADV_DONTNEED > range and calling UFFDIO_WAKE instead of UFFDIO_COPY... but I think > it's more reliable to fix it into the kernel. > > If a pending userfault happens before UFFDIO_UNREGISTER it'll just > behave like if it happened after. > > I also noticed the order of uffd notification of MADV_DONTNEED and the > pagetable zap was wrong, we've to notify userland first so it won't > risk to call UFFDIO_COPY while the process runs zap_page_range. > > With the two patches appended below the -EINVAL error out of > UFFDIO_COPY is gone. > > From fc27d209e566d95e8ae0eb83a703aa4e02316b4c Mon Sep 17 00:00:00 2001 > From: Andrea Arcangeli <aarcange@xxxxxxxxxx> > Date: Thu, 17 Nov 2016 20:15:50 +0100 > Subject: [PATCH 1/2] userfaultfd: non-cooperative: avoid MADV_DONTNEED race > condition > > MADV_DONTNEED must be notified to userland before the pages are > zapped. This allows userland to immediately stop adding pages to the > userfaultfd ranges before the pages are actually zapped or there could > be non-zeropage leftovers as result of concurrent UFFDIO_COPY run in > between zap_page_range and madvise_userfault_dontneed (both > MADV_DONTNEED and UFFDIO_COPY runs under the mmap_sem for reading, so > they can run concurrently). > > Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> Acked-by: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx> > --- > mm/madvise.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/madvise.c b/mm/madvise.c > index 7168bc6..4d4c7f8 100644 > --- a/mm/madvise.c > +++ b/mm/madvise.c > @@ -476,8 +476,8 @@ static long madvise_dontneed(struct vm_area_struct *vma, > if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP)) > return -EINVAL; > > - zap_page_range(vma, start, end - start, NULL); > madvise_userfault_dontneed(vma, prev, start, end); > + zap_page_range(vma, start, end - start, NULL); > return 0; > } > > > > From 18e7b30cf82c927af4c0323a6caac20184a03ff4 Mon Sep 17 00:00:00 2001 > From: Andrea Arcangeli <aarcange@xxxxxxxxxx> > Date: Thu, 17 Nov 2016 20:20:40 +0100 > Subject: [PATCH 2/2] userfaultfd: non-cooperative: wake userfaults after > UFFDIO_UNREGISTER > > Userfaults may still happen after the userfaultfd monitor thread > received a UFFD_EVENT_MADVDONTNEED until UFFDIO_UNREGISTER is run. > > Wake any pending userfault within UFFDIO_UNREGISTER protected by the > mmap_sem for writing, so they will not be reported to userland leading > to UFFDIO_COPY returning -EINVAL (as the range was already > unregistered) and they will not hang permanently either. > > Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx> Acked-by: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx> > --- > fs/userfaultfd.c | 13 +++++++++++++ > 1 file changed, 13 insertions(+) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index 2b75fab..42168d3 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -1267,6 +1267,19 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, > start = vma->vm_start; > vma_end = min(end, vma->vm_end); > > + if (userfaultfd_missing(vma)) { > + /* > + * Wake any concurrent pending userfault while > + * we unregister, so they will not hang > + * permanently and it avoids userland to call > + * UFFDIO_WAKE explicitly. > + */ > + struct userfaultfd_wake_range range; > + range.start = start; > + range.len = vma_end - start; > + wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range); > + } > + > new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP); > prev = vma_merge(mm, prev, start, vma_end, new_flags, > vma->anon_vma, vma->vm_file, vma->vm_pgoff, > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxx. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>