Re: [PATCH 25/33] userfaultfd: shmem: add userfaultfd hook for shared memory faults

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 18, 2016 at 01:37:34AM +0100, Andrea Arcangeli wrote:
> Hello,
> 
> I found a minor issue with the non cooperative testcase, sometime an
> userfault would trigger in between UFFD_EVENT_MADVDONTNEED and
> UFFDIO_UNREGISTER:
> 
> 		case UFFD_EVENT_MADVDONTNEED:
> 			uffd_reg.range.start = msg.arg.madv_dn.start;
> 			uffd_reg.range.len = msg.arg.madv_dn.end -
> 				msg.arg.madv_dn.start;
> 			if (ioctl(uffd, UFFDIO_UNREGISTER, &uffd_reg.range))
> 
> It always triggered at the nr == 0:
> 
> 	for (nr = 0; nr < nr_pages; nr++) {
> 		if (my_bcmp(area_dst + nr * page_size, zeropage, page_size))
> 
> The userfault still pending after UFFDIO_UNREGISTER returned, lead to
> poll() getting a UFFD_EVENT_PAGEFAULT and trying to do a UFFDIO_COPY
> into the unregistered range, which gracefully results in -EINVAL.
> 
> So this could be all handled in userland, by storing the MADV_DONTNEED
> range and calling UFFDIO_WAKE instead of UFFDIO_COPY... but I think
> it's more reliable to fix it into the kernel.
> 
> If a pending userfault happens before UFFDIO_UNREGISTER it'll just
> behave like if it happened after.
> 
> I also noticed the order of uffd notification of MADV_DONTNEED and the
> pagetable zap was wrong, we've to notify userland first so it won't
> risk to call UFFDIO_COPY while the process runs zap_page_range.
> 
> With the two patches appended below the -EINVAL error out of
> UFFDIO_COPY is gone.
> 
> From fc27d209e566d95e8ae0eb83a703aa4e02316b4c Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Date: Thu, 17 Nov 2016 20:15:50 +0100
> Subject: [PATCH 1/2] userfaultfd: non-cooperative: avoid MADV_DONTNEED race
>  condition
> 
> MADV_DONTNEED must be notified to userland before the pages are
> zapped. This allows userland to immediately stop adding pages to the
> userfaultfd ranges before the pages are actually zapped or there could
> be non-zeropage leftovers as result of concurrent UFFDIO_COPY run in
> between zap_page_range and madvise_userfault_dontneed (both
> MADV_DONTNEED and UFFDIO_COPY runs under the mmap_sem for reading, so
> they can run concurrently).
> 
> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>

Acked-by: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>

> ---
>  mm/madvise.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 7168bc6..4d4c7f8 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -476,8 +476,8 @@ static long madvise_dontneed(struct vm_area_struct *vma,
>  	if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
>  		return -EINVAL;
> 
> -	zap_page_range(vma, start, end - start, NULL);
>  	madvise_userfault_dontneed(vma, prev, start, end);
> +	zap_page_range(vma, start, end - start, NULL);
>  	return 0;
>  }
> 
> 
> 
> From 18e7b30cf82c927af4c0323a6caac20184a03ff4 Mon Sep 17 00:00:00 2001
> From: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Date: Thu, 17 Nov 2016 20:20:40 +0100
> Subject: [PATCH 2/2] userfaultfd: non-cooperative: wake userfaults after
>  UFFDIO_UNREGISTER
> 
> Userfaults may still happen after the userfaultfd monitor thread
> received a UFFD_EVENT_MADVDONTNEED until UFFDIO_UNREGISTER is run.
> 
> Wake any pending userfault within UFFDIO_UNREGISTER protected by the
> mmap_sem for writing, so they will not be reported to userland leading
> to UFFDIO_COPY returning -EINVAL (as the range was already
> unregistered) and they will not hang permanently either.
> 
> Signed-off-by: Andrea Arcangeli <aarcange@xxxxxxxxxx>

Acked-by: Mike Rapoport <rppt@xxxxxxxxxxxxxxxxxx>

> ---
>  fs/userfaultfd.c | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
> index 2b75fab..42168d3 100644
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -1267,6 +1267,19 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx,
>  			start = vma->vm_start;
>  		vma_end = min(end, vma->vm_end);
> 
> +		if (userfaultfd_missing(vma)) {
> +			/*
> +			 * Wake any concurrent pending userfault while
> +			 * we unregister, so they will not hang
> +			 * permanently and it avoids userland to call
> +			 * UFFDIO_WAKE explicitly.
> +			 */
> +			struct userfaultfd_wake_range range;
> +			range.start = start;
> +			range.len = vma_end - start;
> +			wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range);
> +		}
> +
>  		new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
>  		prev = vma_merge(mm, prev, start, vma_end, new_flags,
>  				 vma->anon_vma, vma->vm_file, vma->vm_pgoff,
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]