Re: [BUG] kernel BUG at fs/userfaultfd.c:385 after 04f5866e41fb

Andrea Arcangeli <aarcange@xxxxxxxxxx> · Mon, 19 Aug 2019 12:05:17 -0400

On Wed, Aug 14, 2019 at 05:41:02PM +0200, Oleg Nesterov wrote:
> --- a/fs/userfaultfd.c
> +++ b/fs/userfaultfd.c
> @@ -880,6 +880,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
>  	/* len == 0 means wake all */
>  	struct userfaultfd_wake_range range = { .len = 0, };
>  	unsigned long new_flags;
> +	bool xxx;
>  
>  	WRITE_ONCE(ctx->released, true);
>  
> @@ -895,8 +896,7 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
>  	 * taking the mmap_sem for writing.
>  	 */
>  	down_write(&mm->mmap_sem);
> -	if (!mmget_still_valid(mm))
> -		goto skip_mm;
> +	xxx = mmget_still_valid(mm);
>  	prev = NULL;
>  	for (vma = mm->mmap; vma; vma = vma->vm_next) {
>  		cond_resched();
> @@ -907,19 +907,20 @@ static int userfaultfd_release(struct inode *inode, struct file *file)
>  			continue;
>  		}
>  		new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP);
> -		prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
> -				 new_flags, vma->anon_vma,
> -				 vma->vm_file, vma->vm_pgoff,
> -				 vma_policy(vma),
> -				 NULL_VM_UFFD_CTX);
> -		if (prev)
> -			vma = prev;
> -		else
> -			prev = vma;
> +		if (xxx) {
> +			prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end,
> +					 new_flags, vma->anon_vma,
> +					 vma->vm_file, vma->vm_pgoff,
> +					 vma_policy(vma),
> +					 NULL_VM_UFFD_CTX);
> +			if (prev)
> +				vma = prev;
> +			else
> +				prev = vma;
> +		}
>  		vma->vm_flags = new_flags;
>  		vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX;
>  	}
> -skip_mm:
>  	up_write(&mm->mmap_sem);
>  	mmput(mm);
>  wakeup:

The proposed fix looks correct, can you resend in a way that can be merged?

What happens is there are 4 threads, the uffdio copy with NULL source
address is just to induce more thread creation, then one thread does
UFFDIO_COPY with source in the uffd region so it blocks in
handle_userfault inside UFFDIO_COPY. When one of the threads then does
the illegal instruction the core dump starts. The core dump wakes the
userfault and the copy-user in UFFDIO_COPY is being retried after
userfaultfd_release already run because one of the other threads
already went through do_exit. It's a bit strange that the file that
was opened by the ioctl() syscall gets released and its
file->private_data destroyed before the ioctl syscall has a chance to
return to userland.

Anyway the same race condition can still happen for a rogue page fault
that is happening when the core dump start so the above fix is needed
anyway.