On Mon, 2010-05-03 at 12:18 -0400, Rik van Riel wrote:
> From: Rik van Riel <riel@xxxxxxxxxx>
>
> Both the page migration code and the transparent hugepage patches expect
> 100% reliable rmap lookups and use page_lock_anon_vma(page) to prevent
> races with mmap, munmap, expand_stack, etc.
>
> Specifically, try_to_unmap indirectly calls vma_address, which uses the
> difference between vma->vm_start and vma->vm_pgoff, which can race when a
> stack is expanded downwards. VMA splitting and merging present similar
> issues.
>
> With the new anon_vma code, one VMA can be attached to multiple anon_vmas;
> however, mmap, munmap, expand_stack and others only took one anon_vma->lock.
> This patch changes things so we take the anon_vma locks for all of the
> anon_vmas attached to a VMA in the code that try_to_unmap would otherwise
> race against: mmap, munmap, expand_stack, etc.
>
> Unfortunately, this leads to a lock ordering conflict with the
> page_table_lock, which protected the "same_vma" list in the
> anon_vma_chain. Replacing that lock with a new lock
> (mm->anon_vma_chain_lock), which is taken higher up in the mm locking
> hierarchy, solves that issue. This changes the locking rules for the
> "same_vma" list to be either mm->mmap_sem held for write, or mm->mmap_sem
> held for read plus the new mm->anon_vma_chain_lock. This limits the places
> where the new lock is taken to two locations: anon_vma_prepare and
> expand_downwards.
>
> Document the locking rules for the same_vma list in the anon_vma_chain
> and remove the anon_vma_lock call from expand_upwards, which does not
> need it.
>
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 456ec6f..81850fc 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1705,12 +1707,11 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  		return -EFAULT;
>
>  	/*
> -	 * We must make sure the anon_vma is allocated
> -	 * so that the anon_vma locking is not a noop.
> +	 * Unlike expand_downwards, we do not need to take the anon_vma lock,
> +	 * because we leave vma->vm_start and vma->vm_pgoff untouched.
> +	 * This means rmap lookups of pages inside this VMA stay valid
> +	 * throughout the stack expansion.
>  	 */
> -	if (unlikely(anon_vma_prepare(vma)))
> -		return -ENOMEM;
> -	anon_vma_lock(vma);
>
>  	/*
>  	 * vma->vm_start/vm_end cannot change under us because the caller
> @@ -1721,7 +1722,6 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  	if (address < PAGE_ALIGN(address+4))
>  		address = PAGE_ALIGN(address+4);
>  	else {
> -		anon_vma_unlock(vma);
>  		return -ENOMEM;
>  	}
>  	error = 0;
> @@ -1737,7 +1737,6 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address)
>  		if (!error)
>  			vma->vm_end = address;
>  	}
> -	anon_vma_unlock(vma);
>  	return error;
>  }
>  #endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */

This does leave me worrying about concurrent faults poking at
vma->vm_end without synchronization.
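To make that concrete: page faults only hold mmap_sem for read, and the
anon_vma lock was the one thing serializing two threads that fault above
vm_end at the same time. With it gone, as far as I can see both can run
the growth path concurrently. Roughly (simplified from expand_upwards,
not a literal quote of the code):

	/*
	 * Threads A and B both hold down_read(&mm->mmap_sem) and both
	 * fault just above the current vma->vm_end.  Nothing below is
	 * serialized between them any more:
	 */
	size = address - vma->vm_start;
	grow = (address - vma->vm_end) >> PAGE_SHIFT;

	error = acct_stack_growth(vma, size, grow);
	if (!error)
		vma->vm_end = address;	/* unserialized store */

If both threads pass acct_stack_growth() with grow computed against the
same stale vm_end, the overlap gets accounted twice, and the later store
can even move vm_end backwards if the smaller address wins.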
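For contrast, the downward case is the one the new comment is about.
rmap derives the virtual address of a page from the vm_start/vm_pgoff
pair, paraphrasing the 2.6.34-era vma_address() in mm/rmap.c:

	pgoff_t pgoff = page->index << (PAGE_CACHE_SHIFT - PAGE_SHIFT);
	unsigned long address;

	address = vma->vm_start + ((pgoff - vma->vm_pgoff) << PAGE_SHIFT);
	if (unlikely(address < vma->vm_start || address >= vma->vm_end))
		return -EFAULT;	/* page is not mapped by this VMA */

expand_downwards updates vm_start and vm_pgoff together, so an rmap walk
that reads between the two stores computes a bogus address; that path
has to keep the anon_vma locks. Growing upwards only moves vm_end, so
the worst a racing lookup sees is the not-yet-grown vm_end, and the
range check above fails safely. That part of the patch makes sense to me.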
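And for reference, my reading of the "take all the locks" half of the
patch, as a minimal sketch only: the function names here are mine, not
the patch's, and a real version would need lockdep nesting annotations
(spin_lock_nest_lock() or similar), since it acquires several locks of
the same class:

	/*
	 * Sketch: lock/unlock every anon_vma attached to a VMA.  The
	 * caller must hold mmap_sem for write (or for read plus the new
	 * mm->anon_vma_chain_lock) so the same_vma list cannot change
	 * under us.
	 */
	static void lock_all_anon_vmas(struct vm_area_struct *vma)
	{
		struct anon_vma_chain *avc;

		list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
			spin_lock(&avc->anon_vma->lock);
	}

	static void unlock_all_anon_vmas(struct vm_area_struct *vma)
	{
		struct anon_vma_chain *avc;

		list_for_each_entry(avc, &vma->anon_vma_chain, same_vma)
			spin_unlock(&avc->anon_vma->lock);
	}

If the same_vma chains are always built in the same order for every
process sharing the anon_vmas, taking the locks in list order should
give a consistent lock ordering between tasks, but I'd want that spelled
out in the changelog.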