On Mon, 2010-05-03 at 12:18 -0400, Rik van Riel wrote: > From: Rik van Riel <riel@xxxxxxxxxx> > > Both the page migration code and the transparent hugepage patches expect > 100% reliable rmap lookups and use page_lock_anon_vma(page) to prevent > races with mmap, munmap, expand_stack, etc. > > Specifically, try_to_unmap indirectly calls vma_address, which uses the > difference between vma->vm_start and vma->vm_pgoff, which can race when a > stack is expanded downwards. VMA splitting and merging present similar > issues. > > With the new anon_vma code, one VMA can be attached to multiple anon_vmas, > however mmap, munmap, expand_stack and others only took one anon_vma->lock. > This patch changes things so we take the anon_vma locks for all of the > anon_vmas attached to a VMA in the code that try_to_unmap would otherwise > race against: mmap, munmap, expand_stack, etc. > > Unfortunately, this leads to a lock ordering conflict with the page_table_lock, > which protected the "same_vma" list in the anon_vma_chain. Replacing that > lock with a new lock (mm->anon_vma_chain_lock), which is taken higher up in > the mm locking hierarchy, solves that issue. This changes the locking rules > for the "same_vma" list to be either mm->mmap_sem for write, or mm->mmap_sem > for read plus the new mm->anon_vma_chain lock. This limits the place where > the new lock is taken to 2 locations - anon_vma_prepare and expand_downwards. > > Document the locking rules for the same_vma list in the anon_vma_chain and > remove the anon_vma_lock call from expand_upwards, which does not need it. > diff --git a/mm/mmap.c b/mm/mmap.c > index 456ec6f..81850fc 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1705,12 +1707,11 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address) > return -EFAULT; > > /* > - * We must make sure the anon_vma is allocated > - * so that the anon_vma locking is not a noop. > + * Unlike expand_downwards, we do not need to take the anon_vma lock, > + * because we leave vma->vm_start and vma->pgoff untouched. > + * This means rmap lookups of pages inside this VMA stay valid > + * throughout the stack expansion. > */ > - if (unlikely(anon_vma_prepare(vma))) > - return -ENOMEM; > - anon_vma_lock(vma); > > /* > * vma->vm_start/vm_end cannot change under us because the caller > @@ -1721,7 +1722,6 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address) > if (address < PAGE_ALIGN(address+4)) > address = PAGE_ALIGN(address+4); > else { > - anon_vma_unlock(vma); > return -ENOMEM; > } > error = 0; > @@ -1737,7 +1737,6 @@ int expand_upwards(struct vm_area_struct *vma, unsigned long address) > if (!error) > vma->vm_end = address; > } > - anon_vma_unlock(vma); > return error; > } > #endif /* CONFIG_STACK_GROWSUP || CONFIG_IA64 */ This does leave me worrying about concurrent faults poking at vma->vm_end without synchronization. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>