On Wed, Apr 28, 2010 at 01:10:07AM +0200, Andrea Arcangeli wrote: > On Tue, Apr 27, 2010 at 10:30:51PM +0100, Mel Gorman wrote: > > diff --git a/mm/mmap.c b/mm/mmap.c > > index f90ea92..61d6f1d 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -578,6 +578,9 @@ again: remove_next = 1 + (end > next->vm_end); > > } > > } > > > > + if (vma->anon_vma) > > + spin_lock(&vma->anon_vma->lock); > > + > > if (root) { > > flush_dcache_mmap_lock(mapping); > > vma_prio_tree_remove(vma, root); > > @@ -620,6 +623,9 @@ again: remove_next = 1 + (end > next->vm_end); > > if (mapping) > > spin_unlock(&mapping->i_mmap_lock); > > > > + if (vma->anon_vma) > > + spin_unlock(&vma->anon_vma->lock); > > + > > if (remove_next) { > > if (file) { > > fput(file); > > The old code did: > > /* > * When changing only vma->vm_end, we don't really need > * anon_vma lock. > */ > if (vma->anon_vma && (insert || importer || start != vma->vm_start)) > anon_vma = vma->anon_vma; > if (anon_vma) { > spin_lock(&anon_vma->lock); > > why did it become unconditional? (and no idea why it was removed) > It became unconditional because I wasn't sure of the optimisation versus the new anon_vma changes (doesn't matter, should have been safe). At the time the patch was introduced, the bug looked like a race in VMA's in the list having their details modified. I thought vma_address was returning -EFAULT when it shouldn't and while this may still be possible, it wasn't the prime cause of the bug. The more important race was in execve between when a VMA got moved and the page tables copied. The anon_vma locks are fine for the VMA move but the page table copy happens later. What the patch did was alter the timing of the race. rmap_walk() was finding the VMA of the new stack being set up by exec, failing to lock it and backing off. By the time it would restart and get back to that VMA, it was already moved making the bug simply harder to reproduce because the race window was so small. So, the VMA list does not appear to be messed up but there still needs to be protection against modification of VMA details that are already on the list. For that, the seq counter would have been enough and lighter-weight than acquiring the anon_vma->lock every time in vma_adjust(). I'll drop this patch again as the execve race looks the most important. > But I'm not sure about this part.... this is really only a question, I > may well be wrong, I just don't get it. > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@xxxxxxxxx"> email@xxxxxxxxx </a>