On Tue, Jun 14, 2011 at 6:21 PM, Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote: > > Anyway, please check me if I'm wrong, but won't the "anon_vma->root" > be the same for all the anon_vma's that are associated with one > particular vma? > > The reason I ask [...] So here's a trial patch that moves the anon_vma locking one level up in the anon_vma_clone() call chain. It actually does allow the root to change, but has a WARN_ON_ONCE() if that ever happens. I *suspect* this will help the locking numbers a bit, but I'd like to note that it does this *only* for the anon_vma_clone() case, and the exact same thing should be done for the exit case too (ie the unlink_anon_vmas()). So if it does work it's still just one step on the way, and there would be more work along the same lines to possibly improve the locking further. The patch is "tested" in the sense that I booted the kernel and am running it right now (and compiled a kernel with it). But that's not a whole lot of actual real life testing, so caveat emptor. And I won't really even guarantee that the main problem locking-wise would be a long chain of "same_vma" anon-vma's that this does with just a single lock. So who knows - maybe it doesn't help at all. I suspect it's worth testing, though. Linus
mm/rmap.c | 18 ++++++++++++++++-- 1 files changed, 16 insertions(+), 2 deletions(-) diff --git a/mm/rmap.c b/mm/rmap.c index 0eb463ea88dd..206c3fb072af 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -208,13 +208,11 @@ static void anon_vma_chain_link(struct vm_area_struct *vma, avc->anon_vma = anon_vma; list_add(&avc->same_vma, &vma->anon_vma_chain); - anon_vma_lock(anon_vma); /* * It's critical to add new vmas to the tail of the anon_vma, * see comment in huge_memory.c:__split_huge_page(). */ list_add_tail(&avc->same_anon_vma, &anon_vma->head); - anon_vma_unlock(anon_vma); } /* @@ -224,16 +222,30 @@ static void anon_vma_chain_link(struct vm_area_struct *vma, int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src) { struct anon_vma_chain *avc, *pavc; + struct anon_vma *root = NULL; list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) { + struct anon_vma *anon_vma = pavc->anon_vma, *new_root = anon_vma->root; + + if (new_root != root) { + if (WARN_ON_ONCE(root)) + mutex_unlock(&root->mutex); + root = new_root; + mutex_lock(&root->mutex); + } + avc = anon_vma_chain_alloc(); if (!avc) goto enomem_failure; anon_vma_chain_link(dst, avc, pavc->anon_vma); } + if (root) + mutex_unlock(&root->mutex); return 0; enomem_failure: + if (root) + mutex_unlock(&root->mutex); unlink_anon_vmas(dst); return -ENOMEM; } @@ -280,7 +292,9 @@ int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma) get_anon_vma(anon_vma->root); /* Mark this anon_vma as the one where our new (COWed) pages go. */ vma->anon_vma = anon_vma; + anon_vma_lock(anon_vma); anon_vma_chain_link(vma, avc, anon_vma); + anon_vma_unlock(anon_vma); return 0;