On Thu, Apr 29, 2010 at 11:10 AM, Rik van Riel <riel@xxxxxxxxxx> wrote: > On 04/28/2010 08:28 PM, Minchan Kim wrote: >> >> On Thu, Apr 29, 2010 at 5:57 AM, Rik van Riel<riel@xxxxxxxxxx> wrote: >>> >>> Take all the locks for all the anon_vmas in anon_vma_lock, this properly >>> excludes migration and the transparent hugepage code from VMA changes >>> done >>> by mmap/munmap/mprotect/expand_stack/etc... >>> >>> Unfortunately, this requires adding a new lock (mm->anon_vma_chain_lock), >>> otherwise we have an unavoidable lock ordering conflict. This changes >>> the >>> locking rules for the "same_vma" list to be either mm->mmap_sem for >>> write, >>> or mm->mmap_sem for read plus the new mm->anon_vma_chain lock. This >>> limits >>> the place where the new lock is taken to 2 locations - anon_vma_prepare >>> and >>> expand_downwards. >>> >>> Document the locking rules for the same_vma list in the anon_vma_chain >>> and >>> remove the anon_vma_lock call from expand_upwards, which does not need >>> it. >>> >>> Signed-off-by: Rik van Riel<riel@xxxxxxxxxx> >> >> This patch makes things simple. So I like this. >> Actually, I wanted this all-at-once locks approach. >> But I was worried about that how the patch affects AIM 7 workload >> which is cause of anon_vma_chain about scalability by Rik. >> But now Rik himself is sending the patch. So I assume the patch >> couldn't decrease scalability of the workload heavily. > > The thing is, the number of anon_vmas attached to a VMA is > small (depth of the tree, so for apache or aim the typical > depth is 2). This N is between 1 and 3. > > The problem we had originally is the _width_ of the tree, > where every sibling process was attached to the same anon_vma > and the rmap code had to walk the page tables of all the > processes, for every privately owned page in each child process. > For large server workloads, this N is between a few hundred and > a few thousand. > > What matters most at this point is correctness - we need to be > able to exclude rmap walks when messing with a VMA in any way > that breaks lookups, because rmap walks for page migration and > hugepage conversion have to be 100% reliable. > > That is not a constraint I had in mind with the original > anon_vma changes, so the code needs to be fixed up now... Yes. I understand it. When you tried anon_vma_chain patches as I pointed out, what I have a concern is parent's vma not child's one. The vma of parent still has N anon_vma. AFAIR, you said it's trade-off and would be good than old at least. I agreed. But I just want to remind you because this makes worse. :) The corner case is that we have to hold locks of N. Do I miss something? Really, Can't we ignore that case latency although this happen infrequently? I am not against this patch. I just want to listen your opinion. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href