On Thu, Apr 29, 2010 at 11:55 AM, Minchan Kim <minchan.kim@xxxxxxxxx> wrote: > On Thu, Apr 29, 2010 at 11:10 AM, Rik van Riel <riel@xxxxxxxxxx> wrote: >> On 04/28/2010 08:28 PM, Minchan Kim wrote: >>> >>> On Thu, Apr 29, 2010 at 5:57 AM, Rik van Riel<riel@xxxxxxxxxx> wrote: >>>> >>>> Take all the locks for all the anon_vmas in anon_vma_lock, this properly >>>> excludes migration and the transparent hugepage code from VMA changes >>>> done >>>> by mmap/munmap/mprotect/expand_stack/etc... >>>> >>>> Unfortunately, this requires adding a new lock (mm->anon_vma_chain_lock), >>>> otherwise we have an unavoidable lock ordering conflict. This changes >>>> the >>>> locking rules for the "same_vma" list to be either mm->mmap_sem for >>>> write, >>>> or mm->mmap_sem for read plus the new mm->anon_vma_chain lock. This >>>> limits >>>> the place where the new lock is taken to 2 locations - anon_vma_prepare >>>> and >>>> expand_downwards. >>>> >>>> Document the locking rules for the same_vma list in the anon_vma_chain >>>> and >>>> remove the anon_vma_lock call from expand_upwards, which does not need >>>> it. >>>> >>>> Signed-off-by: Rik van Riel<riel@xxxxxxxxxx> >>> >>> This patch makes things simple. So I like this. >>> Actually, I wanted this all-at-once locks approach. >>> But I was worried about that how the patch affects AIM 7 workload >>> which is cause of anon_vma_chain about scalability by Rik. >>> But now Rik himself is sending the patch. So I assume the patch >>> couldn't decrease scalability of the workload heavily. >> >> The thing is, the number of anon_vmas attached to a VMA is >> small (depth of the tree, so for apache or aim the typical >> depth is 2). This N is between 1 and 3. >> >> The problem we had originally is the _width_ of the tree, >> where every sibling process was attached to the same anon_vma >> and the rmap code had to walk the page tables of all the >> processes, for every privately owned page in each child process. >> For large server workloads, this N is between a few hundred and >> a few thousand. >> >> What matters most at this point is correctness - we need to be >> able to exclude rmap walks when messing with a VMA in any way >> that breaks lookups, because rmap walks for page migration and >> hugepage conversion have to be 100% reliable. >> >> That is not a constraint I had in mind with the original >> anon_vma changes, so the code needs to be fixed up now... > > Yes. I understand it. > > When you tried anon_vma_chain patches as I pointed out, what I have a > concern is parent's vma not child's one. > The vma of parent still has N anon_vma. > AFAIR, you said it's trade-off and would be good than old at least. > I agreed. But I just want to remind you because this makes worse. :) > The corner case is that we have to hold locks of N. > > Do I miss something? > Really, Can't we ignore that case latency although this happen infrequently? > I am not against this patch. I just want to listen your opinion. me/ slaps self. It's about height of tree and I can't imagine high height of scenario.(fork->fork->fork->...->fork) So as Rik pointed out, It's a not big overhead about latency latency, at least. I think. I supports this approach. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@xxxxxxxxxx For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href