On Thu, May 06, 2010 at 07:12:59PM -0700, Linus Torvalds wrote:
>
>
> On Fri, 7 May 2010, KAMEZAWA Hiroyuki wrote:
> >
> > IIUC, move_page_tables() may call "page table allocation" and it cannot be
> > done under spinlock.
>
> Bah. It only does a "alloc_new_pmd()", and we could easily move that out
> of the loop and pre-allocate the pmd's.
>
> If that's the only reason, then it's a really weak one, methinks.
>

It turns out not to be easy to preallocate the PUDs, PMDs and PTEs that
move_page_tables() needs. To avoid overallocating, it has to follow the
same logic as move_page_tables(), duplicating some code in the process.
The ugliest aspect of all is passing those pre-allocated pages back into
move_page_tables(), where they need to be passed down to such functions
as __pte_alloc. It gets extremely messy.

I stopped working on it about half way through as it was already too
ugly to live and would have similar cost to Kamezawa's much more
straight-forward approach of using move_vma().

While using move_vma() is straight-forward and solves the problem, it's
not as cheap as Andrea's solution. Andrea allocates a temporary VMA and
puts it on a list and very little else. It didn't show up any problems
in microbenchmarks. Calling move_vma() does a lot more work, particularly
in copy_vma(), and this slows down exec.

With Kamezawa's patch, kernbench was fine on wall time but in System
Time, it slowed by up to 1.48% in comparison to Andrea's slowing by up
to 0.64%[1]. aim9 was slowed as well. Kamezawa's was slower by 2.77%
whereas Andrea's was reported as faster by 2.58%. While AIM9 is flaky
and these figures are barely outside the noise, calling move_vma() is
obviously more expensive.

While my solution at http://lkml.org/lkml/2010/4/30/198 is cheapest as
it does not touch exec() at all, is_vma_temporary_stack() could be
broken in the future if any of the assumptions it makes change.

So what you have is an inverse relationship between magic and
performance. Mine has the most magic and is fastest.
Kamezawa's has the least magic but is the slowest, and Andrea's has the
goldilocks factor.

Which do you prefer?

[1] One caveat of the performance tests was that a lot of debugging,
    such as lockdep, was enabled. Disabling these would give different
    results but it should still be the case that calling move_vma() is
    more expensive than calling kmem_cache_alloc().

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab