On Wed, Jun 26, 2013 at 03:37:15PM +0200, Ingo Molnar wrote:
>
> * Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> > On Wed, 26 Jun 2013 11:22:48 +0200 Ingo Molnar <mingo@xxxxxxxxxx> wrote:
> >
> > > except that on 32 TB
> > > systems we don't spend ~2 hours initializing 8,589,934,592 page heads.
> >
> > That's about a million a second which is crazy slow - even my
> > prehistoric desktop is 100x faster than that.
> >
> > Where's all this time actually being spent?
>
> See the earlier part of the thread - apparently it's spent initializing
> the page heads - remote NUMA node misses from a single boot CPU, going
> across a zillion cross-connects? I guess there's some other low hanging
> fruits as well - so making this easier to profile would be nice. The
> profile posted was not really usable.
>

That is correct. From what I am seeing with crude cycle counters, far
more time is spent on the later nodes, i.e. memory near the boot node
is initialized a lot faster than remote memory.

I think the other low-hanging fruit is currently being drowned out by
the lack of locality.

Nate

> Btw., NUMA locality would be another advantage of on-demand
> initialization: actual users of RAM tend to allocate node-local
> (especially on large clusters), so any overhead will be naturally lower.
>
> Thanks,
>
> 	Ingo
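
For reference, the figures quoted in the thread can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the 4 KiB base page size and the reading of "32 TB" as 32 TiB are assumptions that are not stated in the thread (they do reproduce the quoted 8,589,934,592 exactly), and the helper names are made up for illustration.

```python
# Back-of-the-envelope check of the numbers quoted in the thread.
# Assumptions (not stated in the thread): 4 KiB base pages, and
# "32 TB" meaning 32 TiB.
PAGE_SIZE = 4 * 1024          # bytes per page (assumed)
TOTAL_RAM = 32 * 1024**4      # bytes of RAM (assumed 32 TiB)
INIT_SECONDS = 2 * 60 * 60    # "~2 hours" from the thread


def page_count(total_bytes, page_size):
    """Number of page heads needed to cover total_bytes."""
    return total_bytes // page_size


def init_rate(pages, seconds):
    """Average number of page heads initialized per second."""
    return pages / seconds


pages = page_count(TOTAL_RAM, PAGE_SIZE)
rate = init_rate(pages, INIT_SECONDS)

print(f"page heads to initialize: {pages:,}")
print(f"average rate: {rate:,.0f} pages/second")
```

Under these assumptions the average comes out a little above one million page heads per second, which matches the "about a million a second" estimate above.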