On Tue, 20 Nov 2012, Ingo Molnar wrote:

> Reduce the 4K page fault count by looking around and processing
> nearby pages if possible.
>
> To keep the logic and cache overhead simple and straightforward
> we do a couple of simplifications:
>
>  - we only scan in the HPAGE_SIZE range of the faulting address
>  - we only go as far as the vma allows us
>
> Also simplify the do_numa_page() flow while at it and fix the
> previous double faulting we incurred due to not properly fixing
> up freshly migrated ptes.
>
> Suggested-by: Mel Gorman <mgorman@xxxxxxx>
> Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Rik van Riel <riel@xxxxxxxxxx>
> Cc: Hugh Dickins <hughd@xxxxxxxxxx>
> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>

Acked-by: David Rientjes <rientjes@xxxxxxxxxx>

Ok, this is significantly better, it almost cut the regression in half on
my system.  With THP enabled:

  numa/core at ec05a2311c35:          136918.34 SPECjbb2005 bops
  numa/core at 01aa90068b12:          128315.19 SPECjbb2005 bops (-6.3%)
  numa/core at 01aa90068b12 + patch:  132523.06 SPECjbb2005 bops (-3.2%)

Here's the newest perftop, which is radically different than before (not
nearly the number of newly-added numa/core functions in the biggest
consumers) but still incurs significant overhead from page faults.

 92.18%  perf-6697.map   [.] 0x00007fe2c5afd079
  1.20%  libjvm.so       [.] instanceKlass::oop_push_contents(PSPromotionManag
  1.05%  libjvm.so       [.] PSPromotionManager::drain_stacks_depth(bool)
  0.78%  libjvm.so       [.] PSPromotionManager::copy_to_survivor_space(oopDes
  0.59%  libjvm.so       [.] PSPromotionManager::claim_or_forward_internal_dep
  0.49%  [kernel]        [k] page_fault
  0.27%  libjvm.so       [.] Copy::pd_disjoint_words(HeapWord*, HeapWord*, unsigned lo
  0.27%  libc-2.3.6.so   [.] __gettimeofday
  0.19%  libjvm.so       [.] CardTableExtension::scavenge_contents_parallel(ObjectStar
  0.16%  [kernel]        [k] getnstimeofday
  0.14%  [kernel]        [k] _raw_spin_lock
  0.13%  [kernel]        [k] generic_smp_call_function_interrupt
  0.11%  [kernel]        [k] ktime_get
  0.11%  [kernel]        [k] rcu_check_callbacks
  0.10%  [kernel]        [k] read_tsc
  0.09%  libjvm.so       [.] os::javaTimeMillis()
  0.09%  [kernel]        [k] clear_page_c
  0.08%  [kernel]        [k] flush_tlb_func
  0.08%  [kernel]        [k] ktime_get_update_offsets
  0.07%  [kernel]        [k] task_tick_fair
  0.06%  [kernel]        [k] emulate_vsyscall
  0.06%  libjvm.so       [.] oopDesc::size_given_klass(Klass*)
  0.06%  [kernel]        [k] __do_page_fault
  0.04%  [kernel]        [k] __bad_area_nosemaphore
  0.04%  perf            [.] 0x000000000003310b
  0.04%  libjvm.so       [.] objArrayKlass::oop_push_contents(PSPromotionManager*, oop
  0.04%  [kernel]        [k] run_timer_softirq
  0.04%  [kernel]        [k] copy_user_generic_string
  0.03%  [kernel]        [k] task_numa_fault
  0.03%  [kernel]        [k] smp_call_function_many
  0.03%  [kernel]        [k] retint_swapgs
  0.03%  [kernel]        [k] update_cfs_shares
  0.03%  [kernel]        [k] error_sti
  0.03%  [kernel]        [k] _raw_spin_lock_irq
  0.03%  [kernel]        [k] update_curr
  0.02%  [kernel]        [k] write_ok_or_segv
  0.02%  [kernel]        [k] call_function_interrupt
  0.02%  [kernel]        [k] __do_softirq
  0.02%  [kernel]        [k] acct_update_integrals
  0.02%  [kernel]        [k] x86_pmu_disable_all
  0.02%  [kernel]        [k] apic_timer_interrupt
  0.02%  [kernel]        [k] tick_sched_timer
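
For anyone checking the arithmetic behind "almost cut the regression in half", here is a minimal sketch that recomputes the percentages from the three SPECjbb2005 figures quoted in this mail. The constant and function names are only illustrative and are not part of any benchmark tooling; the assumption is that both percentages are taken relative to the ec05a2311c35 result.

```python
# SPECjbb2005 throughput (bops) as reported in this mail, THP enabled.
BASELINE = 136918.34   # numa/core at ec05a2311c35
REGRESSED = 128315.19  # numa/core at 01aa90068b12
PATCHED = 132523.06    # numa/core at 01aa90068b12 + patch


def pct_change(value: float, baseline: float) -> float:
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


regressed_pct = pct_change(REGRESSED, BASELINE)
patched_pct = pct_change(PATCHED, BASELINE)

# Fraction of the lost throughput that the patch wins back.
recovered = (PATCHED - REGRESSED) / (BASELINE - REGRESSED)

print(f"regressed: {regressed_pct:+.1f}%")
print(f"patched:   {patched_pct:+.1f}%")
print(f"recovered: {recovered:.0%} of the regression")
```

Under that assumption the numbers come out at roughly -6.3% and -3.2%, with the patch recovering about 49% of the lost throughput, which matches the figures reported here.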