This patch is still under development (and I can't guarantee it is free of bugs).
The idea is to coalesce multiple css_get()/css_put() calls into
__css_get()/__css_put(), as we now do for res_counter charging.

Here is a result with a multi-threaded page fault program. The program
generates page faults continuously for 60 seconds. If the kernel works better,
we see more page faults.

The test results below were taken under a memcg (not the root cgroup).

[Before patch]
[root@bluextal test]# /root/bin/perf stat -e page-faults,cache-misses ./multi-fault-all-split 8

 Performance counter stats for './multi-fault-all-split 8':

       12357708  page-faults
      161332057  cache-misses

   60.007931275  seconds time elapsed

    25.31%  multi-fault-all  [kernel.kallsyms]  [k] clear_page_c
     9.24%  multi-fault-all  [kernel.kallsyms]  [k] down_read_trylock
     8.37%  multi-fault-all  [kernel.kallsyms]  [k] try_get_mem_cgroup_from_mm
     5.21%  multi-fault-all  [kernel.kallsyms]  [k] __alloc_pages_nodemask
     5.13%  multi-fault-all  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     4.91%  multi-fault-all  [kernel.kallsyms]  [k] __css_put
     4.66%  multi-fault-all  [kernel.kallsyms]  [k] up_read
     3.17%  multi-fault-all  [kernel.kallsyms]  [k] css_put
     2.77%  multi-fault-all  [kernel.kallsyms]  [k] _raw_spin_lock_irq
     2.58%  multi-fault-all  [kernel.kallsyms]  [k] page_fault

[After patch]
[root@bluextal test]# /root/bin/perf stat -e page-faults,cache-misses ./multi-fault-all-split 8

 Performance counter stats for './multi-fault-all-split 8':

       13615258  page-faults
      153207110  cache-misses

   60.004117823  seconds time elapsed

# Overhead          Command      Shared Object  Symbol
# ........  ...............  .................  ......
#
    27.70%  multi-fault-all  [kernel.kallsyms]  [k] clear_page_c
    11.18%  multi-fault-all  [kernel.kallsyms]  [k] down_read_trylock
     7.54%  multi-fault-all  [kernel.kallsyms]  [k] _raw_spin_lock_irqsave
     5.99%  multi-fault-all  [kernel.kallsyms]  [k] up_read
     5.90%  multi-fault-all  [kernel.kallsyms]  [k] __alloc_pages_nodemask
     5.13%  multi-fault-all  [kernel.kallsyms]  [k] _raw_spin_lock_irq
     2.73%  multi-fault-all  [kernel.kallsyms]  [k] __mem_cgroup_commit_charge
     2.71%  multi-fault-all  [kernel.kallsyms]  [k] page_fault
     2.66%  multi-fault-all  [kernel.kallsyms]  [k] handle_mm_fault
     2.35%  multi-fault-all  [kernel.kallsyms]  [k] _raw_spin_lock

You can see that cache-misses per page-fault improved (from about 13.1 to
about 11.3), that the number of page faults went up by about 10%, and that
css_get()/css_put() no longer show up in the overhead profile above.

Please review if you are interested.
(I tried to get rid of the per-page css_get()/css_put() entirely, but that does
not seem very easy. So, for now, I am trying to reduce the overhead.)

Thanks,
-Kame
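
Note on the coalescing idea: the following is only a minimal userspace sketch of batching reference-count updates, not the patch itself and not kernel code. All names here (struct object, struct ref_stock, REF_BATCH, count_shared_ops) are hypothetical and chosen for illustration; the batch size of 32 is an assumption. The point it tries to show is that taking references in batches touches the shared counter far less often than one atomic update per get and per put.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define REF_BATCH 32            /* hypothetical batch size */

/* Hypothetical stand-in for an object with a shared, contended refcount. */
struct object {
    atomic_long refcnt;
    long shared_ops;            /* illustration only: updates of refcnt
                                   (not atomic, single-threaded demo) */
};

/* Hypothetical per-thread cache of references acquired in advance. */
struct ref_stock {
    struct object *owner;
    long cached;
};

/* Take n references with a single atomic update. */
static void object_get_many(struct object *o, long n)
{
    atomic_fetch_add(&o->refcnt, n);
    o->shared_ops++;
}

/* Drop n references with a single atomic update. */
static void object_put_many(struct object *o, long n)
{
    long old = atomic_fetch_sub(&o->refcnt, n);
    assert(old >= n);           /* never drop more than was held */
    o->shared_ops++;
}

/* Give back any references still sitting unused in the stock. */
static void stock_drain(struct ref_stock *s)
{
    if (s->owner != NULL && s->cached > 0)
        object_put_many(s->owner, s->cached);
    s->owner = NULL;
    s->cached = 0;
}

/* Hand one reference to the caller, refilling the stock in a batch. */
static void stock_get_one(struct ref_stock *s, struct object *o)
{
    if (s->owner != o) {
        stock_drain(s);
        s->owner = o;
    }
    if (s->cached == 0) {
        object_get_many(o, REF_BATCH);
        s->cached = REF_BATCH;
    }
    s->cached--;                /* one cached reference now belongs to caller */
}

/*
 * Take n references one by one through the stock, then release them
 * with one batched put. Returns how often the shared counter was touched.
 */
long count_shared_ops(long n)
{
    struct object o = { .shared_ops = 0 };
    struct ref_stock s = { NULL, 0 };

    atomic_init(&o.refcnt, 1);  /* base reference held by the creator */
    for (long i = 0; i < n; i++)
        stock_get_one(&s, &o);
    stock_drain(&s);            /* return references that were never used */
    object_put_many(&o, n);     /* release the n handed-out references */
    assert(atomic_load(&o.refcnt) == 1);
    return o.shared_ops;
}
```

In this sketch, count_shared_ops(64) updates the shared counter 3 times (two batched gets and one batched put), where an unbatched get/put pair per reference would update it 128 times. Whether the real patch batches in the same places is not something this sketch claims.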