> > export MALLOC_MMAP_THRESHOLD_=$[1024*1024*1024]
> > export MALLOC_TOP_PAD_=$[1024*1024*1024]

With the above two params I get around 200M (around half) in hugepages with gcc building translate.o:

$ rm translate.o ; time make translate.o
  CC translate.o

real 0m22.900s
user 0m22.601s
sys 0m0.260s
$ rm translate.o ; time make translate.o
  CC translate.o

real 0m22.405s
user 0m22.125s
sys 0m0.240s
# echo never > /sys/kernel/mm/transparent_hugepage/enabled
# exit
$ rm translate.o ; time make translate.o
  CC translate.o

real 0m24.128s
user 0m23.725s
sys 0m0.376s
$ rm translate.o ; time make translate.o
  CC translate.o

real 0m24.126s
user 0m23.725s
sys 0m0.376s
$ uptime
 02:36:07 up 1 day, 19:45, 5 users, load average: 0.01, 0.12, 0.08

1 sec in 24 means around 4% faster. Hopefully, once glibc fully cooperates, we'll get better results than the above with gcc...

I tried to emulate it with khugepaged running in a loop, and I get almost the whole gcc anon memory in hugepages this way (as expected):

# echo 0 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs
# exit
$ rm translate.o ; time make translate.o
  CC translate.o

real 0m21.950s
user 0m21.481s
sys 0m0.292s
$ rm translate.o ; time make translate.o
  CC translate.o

real 0m21.992s
user 0m21.529s
sys 0m0.288s
$

So this reproducibly takes more than 2 seconds away from 24 seconds, which means gcc now runs 8% faster. This requires running khugepaged at 100% of one of the four cores, but with a slight change to glibc we'll be able to reach the exact same 8% speedup (or more, because this also involves copying ~200M, sending IPIs to unmap pages, and stopping userland during the memory copy, none of which will be necessary anymore).

BTW, the current default for khugepaged is to scan 8 pmds every 10 seconds, which means collapsing at most 16M every 10 seconds.
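The 8 pmd / 16M / 10 second figures follow from two khugepaged sysfs knobs, pages_to_scan and scan_sleep_millisecs. Here is a minimal POSIX sh sketch of that arithmetic, assuming 4K base pages and 2M huge pages (x86-64, so 512 base pages per pmd). The khugepaged_budget helper is hypothetical, and the 4096/10000 fallbacks are assumed to match the defaults described above. It only reproduces the back-of-the-envelope upper bound, not what khugepaged actually collapses.

```shell
#!/bin/sh
# Back-of-the-envelope khugepaged budget (sketch).
# Assumption: 4 KiB base pages and 2 MiB pmd-sized huge pages (x86-64),
# i.e. 512 base pages per pmd.
PAGE_KB=4
PMD_PAGES=512

# Hypothetical helper: khugepaged_budget PAGES_TO_SCAN SCAN_SLEEP_MS
# Prints how many pmds and MiB one scan pass can cover, and how many
# times per minute khugepaged wakes up.
khugepaged_budget() {
    pages=$1
    sleep_ms=$2
    pmds=$((pages / PMD_PAGES))
    mib=$((pages * PAGE_KB / 1024))
    if [ "$sleep_ms" -eq 0 ]; then
        # scan_sleep_millisecs=0 (the "loop" experiment): no sleep between passes.
        wakeups=unbounded
    else
        wakeups=$((60000 / sleep_ms))
    fi
    echo "pmds_per_pass=$pmds mib_per_pass=$mib wakeups_per_min=$wakeups"
}

knobs=/sys/kernel/mm/transparent_hugepage/khugepaged
# Use the live sysfs values when readable, else the assumed defaults
# (4096 pages = 8 pmds per pass, 10000 ms between passes).
pages=$(cat "$knobs/pages_to_scan" 2>/dev/null || echo 4096)
sleep_ms=$(cat "$knobs/scan_sleep_millisecs" 2>/dev/null || echo 10000)
khugepaged_budget "$pages" "$sleep_ms"
```

With the defaults it prints "pmds_per_pass=8 mib_per_pass=16 wakeups_per_min=6", i.e. at most 16M collapsed per 10 second pass and 6 wakeups per minute.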
Checking 8 pmd pointers every 10 seconds and waking up a kernel thread 6 times per minute is absolutely unmeasurable, but despite the unmeasurable overhead it provides very nice behavior for long-lived allocations that may have been swapped back in fragmented into small pages.

This is on a Phenom X4; I'd be interested if somebody can try on other CPUs. To get the test environment, just run:

git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
cd qemu-kvm
make
cd x86_64-softmmu
export MALLOC_MMAP_THRESHOLD_=$((1024*1024*1024))
export MALLOC_TOP_PAD_=$((1024*1024*1024))
rm translate.o; time make translate.o

Then you need to flip the above sysfs controls as I did.

Thanks,
Andrea