Hi,

The mallocstress[1] LTP testcase takes ~5+ minutes to complete
on some arm64 systems (e.g. 4 nodes, 64 CPUs, 256GB RAM):

real    7m58.089s
user    0m0.513s
sys     24m27.041s

But if I turn off THP ("transparent_hugepage=never"), it's a lot faster:

real    0m4.185s
user    0m0.298s
sys     0m13.954s

Perf suggests that most of the time is spent in clear_page():

-   94.25%    94.24%  mallocstress  [kernel.kallsyms]   [k] clear_page
     94.24% thread_start
        start_thread
        alloc_mem
        allocate_free
      - malloc
         - 94.24% _int_malloc
            - 94.24% sysmalloc
                 el0_da
                 do_mem_abort
                 do_translation_fault
                 do_page_fault
                 handle_mm_fault
               - __handle_mm_fault
                  - 94.22% do_huge_pmd_anonymous_page
                     - __do_huge_pmd_anonymous_page
                        - 94.21% clear_huge_page
                             clear_page

Percent│
       │
       │
       │    Disassembly of section load0:
       │
       │    ffff0000087f0540 <load0>:
  0.00 │      mrs    x1, dczid_el0
  0.00 │      and    w1, w1, #0xf
       │      mov    x2, #0x4                        // #4
       │      lsl    x1, x2, x1
100.00 │10:   dc     zva, x0
       │      add    x0, x0, x1
       │      tst    x0, #0xffff
       │    ↑ b.ne   10
       │    ← ret

# uname -r
4.15.3

# grep HUGE -r .config
CONFIG_CGROUP_HUGETLB=y
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
CONFIG_HAVE_ARCH_HUGE_VMAP=y
CONFIG_SYS_SUPPORTS_HUGETLBFS=y
CONFIG_TRANSPARENT_HUGEPAGE=y
CONFIG_TRANSPARENT_HUGEPAGE_ALWAYS=y
# CONFIG_TRANSPARENT_HUGEPAGE_MADVISE is not set
CONFIG_TRANSPARENT_HUGE_PAGECACHE=y
CONFIG_HUGETLBFS=y
CONFIG_HUGETLB_PAGE=y

# grep _PAGE -r .config
CONFIG_ARM64_PAGE_SHIFT=16
CONFIG_PAGE_COUNTER=y
CONFIG_HAVE_ALIGNED_STRUCT_PAGE=y
# CONFIG_ARM64_4K_PAGES is not set
# CONFIG_ARM64_16K_PAGES is not set
CONFIG_ARM64_64K_PAGES=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_TRANSPARENT_HUGE_PAGECACHE=y
CONFIG_IDLE_PAGE_TRACKING=y
CONFIG_PROC_PAGE_MONITOR=y
CONFIG_HUGETLB_PAGE=y
CONFIG_ARCH_HAS_GIGANTIC_PAGE=y
# CONFIG_PAGE_OWNER is not set
# CONFIG_PAGE_EXTENSION is not set
# CONFIG_DEBUG_PAGEALLOC is not set
# CONFIG_PAGE_POISONING is not set
# CONFIG_DEBUG_PAGE_REF is not set

# cat /proc/meminfo | grep Huge
Hugepagesize:     524288 kB

# numactl -H
available: 4 nodes (0-3)
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 65308 MB
node 0 free: 64892 MB
node 1 cpus: 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
node 1 size: 65404 MB
node 1 free: 62804 MB
node 2 cpus: 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
node 2 size: 65404 MB
node 2 free: 62847 MB
node 3 cpus: 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
node 3 size: 65402 MB
node 3 free: 64671 MB
node distances:
node   0   1   2   3
  0:  10  15  20  20
  1:  15  10  20  20
  2:  20  20  10  15
  3:  20  20  15  10

Regards,
Jan

[1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/mem/mtest07/mallocstress.c
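
For reference, the 512MB huge page size reported in /proc/meminfo follows from the 64K base page size. Below is a minimal sketch of that arithmetic, assuming 8-byte page-table entries so that one table page holds PAGE_SIZE / 8 entries. The helper name thp_geometry is made up for illustration and is not a kernel symbol.

```python
# Assumption: page-table entries are 8 bytes, so one table page holds
# PAGE_SIZE / 8 entries, and a PMD-level huge page maps that many base pages.
PAGE_SHIFT = 16  # CONFIG_ARM64_PAGE_SHIFT=16 from the .config above


def thp_geometry(page_shift):
    """Return (base pages per huge page, huge page size in bytes)."""
    page_size = 1 << page_shift
    entries_per_table = page_size // 8
    return entries_per_table, entries_per_table * page_size


pages, huge_bytes = thp_geometry(PAGE_SHIFT)
print(f"base pages per huge page: {pages}")
print(f"huge page size: {huge_bytes // 1024} kB")
```

Under that assumption, each anonymous THP fault has to zero 8192 base pages (512MB), which would be consistent with nearly all of the time landing in clear_page(). With 4K pages the same calculation gives 512 base pages and a 2MB huge page.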