Hello,

kernel test robot noticed a 5.2% improvement of stress-ng.seal.ops_per_sec on:


commit: 306c4ac9896b07b8872293eb224058ff83f81fac ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: seal
	cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240725/202407251553.12f35198-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/seal/stress-ng/60s

commit:
  844776cb65 ("mm/slub: mark racy access on slab->freelist")
  306c4ac989 ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")

844776cb65a77ef2 306c4ac9896b07b8872293eb224
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
      2.51 ± 27%       +1.9        4.44 ± 35%  mpstat.cpu.all.idle%
    975100 ± 19%     +29.5%     1262643 ± 16%  numa-meminfo.node1.AnonPages.max
    187.06 ±  4%     -11.5%      165.63 ± 10%  sched_debug.cfs_rq:/.runnable_avg.stddev
      0.05 ± 18%     -40.0%        0.03 ± 58%  vmstat.procs.b
  58973718            +5.2%    62024061        stress-ng.seal.ops
    982893            +5.2%     1033732        stress-ng.seal.ops_per_sec
  59045344            +5.2%    62095668        stress-ng.time.minor_page_faults
    174957            +1.4%      177400        proc-vmstat.nr_slab_unreclaimable
  63634761            +5.5%    67148443        proc-vmstat.numa_hit
  63399995            +5.5%    66914221        proc-vmstat.numa_local
  73601172            +6.1%    78073549        proc-vmstat.pgalloc_normal
  59870250            +5.3%    63063514        proc-vmstat.pgfault
  72718474            +6.0%    77106313        proc-vmstat.pgfree
 1.983e+10            +1.3%    2.01e+10        perf-stat.i.branch-instructions
  66023349            +5.6%    69728143        perf-stat.i.cache-misses
 2.023e+08            +4.7%   2.117e+08        perf-stat.i.cache-references
      7.22            -1.9%        7.08        perf-stat.i.cpi
      9738            -5.6%        9196        perf-stat.i.cycles-between-cache-misses
 8.799e+10            +1.6%   8.939e+10        perf-stat.i.instructions
      0.14            +1.6%        0.14        perf-stat.i.ipc
      8.71            +5.1%        9.16        perf-stat.i.metric.K/sec
    983533            +4.7%     1029816        perf-stat.i.minor-faults
    983533            +4.7%     1029816        perf-stat.i.page-faults
      7.30           -18.4%        5.96 ± 44%  perf-stat.overall.cpi
      9735           -21.3%        7658 ± 44%  perf-stat.overall.cycles-between-cache-misses
      0.52             +0.1        0.62 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ftruncate64
      0.56             +0.1        0.67 ±  7%  perf-profile.calltrace.cycles-pp.ftruncate64
      0.34 ± 70%       +0.3        0.60 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     48.29             +0.6       48.86        perf-profile.calltrace.cycles-pp.__close
     48.27             +0.6       48.84        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     48.27             +0.6       48.84        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
     48.26             +0.6       48.83        perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
      0.00             +0.6        0.58 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     48.21             +0.6       48.80        perf-profile.calltrace.cycles-pp.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     48.03             +0.6       48.68        perf-profile.calltrace.cycles-pp.dput.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
     48.02             +0.6       48.66        perf-profile.calltrace.cycles-pp.__dentry_kill.dput.__fput.__x64_sys_close.do_syscall_64
     47.76             +0.7       48.47        perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.__x64_sys_close
     47.19             +0.7       47.92        perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput
     47.11             +0.8       47.88        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput
      0.74             -0.3        0.48 ±  8%  perf-profile.children.cycles-pp.__munmap
      0.69             -0.2        0.44 ±  9%  perf-profile.children.cycles-pp.__x64_sys_munmap
      0.68             -0.2        0.44 ±  9%  perf-profile.children.cycles-pp.__vm_munmap
      0.68             -0.2        0.45 ±  9%  perf-profile.children.cycles-pp.do_vmi_munmap
      0.65             -0.2        0.42 ±  8%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.44             -0.2        0.28 ±  7%  perf-profile.children.cycles-pp.unmap_region
      0.48             -0.1        0.36 ±  7%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.42             -0.1        0.32 ±  7%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.42 ±  2%       -0.1        0.32 ±  7%  perf-profile.children.cycles-pp.exc_page_fault
      0.38 ±  2%       -0.1        0.29 ±  7%  perf-profile.children.cycles-pp.handle_mm_fault
      0.35 ±  2%       -0.1        0.27 ±  7%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.33 ±  2%       -0.1        0.26 ±  6%  perf-profile.children.cycles-pp.do_fault
      0.21 ±  2%       -0.1        0.14 ±  8%  perf-profile.children.cycles-pp.lru_add_drain
      0.22             -0.1        0.15 ± 11%  perf-profile.children.cycles-pp.alloc_inode
      0.21 ±  2%       -0.1        0.15 ±  9%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.18 ±  2%       -0.1        0.12 ±  8%  perf-profile.children.cycles-pp.unmap_vmas
      0.21 ±  2%       -0.1        0.14 ±  7%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.17             -0.1        0.11 ±  8%  perf-profile.children.cycles-pp.unmap_page_range
      0.16 ±  2%       -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.zap_pte_range
      0.16 ±  2%       -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.zap_pmd_range
      0.26 ±  2%       -0.1        0.20 ±  7%  perf-profile.children.cycles-pp.shmem_fault
      0.50             -0.1        0.45 ±  8%  perf-profile.children.cycles-pp.mmap_region
      0.26 ±  2%       -0.1        0.20 ±  7%  perf-profile.children.cycles-pp.__do_fault
      0.26             -0.1        0.21 ±  6%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.19 ±  2%       -0.1        0.14 ± 14%  perf-profile.children.cycles-pp.write
      0.22 ±  3%       -0.0        0.18 ±  5%  perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
      0.11 ±  4%       -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.mas_store_gfp
      0.16 ±  2%       -0.0        0.12 ± 11%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.14             -0.0        0.10 ± 10%  perf-profile.children.cycles-pp.mas_wr_node_store
      0.08             -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.msync
      0.06             -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.mas_find
      0.12 ±  4%       -0.0        0.08 ± 11%  perf-profile.children.cycles-pp.inode_init_always
      0.10 ±  3%       -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.shmem_alloc_inode
      0.16             -0.0        0.13 ±  9%  perf-profile.children.cycles-pp.__x64_sys_fcntl
      0.11 ±  4%       -0.0        0.08 ± 11%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.10 ±  4%       -0.0        0.08 ±  8%  perf-profile.children.cycles-pp.do_fcntl
      0.15             -0.0        0.13 ±  8%  perf-profile.children.cycles-pp.destroy_inode
      0.16 ±  3%       -0.0        0.14 ±  7%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      0.22 ±  3%       -0.0        0.20 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.08             -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.___slab_alloc
      0.15 ±  3%       -0.0        0.12 ±  8%  perf-profile.children.cycles-pp.__destroy_inode
      0.07 ±  7%       -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.__call_rcu_common
      0.13 ±  2%       -0.0        0.11 ±  8%  perf-profile.children.cycles-pp.perf_event_mmap
      0.09             -0.0        0.07 ±  9%  perf-profile.children.cycles-pp.memfd_fcntl
      0.06             -0.0        0.04 ± 44%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.08 ±  6%       -0.0        0.06 ±  8%  perf-profile.children.cycles-pp.shmem_add_to_page_cache
      0.12             -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.perf_event_mmap_event
      0.11 ±  3%       -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      0.10             -0.0        0.08 ±  8%  perf-profile.children.cycles-pp.uncharge_batch
      0.12 ±  4%       -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.05             +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.__d_alloc
      0.05             +0.0        0.07 ± 10%  perf-profile.children.cycles-pp.d_alloc_pseudo
      0.07             +0.0        0.09 ±  7%  perf-profile.children.cycles-pp.file_init_path
      0.06 ±  6%       +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.security_file_alloc
      0.07 ±  7%       +0.0        0.09 ±  7%  perf-profile.children.cycles-pp.errseq_sample
      0.04 ± 44%       +0.0        0.07 ± 10%  perf-profile.children.cycles-pp.apparmor_file_alloc_security
      0.09             +0.0        0.12 ±  5%  perf-profile.children.cycles-pp.init_file
      0.15             +0.0        0.18 ±  7%  perf-profile.children.cycles-pp.common_perm_cond
      0.15 ±  3%       +0.0        0.19 ±  8%  perf-profile.children.cycles-pp.security_file_truncate
      0.20             +0.0        0.24 ±  7%  perf-profile.children.cycles-pp.notify_change
      0.06             +0.0        0.10 ±  6%  perf-profile.children.cycles-pp.inode_init_owner
      0.13             +0.0        0.18 ±  5%  perf-profile.children.cycles-pp.alloc_empty_file
      0.10             +0.1        0.16 ±  7%  perf-profile.children.cycles-pp.clear_nlink
      0.47             +0.1        0.56 ±  7%  perf-profile.children.cycles-pp.do_ftruncate
      0.49             +0.1        0.59 ±  7%  perf-profile.children.cycles-pp.__x64_sys_ftruncate
      0.59             +0.1        0.70 ±  7%  perf-profile.children.cycles-pp.ftruncate64
      0.28             +0.1        0.40 ±  6%  perf-profile.children.cycles-pp.alloc_file_pseudo
     98.62             +0.2       98.77        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     98.58             +0.2       98.74        perf-profile.children.cycles-pp.do_syscall_64
     48.30             +0.6       48.86        perf-profile.children.cycles-pp.__close
     48.26             +0.6       48.83        perf-profile.children.cycles-pp.__x64_sys_close
     48.21             +0.6       48.80        perf-profile.children.cycles-pp.__fput
     48.04             +0.6       48.68        perf-profile.children.cycles-pp.dput
     48.02             +0.6       48.67        perf-profile.children.cycles-pp.__dentry_kill
     47.77             +0.7       48.47        perf-profile.children.cycles-pp.evict
      0.30             -0.1        0.23 ±  7%  perf-profile.self.cycles-pp._raw_spin_lock
      0.10 ±  4%       -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.__fput
      0.08 ±  6%       -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.inode_init_always
      0.06             -0.0        0.04 ± 44%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.08             -0.0        0.06 ±  7%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.09             -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.07             +0.0        0.09 ±  7%  perf-profile.self.cycles-pp.__shmem_get_inode
      0.06 ±  7%       +0.0        0.09 ±  9%  perf-profile.self.cycles-pp.errseq_sample
      0.15 ±  2%       +0.0        0.18 ±  7%  perf-profile.self.cycles-pp.common_perm_cond
      0.03 ± 70%       +0.0        0.06 ±  7%  perf-profile.self.cycles-pp.apparmor_file_alloc_security
      0.06             +0.0        0.10 ±  7%  perf-profile.self.cycles-pp.inode_init_owner
      0.10             +0.1        0.16 ±  6%  perf-profile.self.cycles-pp.clear_nlink


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki