Re: [linus:master] [mm/slub] 306c4ac989: stress-ng.seal.ops_per_sec 5.2% improvement

On 7/25/24 10:04 AM, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a 5.2% improvement of stress-ng.seal.ops_per_sec on:
> 
> 
> commit: 306c4ac9896b07b8872293eb224058ff83f81fac ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

Well, that's great news, but it's highly unlikely that the commit itself caused
such an improvement, as it only optimizes create_kmalloc_caches(), which runs
once per boot. Maybe there are secondary effects from the different order of
slab cache creation resulting in a different CPU cache layout, but such an
improvement would be machine- and compiler-specific and fragile overall.
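
For reference, the headline figure can be recomputed from the two
stress-ng.seal.ops_per_sec values in the robot's table quoted below. This is
only an arithmetic sanity check, not part of the robot's tooling; the helper
name pct_change is made up for illustration.

```python
# Recompute the relative change between the parent commit (844776cb65)
# and the tested commit (306c4ac989) from the values in the quoted table.
# pct_change is a hypothetical helper, not an lkp or stress-ng function.

def pct_change(base: float, new: float) -> float:
    """Return the change from base to new as a percentage of base."""
    return (new - base) / base * 100.0


ops_per_sec_base = 982893    # stress-ng.seal.ops_per_sec on 844776cb65
ops_per_sec_new = 1033732    # stress-ng.seal.ops_per_sec on 306c4ac989

change = pct_change(ops_per_sec_base, ops_per_sec_new)
print(f"{change:.1f}%")
```

This confirms the size of the difference the robot measured. It does not show
that the commit is what caused it.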

> testcase: stress-ng
> test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
> parameters:
> 
> 	nr_threads: 100%
> 	testtime: 60s
> 	test: seal
> 	cpufreq_governor: performance
> 
> 
> 
> 
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240725/202407251553.12f35198-oliver.sang@xxxxxxxxx
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/seal/stress-ng/60s
> 
> commit: 
>   844776cb65 ("mm/slub: mark racy access on slab->freelist")
>   306c4ac989 ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")
> 
> 844776cb65a77ef2 306c4ac9896b07b8872293eb224 
> ---------------- --------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>       2.51 ± 27%      +1.9        4.44 ± 35%  mpstat.cpu.all.idle%
>     975100 ± 19%     +29.5%    1262643 ± 16%  numa-meminfo.node1.AnonPages.max
>     187.06 ±  4%     -11.5%     165.63 ± 10%  sched_debug.cfs_rq:/.runnable_avg.stddev
>       0.05 ± 18%     -40.0%       0.03 ± 58%  vmstat.procs.b
>   58973718            +5.2%   62024061        stress-ng.seal.ops
>     982893            +5.2%    1033732        stress-ng.seal.ops_per_sec
>   59045344            +5.2%   62095668        stress-ng.time.minor_page_faults
>     174957            +1.4%     177400        proc-vmstat.nr_slab_unreclaimable
>   63634761            +5.5%   67148443        proc-vmstat.numa_hit
>   63399995            +5.5%   66914221        proc-vmstat.numa_local
>   73601172            +6.1%   78073549        proc-vmstat.pgalloc_normal
>   59870250            +5.3%   63063514        proc-vmstat.pgfault
>   72718474            +6.0%   77106313        proc-vmstat.pgfree
>  1.983e+10            +1.3%   2.01e+10        perf-stat.i.branch-instructions
>   66023349            +5.6%   69728143        perf-stat.i.cache-misses
>  2.023e+08            +4.7%  2.117e+08        perf-stat.i.cache-references
>       7.22            -1.9%       7.08        perf-stat.i.cpi
>       9738            -5.6%       9196        perf-stat.i.cycles-between-cache-misses
>  8.799e+10            +1.6%  8.939e+10        perf-stat.i.instructions
>       0.14            +1.6%       0.14        perf-stat.i.ipc
>       8.71            +5.1%       9.16        perf-stat.i.metric.K/sec
>     983533            +4.7%    1029816        perf-stat.i.minor-faults
>     983533            +4.7%    1029816        perf-stat.i.page-faults
>       7.30           -18.4%       5.96 ± 44%  perf-stat.overall.cpi
>       9735           -21.3%       7658 ± 44%  perf-stat.overall.cycles-between-cache-misses
>       0.52            +0.1        0.62 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ftruncate64
>       0.56            +0.1        0.67 ±  7%  perf-profile.calltrace.cycles-pp.ftruncate64
>       0.34 ± 70%      +0.3        0.60 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
>      48.29            +0.6       48.86        perf-profile.calltrace.cycles-pp.__close
>      48.27            +0.6       48.84        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
>      48.27            +0.6       48.84        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
>      48.26            +0.6       48.83        perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
>       0.00            +0.6        0.58 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
>      48.21            +0.6       48.80        perf-profile.calltrace.cycles-pp.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
>      48.03            +0.6       48.68        perf-profile.calltrace.cycles-pp.dput.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      48.02            +0.6       48.66        perf-profile.calltrace.cycles-pp.__dentry_kill.dput.__fput.__x64_sys_close.do_syscall_64
>      47.76            +0.7       48.47        perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.__x64_sys_close
>      47.19            +0.7       47.92        perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput
>      47.11            +0.8       47.88        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput
>       0.74            -0.3        0.48 ±  8%  perf-profile.children.cycles-pp.__munmap
>       0.69            -0.2        0.44 ±  9%  perf-profile.children.cycles-pp.__x64_sys_munmap
>       0.68            -0.2        0.44 ±  9%  perf-profile.children.cycles-pp.__vm_munmap
>       0.68            -0.2        0.45 ±  9%  perf-profile.children.cycles-pp.do_vmi_munmap
>       0.65            -0.2        0.42 ±  8%  perf-profile.children.cycles-pp.do_vmi_align_munmap
>       0.44            -0.2        0.28 ±  7%  perf-profile.children.cycles-pp.unmap_region
>       0.48            -0.1        0.36 ±  7%  perf-profile.children.cycles-pp.asm_exc_page_fault
>       0.42            -0.1        0.32 ±  7%  perf-profile.children.cycles-pp.do_user_addr_fault
>       0.42 ±  2%      -0.1        0.32 ±  7%  perf-profile.children.cycles-pp.exc_page_fault
>       0.38 ±  2%      -0.1        0.29 ±  7%  perf-profile.children.cycles-pp.handle_mm_fault
>       0.35 ±  2%      -0.1        0.27 ±  7%  perf-profile.children.cycles-pp.__handle_mm_fault
>       0.33 ±  2%      -0.1        0.26 ±  6%  perf-profile.children.cycles-pp.do_fault
>       0.21 ±  2%      -0.1        0.14 ±  8%  perf-profile.children.cycles-pp.lru_add_drain
>       0.22            -0.1        0.15 ± 11%  perf-profile.children.cycles-pp.alloc_inode
>       0.21 ±  2%      -0.1        0.15 ±  9%  perf-profile.children.cycles-pp.lru_add_drain_cpu
>       0.18 ±  2%      -0.1        0.12 ±  8%  perf-profile.children.cycles-pp.unmap_vmas
>       0.21 ±  2%      -0.1        0.14 ±  7%  perf-profile.children.cycles-pp.folio_batch_move_lru
>       0.17            -0.1        0.11 ±  8%  perf-profile.children.cycles-pp.unmap_page_range
>       0.16 ±  2%      -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.zap_pte_range
>       0.16 ±  2%      -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.zap_pmd_range
>       0.26 ±  2%      -0.1        0.20 ±  7%  perf-profile.children.cycles-pp.shmem_fault
>       0.50            -0.1        0.45 ±  8%  perf-profile.children.cycles-pp.mmap_region
>       0.26 ±  2%      -0.1        0.20 ±  7%  perf-profile.children.cycles-pp.__do_fault
>       0.26            -0.1        0.21 ±  6%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
>       0.19 ±  2%      -0.1        0.14 ± 14%  perf-profile.children.cycles-pp.write
>       0.22 ±  3%      -0.0        0.18 ±  5%  perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
>       0.11 ±  4%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.mas_store_gfp
>       0.16 ±  2%      -0.0        0.12 ± 11%  perf-profile.children.cycles-pp.mas_wr_store_entry
>       0.14            -0.0        0.10 ± 10%  perf-profile.children.cycles-pp.mas_wr_node_store
>       0.08            -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.msync
>       0.06            -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.mas_find
>       0.12 ±  4%      -0.0        0.08 ± 11%  perf-profile.children.cycles-pp.inode_init_always
>       0.10 ±  3%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.shmem_alloc_inode
>       0.16            -0.0        0.13 ±  9%  perf-profile.children.cycles-pp.__x64_sys_fcntl
>       0.11 ±  4%      -0.0        0.08 ± 11%  perf-profile.children.cycles-pp.shmem_file_write_iter
>       0.10 ±  4%      -0.0        0.08 ±  8%  perf-profile.children.cycles-pp.do_fcntl
>       0.15            -0.0        0.13 ±  8%  perf-profile.children.cycles-pp.destroy_inode
>       0.16 ±  3%      -0.0        0.14 ±  7%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
>       0.22 ±  3%      -0.0        0.20 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       0.08            -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.___slab_alloc
>       0.15 ±  3%      -0.0        0.12 ±  8%  perf-profile.children.cycles-pp.__destroy_inode
>       0.07 ±  7%      -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.__call_rcu_common
>       0.13 ±  2%      -0.0        0.11 ±  8%  perf-profile.children.cycles-pp.perf_event_mmap
>       0.09            -0.0        0.07 ±  9%  perf-profile.children.cycles-pp.memfd_fcntl
>       0.06            -0.0        0.04 ± 44%  perf-profile.children.cycles-pp.native_irq_return_iret
>       0.08 ±  6%      -0.0        0.06 ±  8%  perf-profile.children.cycles-pp.shmem_add_to_page_cache
>       0.12            -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.perf_event_mmap_event
>       0.11 ±  3%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
>       0.10            -0.0        0.08 ±  8%  perf-profile.children.cycles-pp.uncharge_batch
>       0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64
>       0.05            +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.__d_alloc
>       0.05            +0.0        0.07 ± 10%  perf-profile.children.cycles-pp.d_alloc_pseudo
>       0.07            +0.0        0.09 ±  7%  perf-profile.children.cycles-pp.file_init_path
>       0.06 ±  6%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.security_file_alloc
>       0.07 ±  7%      +0.0        0.09 ±  7%  perf-profile.children.cycles-pp.errseq_sample
>       0.04 ± 44%      +0.0        0.07 ± 10%  perf-profile.children.cycles-pp.apparmor_file_alloc_security
>       0.09            +0.0        0.12 ±  5%  perf-profile.children.cycles-pp.init_file
>       0.15            +0.0        0.18 ±  7%  perf-profile.children.cycles-pp.common_perm_cond
>       0.15 ±  3%      +0.0        0.19 ±  8%  perf-profile.children.cycles-pp.security_file_truncate
>       0.20            +0.0        0.24 ±  7%  perf-profile.children.cycles-pp.notify_change
>       0.06            +0.0        0.10 ±  6%  perf-profile.children.cycles-pp.inode_init_owner
>       0.13            +0.0        0.18 ±  5%  perf-profile.children.cycles-pp.alloc_empty_file
>       0.10            +0.1        0.16 ±  7%  perf-profile.children.cycles-pp.clear_nlink
>       0.47            +0.1        0.56 ±  7%  perf-profile.children.cycles-pp.do_ftruncate
>       0.49            +0.1        0.59 ±  7%  perf-profile.children.cycles-pp.__x64_sys_ftruncate
>       0.59            +0.1        0.70 ±  7%  perf-profile.children.cycles-pp.ftruncate64
>       0.28            +0.1        0.40 ±  6%  perf-profile.children.cycles-pp.alloc_file_pseudo
>      98.62            +0.2       98.77        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
>      98.58            +0.2       98.74        perf-profile.children.cycles-pp.do_syscall_64
>      48.30            +0.6       48.86        perf-profile.children.cycles-pp.__close
>      48.26            +0.6       48.83        perf-profile.children.cycles-pp.__x64_sys_close
>      48.21            +0.6       48.80        perf-profile.children.cycles-pp.__fput
>      48.04            +0.6       48.68        perf-profile.children.cycles-pp.dput
>      48.02            +0.6       48.67        perf-profile.children.cycles-pp.__dentry_kill
>      47.77            +0.7       48.47        perf-profile.children.cycles-pp.evict
>       0.30            -0.1        0.23 ±  7%  perf-profile.self.cycles-pp._raw_spin_lock
>       0.10 ±  4%      -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.__fput
>       0.08 ±  6%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.inode_init_always
>       0.06            -0.0        0.04 ± 44%  perf-profile.self.cycles-pp.native_irq_return_iret
>       0.08            -0.0        0.06 ±  7%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>       0.09            -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.07            +0.0        0.09 ±  7%  perf-profile.self.cycles-pp.__shmem_get_inode
>       0.06 ±  7%      +0.0        0.09 ±  9%  perf-profile.self.cycles-pp.errseq_sample
>       0.15 ±  2%      +0.0        0.18 ±  7%  perf-profile.self.cycles-pp.common_perm_cond
>       0.03 ± 70%      +0.0        0.06 ±  7%  perf-profile.self.cycles-pp.apparmor_file_alloc_security
>       0.06            +0.0        0.10 ±  7%  perf-profile.self.cycles-pp.inode_init_owner
>       0.10            +0.1        0.16 ±  6%  perf-profile.self.cycles-pp.clear_nlink
> 
> 
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 




