[linus:master] [mm/slub] 306c4ac989: stress-ng.seal.ops_per_sec 5.2% improvement

Hello,

kernel test robot noticed a 5.2% improvement in stress-ng.seal.ops_per_sec on:


commit: 306c4ac9896b07b8872293eb224058ff83f81fac ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

testcase: stress-ng
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 256G memory
parameters:

	nr_threads: 100%
	testtime: 60s
	test: seal
	cpufreq_governor: performance






Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240725/202407251553.12f35198-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
  gcc-13/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-spr-r02/seal/stress-ng/60s

commit: 
  844776cb65 ("mm/slub: mark racy access on slab->freelist")
  306c4ac989 ("mm/slub: create kmalloc 96 and 192 caches regardless cache size order")

844776cb65a77ef2 306c4ac9896b07b8872293eb224 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      2.51 ± 27%      +1.9        4.44 ± 35%  mpstat.cpu.all.idle%
    975100 ± 19%     +29.5%    1262643 ± 16%  numa-meminfo.node1.AnonPages.max
    187.06 ±  4%     -11.5%     165.63 ± 10%  sched_debug.cfs_rq:/.runnable_avg.stddev
      0.05 ± 18%     -40.0%       0.03 ± 58%  vmstat.procs.b
  58973718            +5.2%   62024061        stress-ng.seal.ops
    982893            +5.2%    1033732        stress-ng.seal.ops_per_sec
  59045344            +5.2%   62095668        stress-ng.time.minor_page_faults
    174957            +1.4%     177400        proc-vmstat.nr_slab_unreclaimable
  63634761            +5.5%   67148443        proc-vmstat.numa_hit
  63399995            +5.5%   66914221        proc-vmstat.numa_local
  73601172            +6.1%   78073549        proc-vmstat.pgalloc_normal
  59870250            +5.3%   63063514        proc-vmstat.pgfault
  72718474            +6.0%   77106313        proc-vmstat.pgfree
 1.983e+10            +1.3%   2.01e+10        perf-stat.i.branch-instructions
  66023349            +5.6%   69728143        perf-stat.i.cache-misses
 2.023e+08            +4.7%  2.117e+08        perf-stat.i.cache-references
      7.22            -1.9%       7.08        perf-stat.i.cpi
      9738            -5.6%       9196        perf-stat.i.cycles-between-cache-misses
 8.799e+10            +1.6%  8.939e+10        perf-stat.i.instructions
      0.14            +1.6%       0.14        perf-stat.i.ipc
      8.71            +5.1%       9.16        perf-stat.i.metric.K/sec
    983533            +4.7%    1029816        perf-stat.i.minor-faults
    983533            +4.7%    1029816        perf-stat.i.page-faults
      7.30           -18.4%       5.96 ± 44%  perf-stat.overall.cpi
      9735           -21.3%       7658 ± 44%  perf-stat.overall.cycles-between-cache-misses
      0.52            +0.1        0.62 ±  7%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.ftruncate64
      0.56            +0.1        0.67 ±  7%  perf-profile.calltrace.cycles-pp.ftruncate64
      0.34 ± 70%      +0.3        0.60 ±  7%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     48.29            +0.6       48.86        perf-profile.calltrace.cycles-pp.__close
     48.27            +0.6       48.84        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     48.27            +0.6       48.84        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close
     48.26            +0.6       48.83        perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
      0.00            +0.6        0.58 ±  7%  perf-profile.calltrace.cycles-pp.__x64_sys_ftruncate.do_syscall_64.entry_SYSCALL_64_after_hwframe.ftruncate64
     48.21            +0.6       48.80        perf-profile.calltrace.cycles-pp.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     48.03            +0.6       48.68        perf-profile.calltrace.cycles-pp.dput.__fput.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
     48.02            +0.6       48.66        perf-profile.calltrace.cycles-pp.__dentry_kill.dput.__fput.__x64_sys_close.do_syscall_64
     47.76            +0.7       48.47        perf-profile.calltrace.cycles-pp.evict.__dentry_kill.dput.__fput.__x64_sys_close
     47.19            +0.7       47.92        perf-profile.calltrace.cycles-pp._raw_spin_lock.evict.__dentry_kill.dput.__fput
     47.11            +0.8       47.88        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.evict.__dentry_kill.dput
      0.74            -0.3        0.48 ±  8%  perf-profile.children.cycles-pp.__munmap
      0.69            -0.2        0.44 ±  9%  perf-profile.children.cycles-pp.__x64_sys_munmap
      0.68            -0.2        0.44 ±  9%  perf-profile.children.cycles-pp.__vm_munmap
      0.68            -0.2        0.45 ±  9%  perf-profile.children.cycles-pp.do_vmi_munmap
      0.65            -0.2        0.42 ±  8%  perf-profile.children.cycles-pp.do_vmi_align_munmap
      0.44            -0.2        0.28 ±  7%  perf-profile.children.cycles-pp.unmap_region
      0.48            -0.1        0.36 ±  7%  perf-profile.children.cycles-pp.asm_exc_page_fault
      0.42            -0.1        0.32 ±  7%  perf-profile.children.cycles-pp.do_user_addr_fault
      0.42 ±  2%      -0.1        0.32 ±  7%  perf-profile.children.cycles-pp.exc_page_fault
      0.38 ±  2%      -0.1        0.29 ±  7%  perf-profile.children.cycles-pp.handle_mm_fault
      0.35 ±  2%      -0.1        0.27 ±  7%  perf-profile.children.cycles-pp.__handle_mm_fault
      0.33 ±  2%      -0.1        0.26 ±  6%  perf-profile.children.cycles-pp.do_fault
      0.21 ±  2%      -0.1        0.14 ±  8%  perf-profile.children.cycles-pp.lru_add_drain
      0.22            -0.1        0.15 ± 11%  perf-profile.children.cycles-pp.alloc_inode
      0.21 ±  2%      -0.1        0.15 ±  9%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.18 ±  2%      -0.1        0.12 ±  8%  perf-profile.children.cycles-pp.unmap_vmas
      0.21 ±  2%      -0.1        0.14 ±  7%  perf-profile.children.cycles-pp.folio_batch_move_lru
      0.17            -0.1        0.11 ±  8%  perf-profile.children.cycles-pp.unmap_page_range
      0.16 ±  2%      -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.zap_pte_range
      0.16 ±  2%      -0.1        0.10 ±  9%  perf-profile.children.cycles-pp.zap_pmd_range
      0.26 ±  2%      -0.1        0.20 ±  7%  perf-profile.children.cycles-pp.shmem_fault
      0.50            -0.1        0.45 ±  8%  perf-profile.children.cycles-pp.mmap_region
      0.26 ±  2%      -0.1        0.20 ±  7%  perf-profile.children.cycles-pp.__do_fault
      0.26            -0.1        0.21 ±  6%  perf-profile.children.cycles-pp.shmem_get_folio_gfp
      0.19 ±  2%      -0.1        0.14 ± 14%  perf-profile.children.cycles-pp.write
      0.22 ±  3%      -0.0        0.18 ±  5%  perf-profile.children.cycles-pp.shmem_alloc_and_add_folio
      0.11 ±  4%      -0.0        0.07 ± 10%  perf-profile.children.cycles-pp.mas_store_gfp
      0.16 ±  2%      -0.0        0.12 ± 11%  perf-profile.children.cycles-pp.mas_wr_store_entry
      0.14            -0.0        0.10 ± 10%  perf-profile.children.cycles-pp.mas_wr_node_store
      0.08            -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.msync
      0.06            -0.0        0.02 ± 99%  perf-profile.children.cycles-pp.mas_find
      0.12 ±  4%      -0.0        0.08 ± 11%  perf-profile.children.cycles-pp.inode_init_always
      0.10 ±  3%      -0.0        0.07 ± 11%  perf-profile.children.cycles-pp.shmem_alloc_inode
      0.16            -0.0        0.13 ±  9%  perf-profile.children.cycles-pp.__x64_sys_fcntl
      0.11 ±  4%      -0.0        0.08 ± 11%  perf-profile.children.cycles-pp.shmem_file_write_iter
      0.10 ±  4%      -0.0        0.08 ±  8%  perf-profile.children.cycles-pp.do_fcntl
      0.15            -0.0        0.13 ±  8%  perf-profile.children.cycles-pp.destroy_inode
      0.16 ±  3%      -0.0        0.14 ±  7%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
      0.22 ±  3%      -0.0        0.20 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      0.08            -0.0        0.06 ± 11%  perf-profile.children.cycles-pp.___slab_alloc
      0.15 ±  3%      -0.0        0.12 ±  8%  perf-profile.children.cycles-pp.__destroy_inode
      0.07 ±  7%      -0.0        0.04 ± 45%  perf-profile.children.cycles-pp.__call_rcu_common
      0.13 ±  2%      -0.0        0.11 ±  8%  perf-profile.children.cycles-pp.perf_event_mmap
      0.09            -0.0        0.07 ±  9%  perf-profile.children.cycles-pp.memfd_fcntl
      0.06            -0.0        0.04 ± 44%  perf-profile.children.cycles-pp.native_irq_return_iret
      0.08 ±  6%      -0.0        0.06 ±  8%  perf-profile.children.cycles-pp.shmem_add_to_page_cache
      0.12            -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.perf_event_mmap_event
      0.11 ±  3%      -0.0        0.09 ±  7%  perf-profile.children.cycles-pp.__lruvec_stat_mod_folio
      0.10            -0.0        0.08 ±  8%  perf-profile.children.cycles-pp.uncharge_batch
      0.12 ±  4%      -0.0        0.10 ±  6%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.05            +0.0        0.07 ±  5%  perf-profile.children.cycles-pp.__d_alloc
      0.05            +0.0        0.07 ± 10%  perf-profile.children.cycles-pp.d_alloc_pseudo
      0.07            +0.0        0.09 ±  7%  perf-profile.children.cycles-pp.file_init_path
      0.06 ±  6%      +0.0        0.08 ±  8%  perf-profile.children.cycles-pp.security_file_alloc
      0.07 ±  7%      +0.0        0.09 ±  7%  perf-profile.children.cycles-pp.errseq_sample
      0.04 ± 44%      +0.0        0.07 ± 10%  perf-profile.children.cycles-pp.apparmor_file_alloc_security
      0.09            +0.0        0.12 ±  5%  perf-profile.children.cycles-pp.init_file
      0.15            +0.0        0.18 ±  7%  perf-profile.children.cycles-pp.common_perm_cond
      0.15 ±  3%      +0.0        0.19 ±  8%  perf-profile.children.cycles-pp.security_file_truncate
      0.20            +0.0        0.24 ±  7%  perf-profile.children.cycles-pp.notify_change
      0.06            +0.0        0.10 ±  6%  perf-profile.children.cycles-pp.inode_init_owner
      0.13            +0.0        0.18 ±  5%  perf-profile.children.cycles-pp.alloc_empty_file
      0.10            +0.1        0.16 ±  7%  perf-profile.children.cycles-pp.clear_nlink
      0.47            +0.1        0.56 ±  7%  perf-profile.children.cycles-pp.do_ftruncate
      0.49            +0.1        0.59 ±  7%  perf-profile.children.cycles-pp.__x64_sys_ftruncate
      0.59            +0.1        0.70 ±  7%  perf-profile.children.cycles-pp.ftruncate64
      0.28            +0.1        0.40 ±  6%  perf-profile.children.cycles-pp.alloc_file_pseudo
     98.62            +0.2       98.77        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     98.58            +0.2       98.74        perf-profile.children.cycles-pp.do_syscall_64
     48.30            +0.6       48.86        perf-profile.children.cycles-pp.__close
     48.26            +0.6       48.83        perf-profile.children.cycles-pp.__x64_sys_close
     48.21            +0.6       48.80        perf-profile.children.cycles-pp.__fput
     48.04            +0.6       48.68        perf-profile.children.cycles-pp.dput
     48.02            +0.6       48.67        perf-profile.children.cycles-pp.__dentry_kill
     47.77            +0.7       48.47        perf-profile.children.cycles-pp.evict
      0.30            -0.1        0.23 ±  7%  perf-profile.self.cycles-pp._raw_spin_lock
      0.10 ±  4%      -0.0        0.06 ±  7%  perf-profile.self.cycles-pp.__fput
      0.08 ±  6%      -0.0        0.05 ±  8%  perf-profile.self.cycles-pp.inode_init_always
      0.06            -0.0        0.04 ± 44%  perf-profile.self.cycles-pp.native_irq_return_iret
      0.08            -0.0        0.06 ±  7%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.09            -0.0        0.08 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.07            +0.0        0.09 ±  7%  perf-profile.self.cycles-pp.__shmem_get_inode
      0.06 ±  7%      +0.0        0.09 ±  9%  perf-profile.self.cycles-pp.errseq_sample
      0.15 ±  2%      +0.0        0.18 ±  7%  perf-profile.self.cycles-pp.common_perm_cond
      0.03 ± 70%      +0.0        0.06 ±  7%  perf-profile.self.cycles-pp.apparmor_file_alloc_security
      0.06            +0.0        0.10 ±  7%  perf-profile.self.cycles-pp.inode_init_owner
      0.10            +0.1        0.16 ±  6%  perf-profile.self.cycles-pp.clear_nlink
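The %change column appears to mix two conventions. For absolute counters such as stress-ng.seal.ops_per_sec it is the relative change (new - old) / old. For metrics that are already percentages (mpstat.cpu.all.idle% and the perf-profile cycles-pp shares) it is the plain difference in percentage points, which is why those entries carry no % sign. The Python sketch below is only an illustration of that reading, not part of lkp-tests: the helper names are hypothetical and the inputs are copied from the table. It also checks that ops_per_sec is roughly ops divided by the 60s testtime.

```python
def pct_change(old: float, new: float) -> float:
    """Relative change in percent (assumed rule for absolute counters)."""
    return (new - old) / old * 100.0


def point_change(old: float, new: float) -> float:
    """Difference in percentage points (assumed rule for %-valued metrics)."""
    return new - old


# Values copied from the comparison table: (844776cb65, 306c4ac989).
ops = (58973718, 62024061)
ops_per_sec = (982893, 1033732)
minor_faults = (59045344, 62095668)
idle_pct = (2.51, 4.44)
TESTTIME_S = 60  # testtime parameter of this job

print(f"ops_per_sec:       {pct_change(*ops_per_sec):+.1f}%")
print(f"minor_page_faults: {pct_change(*minor_faults):+.1f}%")
print(f"cpu idle:          {point_change(*idle_pct):+.1f} points")

# ops / testtime should land close to the reported ops_per_sec
# (not exactly, since the measured run time is not precisely 60s).
for total, rate in zip(ops, ops_per_sec):
    print(f"{total / TESTTIME_S:.0f} ops/s derived vs {rate} reported")
```

Run as-is, the first three lines reproduce the +5.2%, +5.2% and +1.9 shown in the table.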




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
