[linux-next:master] 253ca8678d: lmbench3.Select.100tcp.latency.us -5.0% improvement

Hello,

This commit fixes the
"[linus:master] [file]  0ede61d858:  will-it-scale.per_thread_ops -2.9% regression"
that we reported in
https://lore.kernel.org/oe-lkp/202311201406.2022ca3f-oliver.sang@xxxxxxxxx/

In our tests, besides the improvement in the will-it-scale tests, we also noticed
an improvement in the lmbench3 latency tests, so we are reporting it below FYI.



kernel test robot noticed a -5.0% improvement of lmbench3.Select.100tcp.latency.us on:


commit: 253ca8678d30bcf94410b54476fc1e0f1627a137 ("Improve __fget_files_rcu() code generation (and thus __fget_light())")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

testcase: lmbench3
test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 112G memory
parameters:

	test_memory_size: 50%
	nr_threads: 50%
	mode: development
	test: SELECT
	cpufreq_governor: performance
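
For orientation, the SELECT test measures the latency of select() over a fixed
set of descriptors. A rough Python sketch of that kind of measurement follows.
It is not lmbench3's implementation (lmbench3 is written in C); the helper name
`select_latency_us` and the use of idle pipes are assumptions made for
illustration, and interpreter overhead will dominate the absolute numbers.

```python
import os
import select
import time


def select_latency_us(nfds=100, iterations=10000):
    """Return the average wall-clock time of one select() call, in microseconds.

    Hypothetical helper: polls `nfds` idle pipe read ends with a zero timeout,
    so every call walks the whole descriptor set and returns immediately.
    """
    pipes = [os.pipe() for _ in range(nfds)]
    read_fds = [r for r, _ in pipes]
    try:
        start = time.perf_counter()
        for _ in range(iterations):
            select.select(read_fds, [], [], 0)
        elapsed = time.perf_counter() - start
    finally:
        # Close both ends of every pipe so no descriptors leak.
        for r, w in pipes:
            os.close(r)
            os.close(w)
    return elapsed / iterations * 1e6


if __name__ == "__main__":
    print(f"select() over 100 fds: {select_latency_us():.2f} us per call")
```

Each select() call has to look up every descriptor in the set, which is why the
file-lookup helpers show up under do_select in the profile data of this report.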


In addition, the commit also has a significant impact on the following tests:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops 10.3% improvement                                     |
| test machine     | 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | mode=process                                                                                       |
|                  | nr_task=100%                                                                                       |
|                  | test=poll2                                                                                         |
+------------------+----------------------------------------------------------------------------------------------------+




Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20231222/202312221056.da0e7f9-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_threads/rootfs/tbox_group/test/test_memory_size/testcase:
  gcc-12/performance/x86_64-rhel-8.3/development/50%/debian-11.1-x86_64-20220510.cgz/lkp-ivb-2ep1/SELECT/50%/lmbench3

commit: 
  7cb537b6f6 ("file: massage cleanup of files that failed to open")
  253ca8678d ("Improve __fget_files_rcu() code generation (and thus __fget_light())")

7cb537b6f6d7d652 253ca8678d30bcf94410b54476f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
      1.78            -9.8%       1.61        lmbench3.Select.100fd.latency.us
      5.70            -5.0%       5.41        lmbench3.Select.100tcp.latency.us
     12.09 ± 36%     -12.1        0.00        perf-profile.calltrace.cycles-pp.__fget_light.do_select.core_sys_select.kern_select.__x64_sys_select
      0.05 ±299%     +14.9       14.97 ± 51%  perf-profile.calltrace.cycles-pp.__fdget.do_select.core_sys_select.kern_select.__x64_sys_select
     12.09 ± 36%     -12.1        0.00        perf-profile.children.cycles-pp.__fget_light
      0.36 ± 42%     +14.6       14.98 ± 51%  perf-profile.children.cycles-pp.__fdget
     12.05 ± 36%     -12.1        0.00        perf-profile.self.cycles-pp.__fget_light
      0.31 ± 42%     +14.6       14.91 ± 52%  perf-profile.self.cycles-pp.__fdget
      0.19 ±  2%      +0.0        0.20 ±  3%  perf-stat.i.dTLB-store-miss-rate%
   1585715 ±  8%     +93.4%    3067285 ± 30%  perf-stat.i.iTLB-load-misses
      0.17 ±  2%      +0.0        0.19 ±  3%  perf-stat.overall.dTLB-store-miss-rate%
     88.15 ±  5%      +4.9       93.07        perf-stat.overall.iTLB-load-miss-rate%
     48830 ±  8%     -45.0%      26871 ± 25%  perf-stat.overall.instructions-per-iTLB-miss
      1.41            -1.8%       1.38        perf-stat.overall.ipc
   1573086 ±  8%     +93.7%    3047643 ± 30%  perf-stat.ps.iTLB-load-misses
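
A note on reading the table above: judging from the rows themselves, count-like
metrics are reported as a relative change in percent, while metrics that are
already percentages (perf-profile shares, miss rates) are reported as an
absolute difference in percentage points. The Python sketch below recomputes a
few rows from the values shown; the helper names are illustrative only, and
numbers recomputed from the rounded means can differ slightly from the
%change column.

```python
def relative_change_pct(base, new):
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


def point_change(base, new):
    """Absolute difference between two values that are already percentages."""
    return new - base


# (metric, base, new, is_percentage_metric), values copied from the table above
rows = [
    ("perf-stat.i.iTLB-load-misses", 1585715, 3067285, False),
    ("perf-stat.overall.instructions-per-iTLB-miss", 48830, 26871, False),
    ("perf-stat.overall.iTLB-load-miss-rate%", 88.15, 93.07, True),
    ("perf-profile.children.cycles-pp.__fget_light", 12.09, 0.00, True),
]

if __name__ == "__main__":
    for name, base, new, is_pct in rows:
        if is_pct:
            print(f"{point_change(base, new):+8.1f}   {name}")
        else:
            print(f"{relative_change_pct(base, new):+8.1f}%  {name}")
```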


***************************************************************************************************
lkp-cpl-4sp2: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/poll2/will-it-scale

commit: 
  7cb537b6f6 ("file: massage cleanup of files that failed to open")
  253ca8678d ("Improve __fget_files_rcu() code generation (and thus __fget_light())")

7cb537b6f6d7d652 253ca8678d30bcf94410b54476f 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
    685.00 ±  5%     +62.3%       1111 ± 13%  perf-c2c.HITM.local
      0.04 ±187%    +482.9%       0.21 ± 50%  perf-sched.sch_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
    136406            +2.0%     139095        proc-vmstat.nr_active_anon
    136406            +2.0%     139095        proc-vmstat.nr_zone_active_anon
  98393191           +10.3%  1.085e+08        will-it-scale.224.processes
    439254           +10.3%     484377        will-it-scale.per_process_ops
  98393191           +10.3%  1.085e+08        will-it-scale.workload
      0.00           +28.2%       0.00 ± 17%  perf-stat.i.MPKI
 2.226e+11            -2.2%  2.178e+11        perf-stat.i.branch-instructions
      0.28            +0.0        0.30        perf-stat.i.branch-miss-rate%
 6.155e+08            +7.4%  6.608e+08        perf-stat.i.branch-misses
     12.91            -3.3        9.62 ± 13%  perf-stat.i.cache-miss-rate%
   1955843           +22.9%    2402856 ± 17%  perf-stat.i.cache-misses
  15946481           +59.2%   25391906 ±  9%  perf-stat.i.cache-references
      0.59            +5.0%       0.62        perf-stat.i.cpi
    408471           -17.9%     335390 ± 14%  perf-stat.i.cycles-between-cache-misses
 2.901e+11            -4.0%  2.784e+11        perf-stat.i.dTLB-loads
      0.00 ±  9%      +0.0        0.00 ± 10%  perf-stat.i.dTLB-store-miss-rate%
 1.814e+11           -12.6%  1.585e+11        perf-stat.i.dTLB-stores
  26765498            +9.7%   29360826        perf-stat.i.iTLB-load-misses
  1.23e+12            -4.4%  1.176e+12        perf-stat.i.instructions
     46105           -12.9%      40163        perf-stat.i.instructions-per-iTLB-miss
      1.69            -4.8%       1.61        perf-stat.i.ipc
      1.30            -4.1%       1.24        perf-stat.i.metric.G/sec
     75.67           +56.5%     118.40 ±  9%  perf-stat.i.metric.K/sec
      1802            -6.9%       1679        perf-stat.i.metric.M/sec
     91.19            +1.9       93.14        perf-stat.i.node-load-miss-rate%
    603847           +29.4%     781631 ± 13%  perf-stat.i.node-load-misses
      0.00 ± 44%     +54.2%       0.00 ± 17%  perf-stat.overall.MPKI
      0.23 ± 44%      +0.1        0.30        perf-stat.overall.branch-miss-rate%
      0.49 ± 44%     +26.0%       0.62        perf-stat.overall.cpi
      0.00 ± 46%      +0.0        0.00 ± 10%  perf-stat.overall.dTLB-store-miss-rate%
     73.34 ± 44%     +18.0       91.29        perf-stat.overall.node-load-miss-rate%
 5.111e+08 ± 44%     +28.9%  6.586e+08        perf-stat.ps.branch-misses
   1626781 ± 44%     +47.4%    2397620 ± 17%  perf-stat.ps.cache-misses
  13269755 ± 44%     +91.5%   25415998 ±  9%  perf-stat.ps.cache-references
  22231799 ± 44%     +31.6%   29255242        perf-stat.ps.iTLB-load-misses
    501267 ± 44%     +55.4%     779219 ± 13%  perf-stat.ps.node-load-misses
     16030 ± 45%     +33.6%      21409 ±  6%  perf-stat.ps.node-stores
     47.56           -47.6        0.00        perf-profile.calltrace.cycles-pp.__fget_light.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
     67.41            -2.9       64.56        perf-profile.calltrace.cycles-pp.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
     87.35            -1.2       86.15        perf-profile.calltrace.cycles-pp.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     87.96            -1.1       86.82        perf-profile.calltrace.cycles-pp.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     88.69            -1.1       87.62        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__poll
     89.02            -1.1       87.97        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__poll
     91.89            -0.8       91.12        perf-profile.calltrace.cycles-pp.__poll
      0.81            +0.0        0.85        perf-profile.calltrace.cycles-pp.__check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64
      0.64            +0.1        0.69 ±  2%  perf-profile.calltrace.cycles-pp.__kmem_cache_free.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.68            +0.1        0.74        perf-profile.calltrace.cycles-pp.kfree.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.26            +0.1        1.32        perf-profile.calltrace.cycles-pp.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64
      0.84            +0.1        0.94 ±  2%  perf-profile.calltrace.cycles-pp.__virt_addr_valid.check_heap_object.__check_object_size.do_sys_poll.__x64_sys_poll
      1.53            +0.1        1.67        perf-profile.calltrace.cycles-pp.__kmem_cache_alloc_node.__kmalloc.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.82            +0.2        1.98        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.__poll
      2.60            +0.2        2.76        perf-profile.calltrace.cycles-pp.__check_object_size.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.91            +0.2        2.09        perf-profile.calltrace.cycles-pp.__kmalloc.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.44 ±  2%      +0.2        2.62        perf-profile.calltrace.cycles-pp.rep_movs_alternative._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64
      3.86            +0.3        4.20        perf-profile.calltrace.cycles-pp._copy_from_user.do_sys_poll.__x64_sys_poll.do_syscall_64.entry_SYSCALL_64_after_hwframe
      7.94            +0.8        8.70        perf-profile.calltrace.cycles-pp.testcase
      3.60           +42.4       45.95        perf-profile.calltrace.cycles-pp.__fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
     45.80           -45.8        0.00        perf-profile.children.cycles-pp.__fget_light
     69.22            -2.7       66.50        perf-profile.children.cycles-pp.do_poll
     87.48            -1.2       86.29        perf-profile.children.cycles-pp.do_sys_poll
     87.99            -1.1       86.85        perf-profile.children.cycles-pp.__x64_sys_poll
     88.74            -1.1       87.67        perf-profile.children.cycles-pp.do_syscall_64
     89.06            -1.0       88.01        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     91.99            -0.8       91.23        perf-profile.children.cycles-pp.__poll
      0.08            +0.0        0.09 ±  4%  perf-profile.children.cycles-pp.is_vmalloc_addr
      0.14 ±  2%      +0.0        0.16 ±  3%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.24            +0.0        0.26        perf-profile.children.cycles-pp.memcg_slab_post_alloc_hook
      0.16 ±  3%      +0.0        0.17        perf-profile.children.cycles-pp.rcu_all_qs
      0.13 ±  3%      +0.0        0.14 ±  2%  perf-profile.children.cycles-pp.kmalloc_slab
      0.12 ±  3%      +0.0        0.14 ±  3%  perf-profile.children.cycles-pp.syscall_enter_from_user_mode
      0.21 ±  2%      +0.0        0.24        perf-profile.children.cycles-pp.check_stack_object
      0.24 ±  2%      +0.0        0.27        perf-profile.children.cycles-pp.poll@plt
      0.15 ±  2%      +0.0        0.18 ±  2%  perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      0.24 ±  2%      +0.0        0.26        perf-profile.children.cycles-pp.__cond_resched
      0.36            +0.0        0.40 ±  2%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
      0.81            +0.0        0.86        perf-profile.children.cycles-pp.__check_heap_object
      0.48            +0.0        0.53        perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.65            +0.1        0.70        perf-profile.children.cycles-pp.__kmem_cache_free
      0.68            +0.1        0.74        perf-profile.children.cycles-pp.kfree
      0.70            +0.1        0.76        perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
      1.32            +0.1        1.39        perf-profile.children.cycles-pp.check_heap_object
      1.14            +0.1        1.23        perf-profile.children.cycles-pp.entry_SYSCALL_64
      0.85            +0.1        0.96        perf-profile.children.cycles-pp.__virt_addr_valid
      1.60            +0.1        1.76        perf-profile.children.cycles-pp.__kmem_cache_alloc_node
      2.76            +0.2        2.94        perf-profile.children.cycles-pp.__check_object_size
      1.94            +0.2        2.13        perf-profile.children.cycles-pp.__kmalloc
      2.48 ±  2%      +0.2        2.67        perf-profile.children.cycles-pp.rep_movs_alternative
      4.09            +0.4        4.45        perf-profile.children.cycles-pp._copy_from_user
      8.04            +0.8        8.81        perf-profile.children.cycles-pp.testcase
      3.58           +40.5       44.04        perf-profile.children.cycles-pp.__fdget
     43.81           -43.8        0.00        perf-profile.self.cycles-pp.__fget_light
      0.40            -0.0        0.38        perf-profile.self.cycles-pp.check_heap_object
      0.15            +0.0        0.16        perf-profile.self.cycles-pp.poll_select_set_timeout
      0.06            +0.0        0.07        perf-profile.self.cycles-pp.is_vmalloc_addr
      0.10 ±  4%      +0.0        0.12 ±  4%  perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      0.14 ±  2%      +0.0        0.15 ±  2%  perf-profile.self.cycles-pp.rcu_all_qs
      0.11 ±  4%      +0.0        0.13 ±  2%  perf-profile.self.cycles-pp.kmalloc_slab
      0.11            +0.0        0.12 ±  4%  perf-profile.self.cycles-pp.syscall_enter_from_user_mode
      0.21            +0.0        0.23 ±  2%  perf-profile.self.cycles-pp.memcg_slab_post_alloc_hook
      0.14 ±  3%      +0.0        0.16        perf-profile.self.cycles-pp.poll@plt
      0.18 ±  2%      +0.0        0.20        perf-profile.self.cycles-pp.check_stack_object
      0.15 ±  2%      +0.0        0.17 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      0.22 ±  2%      +0.0        0.24 ±  2%  perf-profile.self.cycles-pp.__kmalloc
      0.32 ±  2%      +0.0        0.34 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.25            +0.0        0.28        perf-profile.self.cycles-pp.do_syscall_64
      0.43            +0.0        0.47        perf-profile.self.cycles-pp.__check_object_size
      0.45            +0.0        0.48        perf-profile.self.cycles-pp.entry_SYSCALL_64
      0.36            +0.0        0.40 ±  2%  perf-profile.self.cycles-pp.__x64_sys_poll
      0.81            +0.0        0.85        perf-profile.self.cycles-pp.__check_heap_object
      0.48            +0.0        0.52        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.65            +0.1        0.70        perf-profile.self.cycles-pp.__kmem_cache_free
      0.68            +0.1        0.74        perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
      0.66            +0.1        0.72        perf-profile.self.cycles-pp.kfree
      0.81            +0.1        0.91 ±  2%  perf-profile.self.cycles-pp.__virt_addr_valid
      1.05 ±  4%      +0.1        1.16 ±  3%  perf-profile.self.cycles-pp.__poll
      1.13            +0.1        1.24        perf-profile.self.cycles-pp.__kmem_cache_alloc_node
      1.73            +0.2        1.90        perf-profile.self.cycles-pp._copy_from_user
      2.33 ±  2%      +0.2        2.52        perf-profile.self.cycles-pp.rep_movs_alternative
      8.10            +0.7        8.80        perf-profile.self.cycles-pp.do_sys_poll
      7.94            +0.8        8.69        perf-profile.self.cycles-pp.testcase
     23.27            +1.0       24.26        perf-profile.self.cycles-pp.do_poll
      1.79           +40.1       41.93        perf-profile.self.cycles-pp.__fdget
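
One possible reading of the perf-profile rows above: the cycles attributed to
__fget_light drop to zero while __fdget gains a similar amount, which suggests
the lookup cost is now accounted under a different symbol rather than having
disappeared. The Python sketch below, using values copied from the table, adds
the two symbols together before and after the commit. The data layout and the
helper name are illustrative, and the figures are shares of total cycles, not
absolute cycle counts.

```python
# symbol -> (share before the commit, share after the commit), in percent
profile = {
    "children": {"__fget_light": (45.80, 0.00), "__fdget": (3.58, 44.04)},
    "self": {"__fget_light": (43.81, 0.00), "__fdget": (1.79, 41.93)},
}


def combined_share(section):
    """Sum the shares of all symbols in one section, before and after."""
    before = sum(b for b, _ in section.values())
    after = sum(a for _, a in section.values())
    return round(before, 2), round(after, 2)


if __name__ == "__main__":
    for kind, section in profile.items():
        before, after = combined_share(section)
        print(f"{kind:9s} combined: {before:.2f} -> {after:.2f} "
              f"({after - before:+.2f} points)")
```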





Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki




