Hello,

kernel test robot noticed a 6.3% improvement of will-it-scale.per_thread_ops on:


commit: b8decf0015a8b1ff02cdac61c0aa54355d8e73d7 ("[PATCH v5 3/3] fs/file.c: add fast path in find_next_fd()")
url: https://github.com/intel-lab-lkp/linux/commits/Yu-Ma/fs-file-c-remove-sanity_check-and-add-likely-unlikely-in-alloc_fd/20240717-224830
base: https://git.kernel.org/cgit/linux/kernel/git/vfs/vfs.git vfs.all
patch link: https://lore.kernel.org/all/20240717145018.3972922-4-yu.ma@xxxxxxxxx/
patch subject: [PATCH v5 3/3] fs/file.c: add fast path in find_next_fd()

testcase: will-it-scale
test machine: 224 threads 2 sockets Intel(R) Xeon(R) Platinum 8480CTDX (Sapphire Rapids) with 512G memory
parameters:

        nr_task: 100%
        mode: thread
        test: open3
        cpufreq_governor: performance


Details are as below:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240806/202408062152.7e5b5d6d-oliver.sang@xxxxxxxxx

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-13/performance/x86_64-rhel-8.3/thread/100%/debian-12-x86_64-20240206.cgz/lkp-spr-2sp4/open3/will-it-scale

commit:
  5bb3423bf9 ("fs/file.c: conditionally clear full_fds")
  b8decf0015 ("fs/file.c: add fast path in find_next_fd()")

5bb3423bf9f9d91e b8decf0015a8b1ff02cdac61c0a
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
    848151            +6.2%     901119 ±  2%  will-it-scale.224.threads
      3785            +6.3%       4022 ±  2%  will-it-scale.per_thread_ops
    848151            +6.2%     901119 ±  2%  will-it-scale.workload
      0.28 ±  4%     +13.3%       0.32 ±  3%  perf-stat.i.MPKI
     31.31 ±  3%       +2.0       33.28        perf-stat.i.cache-miss-rate%
  14955855 ±  4%     +13.6%   16995785 ±  4%  perf-stat.i.cache-misses
  49676581            +6.7%   53009444 ±  3%  perf-stat.i.cache-references
     43955 ±  4%     -12.3%      38549 ±  4%  perf-stat.i.cycles-between-cache-misses
      0.28 ±  4%     +13.4%       0.32 ±  4%  perf-stat.overall.MPKI
     29.84 ±  3%       +1.9       31.78 ±  2%  perf-stat.overall.cache-miss-rate%
     43445 ±  4%     -12.1%      38200 ±  4%  perf-stat.overall.cycles-between-cache-misses
  19005976            -5.4%   17972604 ±  2%  perf-stat.overall.path-length
  14869677 ±  4%     +13.6%   16898438 ±  4%  perf-stat.ps.cache-misses
  49821402            +6.7%   53168235 ±  3%  perf-stat.ps.cache-references
     49.42             -0.1       49.34        perf-profile.calltrace.cycles-pp.alloc_fd.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.40             -0.1       49.32        perf-profile.calltrace.cycles-pp.file_close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close
     49.25             -0.1       49.18        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.file_close_fd.__x64_sys_close.do_syscall_64
     49.20             -0.1       49.13        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.alloc_fd.do_sys_openat2.__x64_sys_openat
     49.33             -0.1       49.26        perf-profile.calltrace.cycles-pp._raw_spin_lock.file_close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.28             -0.1       49.22        perf-profile.calltrace.cycles-pp._raw_spin_lock.alloc_fd.do_sys_openat2.__x64_sys_openat.do_syscall_64
     50.14             +0.0       50.18        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.open64
     50.17             +0.0       50.21        perf-profile.calltrace.cycles-pp.open64
      0.64 ±  5%       +0.1        0.75 ±  6%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.62 ±  5%       +0.1        0.74 ±  6%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.__x64_sys_openat.do_syscall_64
     49.42             -0.1       49.34        perf-profile.children.cycles-pp.alloc_fd
     49.40             -0.1       49.32        perf-profile.children.cycles-pp.file_close_fd
      0.06             -0.0        0.05        perf-profile.children.cycles-pp.file_close_fd_locked
      0.15 ±  5%       +0.0        0.17 ±  4%  perf-profile.children.cycles-pp.init_file
      0.22 ±  3%       +0.0        0.25 ±  3%  perf-profile.children.cycles-pp.alloc_empty_file
      0.18 ±  6%       +0.0        0.22 ±  6%  perf-profile.children.cycles-pp.__fput
     50.14             +0.0       50.18        perf-profile.children.cycles-pp.__x64_sys_openat
     50.18             +0.0       50.22        perf-profile.children.cycles-pp.open64
      0.18 ± 14%       +0.0        0.23 ±  7%  perf-profile.children.cycles-pp.do_dentry_open
      0.30 ±  8%       +0.1        0.36 ±  8%  perf-profile.children.cycles-pp.do_open
      0.64 ±  5%       +0.1        0.75 ±  6%  perf-profile.children.cycles-pp.do_filp_open
      0.63 ±  5%       +0.1        0.75 ±  6%  perf-profile.children.cycles-pp.path_openat
      0.06             -0.0        0.05        perf-profile.self.cycles-pp.file_close_fd_locked
      0.16 ±  2%       +0.0        0.18 ±  2%  perf-profile.self.cycles-pp._raw_spin_lock
      0.08 ± 12%       +0.0        0.10 ±  4%  perf-profile.self.cycles-pp.__fput
      0.05 ±  7%       +0.1        0.10 ±  4%  perf-profile.self.cycles-pp.alloc_fd




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki