Hi Linus,

On Sun, Nov 26, 2023 at 03:20:58PM -0800, Linus Torvalds wrote:
> On Sun, 26 Nov 2023 at 12:23, Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > IOW, I might have messed up some "trivial cleanup" when prepping for
> > sending it out...
>
> Bah. Famous last words. One of the "trivial cleanups" made the code
> more "obvious" by renaming the nospec mask as just "mask".
>
> And that trivial rename broke that patch *entirely*, because now that
> name shadowed the "fmode_t" mask argument.
>
> Don't even ask how long it took me to go from "I *tested* this,
> dammit, now it doesn't work at all" to "Oh God, I'm so stupid".
>
> So that nobody else would waste any time on this, attached is a new
> attempt. This time actually tested *after* the changes.

We applied the new patch on top of 0ede61d858 and confirmed that the
regression is gone. will-it-scale throughput is now even 3.4% better than
93faf426e3.

Tested-by: kernel test robot <oliver.sang@xxxxxxxxx>

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-12/performance/x86_64-rhel-8.3/thread/16/debian-11.1-x86_64-20220510.cgz/lkp-cpl-4sp2/poll2/will-it-scale

commit:
  93faf426e3 ("vfs: shave work on failed file open")
  0ede61d858 ("file: convert to SLAB_TYPESAFE_BY_RCU")
  c712b4365b ("Improve __fget_files_rcu() code generation (and thus __fget_light())")

93faf426e3cc000c 0ede61d8589cc2d93aa78230d74 c712b4365b5b4dbe1d1380edd37
---------------- --------------------------- ---------------------------
         %stddev     %change         %stddev     %change         %stddev
             \          |                \          |                \
    228481 ±  4%      -4.6%     217900 ±  6%     -11.7%     201857 ±  5%  meminfo.DirectMap4k
     89056            -2.0%      87309            -1.6%      87606        proc-vmstat.nr_slab_unreclaimable
     16.28            -0.7%      16.16            -1.0%      16.12        turbostat.RAMWatt
      0.01 ±  9%  +58125.6%       4.17 ±175%  +23253.5%       1.67 ±222%  perf-sched.sch_delay.max.ms.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
    781.67 ± 10%      +6.5%     832.50 ± 19%     -14.3%     670.17 ±  4%  perf-sched.wait_and_delay.count.schedule_timeout.rcu_gp_fqs_loop.rcu_gp_kthread.kthread
     97958 ±  7%      -9.7%      88449 ±  4%      -0.6%      97399 ±  4%  sched_debug.cpu.avg_idle.stddev
      0.00 ± 12%     +24.2%       0.00 ± 17%      -5.2%       0.00 ±  7%  sched_debug.cpu.next_balance.stddev
   6391048            -2.9%    6208584            +3.4%    6605584        will-it-scale.16.threads
    399440            -2.9%     388036            +3.4%     412848        will-it-scale.per_thread_ops
   6391048            -2.9%    6208584            +3.4%    6605584        will-it-scale.workload
     19.99 ±  4%       -2.2      17.74             +1.2      21.18 ±  2%  perf-profile.calltrace.cycles-pp.fput.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      1.27 ±  5%       +0.8       2.11 ±  3%      +31.1      32.36 ±  2%  perf-profile.calltrace.cycles-pp.__fdget.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
     32.69 ±  4%       +5.0      37.70            -32.7       0.00        perf-profile.calltrace.cycles-pp.__fget_light.do_poll.do_sys_poll.__x64_sys_poll.do_syscall_64
      0.00            +27.9      27.85             +0.0       0.00        perf-profile.calltrace.cycles-pp.__get_file_rcu.__fget_light.do_poll.do_sys_poll.__x64_sys_poll
     20.00 ±  4%       -2.3      17.75             +0.4      20.43 ±  2%  perf-profile.children.cycles-pp.fput
      0.24 ± 10%       -0.1       0.18 ±  2%       -0.1       0.18 ± 10%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.48 ±  5%       +0.5       1.98 ±  3%      +30.8      32.32 ±  2%  perf-profile.children.cycles-pp.__fdget
     31.85 ±  4%       +6.0      37.86            -31.8       0.00        perf-profile.children.cycles-pp.__fget_light
      0.00            +27.7      27.67             +0.0       0.00        perf-profile.children.cycles-pp.__get_file_rcu
     30.90 ±  4%      -20.6      10.35 ±  2%      -30.9       0.00        perf-profile.self.cycles-pp.__fget_light
     19.94 ±  4%       -2.4      17.53             -0.3      19.62 ±  2%  perf-profile.self.cycles-pp.fput
      9.81 ±  4%       -2.4       7.42 ±  2%       +1.7      11.51 ±  4%  perf-profile.self.cycles-pp.do_poll
      0.23 ± 11%       -0.1       0.17 ±  4%       -0.1       0.18 ± 11%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.44 ±  7%       +0.0       0.45 ±  5%       +0.1       0.52 ±  4%  perf-profile.self.cycles-pp.__poll
      0.85 ±  4%       +0.1       0.92 ±  3%      +30.3      31.17 ±  2%  perf-profile.self.cycles-pp.__fdget
      0.00            +26.5      26.48             +0.0       0.00        perf-profile.self.cycles-pp.__get_file_rcu
 2.146e+10 ±  2%      +8.5%  2.329e+10 ±  2%      -2.1%  2.101e+10        perf-stat.i.branch-instructions
      0.22 ± 14%       -0.0       0.19 ± 14%       -0.0       0.20 ±  3%  perf-stat.i.branch-miss-rate%
 2.424e+10 ±  2%      +4.1%  2.524e+10 ±  2%      -4.7%  2.311e+10        perf-stat.i.dTLB-loads
 1.404e+10 ±  2%      +8.7%  1.526e+10 ±  2%      -6.2%  1.316e+10        perf-stat.i.dTLB-stores
     70.87             -2.3      68.59             -1.0      69.90        perf-stat.i.iTLB-load-miss-rate%
   5267608            -5.5%    4979133 ±  2%      -0.4%    5244253        perf-stat.i.iTLB-load-misses
   2102507            +5.4%    2215725            +5.7%    2222286        perf-stat.i.iTLB-loads
     18791 ±  3%     +10.5%      20757 ±  2%      -1.8%      18446        perf-stat.i.instructions-per-iTLB-miss
    266.67 ±  2%      +6.8%     284.75 ±  2%      -4.1%     255.70        perf-stat.i.metric.M/sec
      0.01 ± 10%     -10.5%       0.01 ±  5%      -1.8%       0.01 ±  6%  perf-stat.overall.MPKI
      0.19             -0.0       0.17             +0.0       0.20        perf-stat.overall.branch-miss-rate%
      0.65            -3.1%       0.63            +6.1%       0.69        perf-stat.overall.cpi
      0.00 ±  4%       -0.0       0.00 ±  4%       +0.0       0.00 ±  4%  perf-stat.overall.dTLB-store-miss-rate%
     71.48             -2.3      69.21             -1.2      70.24        perf-stat.overall.iTLB-load-miss-rate%
     18757           +10.0%      20629            -3.2%      18161        perf-stat.overall.instructions-per-iTLB-miss
      1.54            +3.2%       1.59            -5.8%       1.45        perf-stat.overall.ipc
   4795147            +6.4%    5100406            -9.0%    4365017        perf-stat.overall.path-length
  2.14e+10 ±  2%      +8.5%  2.322e+10 ±  2%      -2.1%  2.094e+10        perf-stat.ps.branch-instructions
 2.417e+10 ±  2%      +4.1%  2.516e+10 ±  2%      -4.7%  2.303e+10        perf-stat.ps.dTLB-loads
   1.4e+10 ±  2%      +8.7%  1.522e+10 ±  2%      -6.3%  1.312e+10        perf-stat.ps.dTLB-stores
   5253923            -5.5%    4966218 ±  2%      -0.5%    5228207        perf-stat.ps.iTLB-load-misses
   2095770            +5.4%    2208605            +5.7%    2214962        perf-stat.ps.iTLB-loads
 3.065e+13            +3.3%  3.167e+13            -5.9%  2.883e+13        perf-stat.total.instructions

>
>             Linus