Hello, kernel test robot noticed a 3.7% improvement of will-it-scale.per_thread_ops on: commit: df671b17195cd6526e029c70d04dfb72561082d7 ("[PATCH 1/2] lib/find: Make functions safe on changing bitmaps") url: https://github.com/intel-lab-lkp/linux/commits/Jan-Kara/lib-find-Make-functions-safe-on-changing-bitmaps/20231011-230553 base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 1c8b86a3799f7e5be903c3f49fcdaee29fd385b5 patch link: https://lore.kernel.org/all/20231011150252.32737-1-jack@xxxxxxx/ patch subject: [PATCH 1/2] lib/find: Make functions safe on changing bitmaps testcase: will-it-scale test machine: 104 threads 2 sockets (Skylake) with 192G memory parameters: nr_task: 50% mode: thread test: tlb_flush3 cpufreq_governor: performance Details are as below: --------------------------------------------------------------------------------------------------> The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20231025/202310251458.48b4452d-oliver.sang@xxxxxxxxx ========================================================================================= compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase: gcc-12/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/tlb_flush3/will-it-scale commit: 1c8b86a379 ("Merge tag 'xsa441-6.6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip") df671b1719 ("lib/find: Make functions safe on changing bitmaps") 1c8b86a3799f7e5b df671b17195cd6526e029c70d04 ---------------- --------------------------- %stddev %change %stddev \ | \ 0.14 ± 19% +36.9% 0.19 ± 17% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write_killable.__vm_munmap 2.26e+08 +3.6% 2.343e+08 proc-vmstat.pgfault 0.04 +25.0% 0.05 turbostat.IPC 32666 -15.5% 27605 ± 2% turbostat.POLL 7856 +2.2% 8025 vmstat.system.cs 6331931 +2.3% 6478704 vmstat.system.in 700119 +3.7% 725931 will-it-scale.52.threads 13463 +3.7% 13959 will-it-scale.per_thread_ops 700119 +3.7% 725931 will-it-scale.workload 8.36 -7.3% 7.74 perf-stat.i.MPKI 4.591e+09 +3.4% 4.747e+09 perf-stat.i.branch-instructions 1.832e+08 +2.8% 1.883e+08 perf-stat.i.branch-misses 26.70 -0.3 26.40 perf-stat.i.cache-miss-rate% 7852 +2.2% 8021 perf-stat.i.context-switches 6.43 -7.2% 5.97 perf-stat.i.cpi 769.61 +1.8% 783.29 perf-stat.i.cpu-migrations 6.39e+09 +3.4% 6.606e+09 perf-stat.i.dTLB-loads 2.94e+09 +3.2% 3.035e+09 perf-stat.i.dTLB-stores 78.29 -0.9 77.44 perf-stat.i.iTLB-load-miss-rate% 18959450 +3.5% 19621273 perf-stat.i.iTLB-load-misses 5254435 +8.7% 5713444 perf-stat.i.iTLB-loads 2.236e+10 +7.7% 2.408e+10 perf-stat.i.instructions 1181 +4.0% 1228 perf-stat.i.instructions-per-iTLB-miss 0.16 +7.7% 0.17 perf-stat.i.ipc 0.02 ± 36% -49.6% 0.01 ± 53% perf-stat.i.major-faults 485.08 +3.0% 499.67 perf-stat.i.metric.K/sec 141.71 +3.2% 146.25 perf-stat.i.metric.M/sec 747997 +3.7% 775416 perf-stat.i.minor-faults 3127957 -13.9% 2693728 perf-stat.i.node-loads 26089697 +3.4% 26965335 perf-stat.i.node-store-misses 767569 +3.7% 796095 perf-stat.i.node-stores 747997 +3.7% 775416 perf-stat.i.page-faults 8.35 -7.3% 7.74 perf-stat.overall.MPKI 26.70 -0.3 26.40 perf-stat.overall.cache-miss-rate% 6.43 -7.1% 5.97 perf-stat.overall.cpi 78.30 -0.9 77.45 perf-stat.overall.iTLB-load-miss-rate% 1179 +4.0% 1226 perf-stat.overall.instructions-per-iTLB-miss 0.16 +7.7% 0.17 perf-stat.overall.ipc 9644584 +3.8% 10011125 perf-stat.overall.path-length 4.575e+09 +3.4% 4.731e+09 perf-stat.ps.branch-instructions 1.825e+08 +2.8% 1.876e+08 perf-stat.ps.branch-misses 7825 +2.2% 7995 perf-stat.ps.context-switches 767.16 +1.8% 780.76 perf-stat.ps.cpu-migrations 6.368e+09 +3.4% 6.583e+09 perf-stat.ps.dTLB-loads 2.93e+09 +3.2% 3.025e+09 perf-stat.ps.dTLB-stores 18896725 +3.5% 19555325 perf-stat.ps.iTLB-load-misses 5236456 +8.7% 5693636 perf-stat.ps.iTLB-loads 2.229e+10 +7.6% 2.399e+10 perf-stat.ps.instructions 745423 +3.7% 772705 perf-stat.ps.minor-faults 3117663 -13.9% 2684861 perf-stat.ps.node-loads 26002765 +3.4% 26875267 perf-stat.ps.node-store-misses 764789 +3.7% 793098 perf-stat.ps.node-stores 745423 +3.7% 772705 perf-stat.ps.page-faults 6.752e+12 +7.6% 7.267e+12 perf-stat.total.instructions 19.21 -1.0 18.18 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu 17.00 -0.9 16.09 perf-profile.calltrace.cycles-pp.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range 65.30 -0.6 64.69 perf-profile.calltrace.cycles-pp.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64 65.34 -0.6 64.75 perf-profile.calltrace.cycles-pp.madvise_vma_behavior.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe 65.98 -0.5 65.45 perf-profile.calltrace.cycles-pp.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise 65.96 -0.5 65.42 perf-profile.calltrace.cycles-pp.do_madvise.__x64_sys_madvise.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise 9.72 ± 2% -0.5 9.20 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range 66.33 -0.5 65.81 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__madvise 66.46 -0.5 65.95 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__madvise 31.88 -0.4 31.43 perf-profile.calltrace.cycles-pp.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single 67.72 -0.4 67.28 perf-profile.calltrace.cycles-pp.__madvise 32.15 -0.4 31.73 perf-profile.calltrace.cycles-pp.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior 32.60 -0.4 32.21 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise 32.93 -0.3 32.58 perf-profile.calltrace.cycles-pp.tlb_finish_mmu.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise 31.07 -0.3 30.74 perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single 31.58 -0.3 31.28 perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior 31.61 -0.3 31.30 perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise 31.80 -0.3 31.51 perf-profile.calltrace.cycles-pp.unmap_page_range.zap_page_range_single.madvise_vma_behavior.do_madvise.__x64_sys_madvise 8.34 -0.1 8.22 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond.on_each_cpu_cond_mask 8.06 -0.1 7.95 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch.smp_call_function_many_cond 7.98 -0.1 7.87 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.llist_add_batch 0.59 ± 3% +0.1 0.65 ± 2% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.testcase 1.46 +0.1 1.53 perf-profile.calltrace.cycles-pp.filemap_map_pages.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault 1.48 +0.1 1.55 perf-profile.calltrace.cycles-pp.do_read_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault 1.53 +0.1 1.62 perf-profile.calltrace.cycles-pp.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault 2.92 +0.1 3.02 perf-profile.calltrace.cycles-pp.flush_tlb_func.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function 1.26 ± 2% +0.1 1.36 perf-profile.calltrace.cycles-pp.default_send_IPI_mask_sequence_phys.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range 1.84 +0.1 1.96 perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 7.87 +0.1 8.00 perf-profile.calltrace.cycles-pp.llist_reverse_order.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function 2.03 ± 2% +0.1 2.17 perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 2.90 +0.2 3.06 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.zap_pte_range 2.62 ± 3% +0.2 2.80 perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault.testcase 2.58 ± 3% +0.2 2.76 perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault.testcase 2.95 ± 3% +0.2 3.14 perf-profile.calltrace.cycles-pp.asm_exc_page_fault.testcase 2.75 +0.2 2.94 perf-profile.calltrace.cycles-pp.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range.tlb_finish_mmu 4.96 +0.3 5.29 perf-profile.calltrace.cycles-pp.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask 4.92 +0.3 5.25 perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond 5.13 +0.3 5.46 perf-profile.calltrace.cycles-pp.sysvec_call_function.asm_sysvec_call_function.smp_call_function_many_cond.on_each_cpu_cond_mask.flush_tlb_mm_range 5.08 +0.4 5.44 perf-profile.calltrace.cycles-pp.testcase 37.25 -2.0 35.24 perf-profile.children.cycles-pp.llist_add_batch 62.82 -0.8 62.04 perf-profile.children.cycles-pp.on_each_cpu_cond_mask 62.82 -0.8 62.04 perf-profile.children.cycles-pp.smp_call_function_many_cond 63.70 -0.7 62.98 perf-profile.children.cycles-pp.flush_tlb_mm_range 65.30 -0.6 64.70 perf-profile.children.cycles-pp.zap_page_range_single 65.34 -0.6 64.75 perf-profile.children.cycles-pp.madvise_vma_behavior 65.98 -0.5 65.45 perf-profile.children.cycles-pp.__x64_sys_madvise 65.96 -0.5 65.43 perf-profile.children.cycles-pp.do_madvise 66.52 -0.5 66.01 perf-profile.children.cycles-pp.do_syscall_64 66.65 -0.5 66.16 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 67.79 -0.4 67.36 perf-profile.children.cycles-pp.__madvise 32.94 -0.3 32.60 perf-profile.children.cycles-pp.tlb_finish_mmu 31.74 -0.3 31.43 perf-profile.children.cycles-pp.zap_pte_range 31.76 -0.3 31.46 perf-profile.children.cycles-pp.zap_pmd_range 31.95 -0.3 31.66 perf-profile.children.cycles-pp.unmap_page_range 0.42 ± 2% +0.0 0.46 perf-profile.children.cycles-pp.error_entry 0.20 ± 3% +0.0 0.24 ± 5% perf-profile.children.cycles-pp.up_read 0.69 +0.0 0.74 perf-profile.children.cycles-pp.native_flush_tlb_local 1.47 +0.1 1.55 perf-profile.children.cycles-pp.filemap_map_pages 1.48 +0.1 1.56 perf-profile.children.cycles-pp.do_read_fault 1.54 +0.1 1.62 perf-profile.children.cycles-pp.do_fault 2.75 +0.1 2.86 perf-profile.children.cycles-pp.default_send_IPI_mask_sequence_phys 1.85 +0.1 1.98 perf-profile.children.cycles-pp.__handle_mm_fault 2.04 ± 2% +0.1 2.18 perf-profile.children.cycles-pp.handle_mm_fault 2.63 ± 3% +0.2 2.81 perf-profile.children.cycles-pp.exc_page_fault 2.62 ± 3% +0.2 2.80 perf-profile.children.cycles-pp.do_user_addr_fault 3.24 ± 3% +0.2 3.44 perf-profile.children.cycles-pp.asm_exc_page_fault 3.83 +0.2 4.04 perf-profile.children.cycles-pp.flush_tlb_func 0.69 ± 2% +0.2 0.92 perf-profile.children.cycles-pp._find_next_bit 9.92 +0.3 10.23 perf-profile.children.cycles-pp.llist_reverse_order 5.45 +0.4 5.81 perf-profile.children.cycles-pp.testcase 18.42 +0.5 18.96 perf-profile.children.cycles-pp.asm_sysvec_call_function 16.24 +0.5 16.78 perf-profile.children.cycles-pp.__flush_smp_call_function_queue 15.78 +0.5 16.32 perf-profile.children.cycles-pp.__sysvec_call_function 16.36 +0.5 16.90 perf-profile.children.cycles-pp.sysvec_call_function 27.92 -1.9 26.04 perf-profile.self.cycles-pp.llist_add_batch 0.16 ± 2% +0.0 0.18 ± 4% perf-profile.self.cycles-pp.up_read 0.42 ± 2% +0.0 0.45 perf-profile.self.cycles-pp.error_entry 0.21 ± 4% +0.0 0.24 ± 5% perf-profile.self.cycles-pp.down_read 0.26 ± 2% +0.0 0.29 ± 3% perf-profile.self.cycles-pp.tlb_finish_mmu 2.01 +0.0 2.05 perf-profile.self.cycles-pp.default_send_IPI_mask_sequence_phys 0.68 +0.0 0.73 perf-profile.self.cycles-pp.native_flush_tlb_local 3.10 +0.2 3.26 perf-profile.self.cycles-pp.flush_tlb_func 0.50 ± 2% +0.2 0.68 perf-profile.self.cycles-pp._find_next_bit 9.92 +0.3 10.22 perf-profile.self.cycles-pp.llist_reverse_order 16.10 +0.5 16.64 perf-profile.self.cycles-pp.smp_call_function_many_cond Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests/wiki