On Mon, Jul 31, 2023 at 3:35 PM kernel test robot <oliver.sang@xxxxxxxxx> wrote:
>
>
>
> Hello,
>
> kernel test robot noticed a -7.3% regression of stress-ng.sock.ops_per_sec on:
>
>
> commit: dfa2f0483360d4d6f2324405464c9f281156bd87 ("tcp: get rid of sysctl_tcp_adv_win_scale")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: stress-ng
> test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 128G memory
> parameters:
>

TCP 'performance' on some tests depends on the initial values of
tcp_rmem[] (and many other sysctls).

The commit changed some initial RWIN values for some MTU/MSS settings;
it is next to impossible to make a change that is a win for all cases.

If you care about a particular real workload, not a synthetic benchmark,
I think you should give us more details.

Thanks.

>         nr_threads: 1
>         disk: 1HDD
>         testtime: 60s
>         fs: ext4
>         class: os
>         test: sock
>         cpufreq_governor: performance
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202307312121.d8479e5e-oliver.sang@xxxxxxxxx
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> To reproduce:
>
>         git clone https://github.com/intel/lkp-tests.git
>         cd lkp-tests
>         sudo bin/lkp install job.yaml           # job file is attached in this email
>         bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
>         sudo bin/lkp run generated-yaml-file
>
>         # if come across any failure that blocks the test,
>         # please remove ~/.lkp and /lkp dir to run from a clean state.
>
> =========================================================================================
> class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   os/gcc-12/performance/1HDD/ext4/x86_64-rhel-8.3/1/debian-11.1-x86_64-20220510.cgz/lkp-csl-d02/sock/stress-ng/60s
>
> commit:
>   63c8778d91 ("Merge branch 'net-mana-fix-doorbell-access-for-receive-queues'")
>   dfa2f04833 ("tcp: get rid of sysctl_tcp_adv_win_scale")
>
> 63c8778d9149d5df dfa2f0483360d4d6f2324405464
> ---------------- ---------------------------
>          %stddev  %change           %stddev
>             \        |                \
>    8094125         +21.5%     9832824 ± 18%  cpuidle..usage
>       5.04          -6.1%        4.73 ± 10%  iostat.cpu.system
>     330990 ±  2%   -32.3%      223958 ±  3%  turbostat.C1
>    4685666         +22.3%     5729557        turbostat.POLL
>      23600 ±  8%   +51.9%       35849 ± 25%  sched_debug.cfs_rq:/.min_vruntime.max
>       4907 ±  7%   +44.2%        7073 ± 45%  sched_debug.cfs_rq:/.min_vruntime.stddev
>       4911 ±  7%   +44.1%        7075 ± 45%  sched_debug.cfs_rq:/.spread0.stddev
>      43.08 ± 15%   -41.0%       25.42 ± 32%  perf-sched.wait_and_delay.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>     269948 ±  2%    +8.1%      291932 ±  2%  perf-sched.wait_and_delay.count.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked
>      43.08 ± 15%   -41.0%       25.42 ± 32%  perf-sched.wait_time.avg.ms.__cond_resched.generic_perform_write.generic_file_write_iter.vfs_write.ksys_write
>       0.02 ± 31%   +35.0%        0.03 ±  5%  perf-sched.wait_time.max.ms.__cond_resched.aa_sk_perm.security_socket_sendmsg.sock_sendmsg.__sys_sendto
>      93552          -7.3%       86706        stress-ng.sock.ops
>       1559          -7.3%        1445        stress-ng.sock.ops_per_sec
>     139.17          -3.4%      134.50        stress-ng.time.percent_of_cpu_this_job_got
>    5092570         +18.6%     6039727        stress-ng.time.voluntary_context_switches
>       1.45           +1.4        2.83 ±105%  perf-stat.i.branch-miss-rate%
>    1620951 ± 30%   -39.7%      977769 ± 37%  perf-stat.i.dTLB-store-misses
>     911.68          -3.6%      878.55        perf-stat.i.instructions-per-iTLB-miss
>       1.54           +0.2        1.69 ± 15%  perf-stat.overall.branch-miss-rate%
>       0.16 ± 30%     -0.1        0.10 ± 22%  perf-stat.overall.dTLB-store-miss-rate%
>     742.16          -4.3%      710.16        perf-stat.overall.instructions-per-iTLB-miss
>    1595258 ± 30%   -39.6%      962800 ± 37%  perf-stat.ps.dTLB-store-misses
>      67709         +12.6%       76211 ± 14%  proc-vmstat.nr_active_anon
>      73849         +11.0%       81975 ± 11%  proc-vmstat.nr_shmem
>      67709         +12.6%       76211 ± 14%  proc-vmstat.nr_zone_active_anon
>    6320969          -6.7%     5895784        proc-vmstat.numa_hit
>    6314894          -6.8%     5885708        proc-vmstat.numa_local
>     102508          +5.9%      108525        proc-vmstat.pgactivate
>   48068383          -7.3%    44558110        proc-vmstat.pgalloc_normal
>   47937851          -7.3%    44421205        proc-vmstat.pgfree
>       0.70 ± 14%     +0.2        0.88 ± 14%  perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.flush_smp_call_function_queue.do_idle.cpu_startup_entry
>       0.48 ± 47%     +0.2        0.70 ± 14%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
>       2.76 ±  9%     +0.5        3.30 ±  2%  perf-profile.calltrace.cycles-pp.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish
>       0.39 ± 72%     +0.6        0.95 ± 24%  perf-profile.calltrace.cycles-pp.try_to_wake_up.__wake_up_common.__wake_up_common_lock.sock_def_readable.tcp_data_queue
>       3.32 ± 10%     +0.7        4.00        perf-profile.calltrace.cycles-pp.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_core
>       6.88 ±  7%     +0.8        7.71 ±  2%  perf-profile.calltrace.cycles-pp.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_action.__do_softirq
>       7.18 ±  7%     +0.8        8.02 ±  2%  perf-profile.calltrace.cycles-pp.__napi_poll.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip
>       7.16 ±  7%     +0.9        8.02 ±  2%  perf-profile.calltrace.cycles-pp.process_backlog.__napi_poll.net_rx_action.__do_softirq.do_softirq
>       8.90 ±  6%     +1.0        9.89        perf-profile.calltrace.cycles-pp.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit
>       9.37 ±  6%     +1.0       10.40        perf-profile.calltrace.cycles-pp.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb
>       9.33 ±  6%     +1.0       10.37        perf-profile.calltrace.cycles-pp.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit
>       9.26 ±  6%     +1.0       10.30        perf-profile.calltrace.cycles-pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2
>       2.48 ± 17%     +1.3        3.82 ±  2%  perf-profile.calltrace.cycles-pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_sendmsg_locked
>       2.61 ± 17%     +1.3        3.96 ±  2%  perf-profile.calltrace.cycles-pp.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_sendmsg_locked.tcp_sendmsg
>       0.80 ± 15%     -0.4        0.43 ± 10%  perf-profile.children.cycles-pp.tcp_rcv_space_adjust
>       1.35 ±  5%     -0.2        1.19 ±  6%  perf-profile.children.cycles-pp.__entry_text_start
>       0.56 ± 15%     -0.2        0.40 ± 11%  perf-profile.children.cycles-pp.__x64_sys_connect
>       0.56 ± 15%     -0.2        0.40 ± 11%  perf-profile.children.cycles-pp.__sys_connect
>       0.55 ± 14%     -0.2        0.40 ± 12%  perf-profile.children.cycles-pp.inet_stream_connect
>       0.55 ± 15%     -0.1        0.40 ± 12%  perf-profile.children.cycles-pp.__inet_stream_connect
>       0.38 ± 11%     -0.1        0.28 ± 21%  perf-profile.children.cycles-pp.exit_to_user_mode_loop
>       0.44 ±  9%     -0.1        0.33 ± 13%  perf-profile.children.cycles-pp.__close
>       0.37 ± 12%     -0.1        0.27 ± 20%  perf-profile.children.cycles-pp.task_work_run
>       0.77 ±  5%     -0.1        0.68 ±  8%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode
>       0.34 ± 12%     -0.1        0.26 ± 21%  perf-profile.children.cycles-pp.__fput
>       0.31 ± 11%     -0.1        0.23 ± 18%  perf-profile.children.cycles-pp.tcp_v4_connect
>       0.22 ± 14%     -0.1        0.16 ± 22%  perf-profile.children.cycles-pp.__sock_release
>       0.22 ± 14%     -0.1        0.16 ± 22%  perf-profile.children.cycles-pp.sock_close
>       0.23 ± 19%     -0.1        0.16 ± 14%  perf-profile.children.cycles-pp.tcp_try_coalesce
>       0.09 ± 14%     -0.0        0.05 ± 48%  perf-profile.children.cycles-pp.new_inode_pseudo
>       0.07 ± 12%     -0.0        0.04 ± 72%  perf-profile.children.cycles-pp.__ns_get_path
>       0.17 ±  8%     +0.0        0.22 ±  8%  perf-profile.children.cycles-pp.ip_send_check
>       0.23 ±  7%     +0.0        0.28 ±  7%  perf-profile.children.cycles-pp.ip_local_out
>       0.09 ± 22%     +0.0        0.14 ± 10%  perf-profile.children.cycles-pp.available_idle_cpu
>       0.22 ±  9%     +0.1        0.26 ±  7%  perf-profile.children.cycles-pp.__ip_local_out
>       0.46 ± 11%     +0.1        0.56 ±  4%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
>       0.92 ±  3%     +0.1        1.06 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
>       7.10 ±  2%     +0.7        7.76 ±  3%  perf-profile.children.cycles-pp.tcp_v4_rcv
>       7.21 ±  2%     +0.7        7.90 ±  3%  perf-profile.children.cycles-pp.ip_protocol_deliver_rcu
>       7.42 ±  2%     +0.7        8.12 ±  3%  perf-profile.children.cycles-pp.ip_local_deliver_finish
>       8.00 ±  2%     +0.7        8.71 ±  2%  perf-profile.children.cycles-pp.__netif_receive_skb_one_core
>       8.34 ±  2%     +0.7        9.06 ±  2%  perf-profile.children.cycles-pp.__napi_poll
>       8.32 ±  2%     +0.7        9.05 ±  2%  perf-profile.children.cycles-pp.process_backlog
>      11.71 ±  3%     +0.9       12.63 ±  2%  perf-profile.children.cycles-pp.__dev_queue_xmit
>      13.86 ±  2%     +0.9       14.78 ±  2%  perf-profile.children.cycles-pp.__tcp_transmit_skb
>      11.92 ±  2%     +0.9       12.86 ±  2%  perf-profile.children.cycles-pp.ip_finish_output2
>      10.05 ±  3%     +0.9       10.99 ±  2%  perf-profile.children.cycles-pp.net_rx_action
>      12.66 ±  2%     +1.0       13.62 ±  2%  perf-profile.children.cycles-pp.__ip_queue_xmit
>      10.56 ±  3%     +1.0       11.53 ±  2%  perf-profile.children.cycles-pp.do_softirq
>      10.82 ±  3%     +1.0       11.80 ±  2%  perf-profile.children.cycles-pp.__local_bh_enable_ip
>      10.94 ±  4%     +1.0       11.94 ±  2%  perf-profile.children.cycles-pp.__do_softirq
>       0.52 ± 21%     -0.4        0.16 ± 16%  perf-profile.self.cycles-pp.tcp_rcv_space_adjust
>       0.62 ±  7%     -0.1        0.48 ± 10%  perf-profile.self.cycles-pp.tcp_sendmsg
>       0.63 ±  5%     -0.1        0.55 ±  7%  perf-profile.self.cycles-pp.__entry_text_start
>       0.10 ± 15%     +0.0        0.14 ± 13%  perf-profile.self.cycles-pp.schedule_timeout
>       0.10 ± 20%     +0.0        0.14 ± 16%  perf-profile.self.cycles-pp.enqueue_entity
>       0.08 ± 22%     +0.1        0.14 ± 11%  perf-profile.self.cycles-pp.available_idle_cpu
>       0.37 ±  8%     +0.1        0.44 ±  4%  perf-profile.self.cycles-pp.net_rx_action
>       0.92 ±  3%     +0.1        1.06 ±  5%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>
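
To make the dependence on tcp_rmem[] a bit more concrete, here is a rough
Python sketch. It is not taken from the commit. It only models the documented
pre-commit tcp_adv_win_scale rule (window = space - space/2^scale for a
positive scale, space/2^(-scale) otherwise), and it reads tcp_rmem from
/proc/sys/net/ipv4/tcp_rmem when that file exists. The function names and the
131072-byte fallback are illustrative assumptions, not values from the report.

```python
from pathlib import Path
from typing import Optional, Tuple

# Standard procfs location of the sysctl on Linux.
TCP_RMEM_PATH = "/proc/sys/net/ipv4/tcp_rmem"


def read_tcp_rmem(path: str = TCP_RMEM_PATH) -> Optional[Tuple[int, int, int]]:
    """Return (min, default, max) from tcp_rmem, or None if the file is absent."""
    p = Path(path)
    if not p.is_file():
        return None
    low, default, high = (int(v) for v in p.read_text().split())
    return low, default, high


def win_from_space_legacy(space: int, adv_win_scale: int) -> int:
    """Model of the old tcp_adv_win_scale rule (hypothetical helper name).

    scale > 0 : 1/2^scale of the buffer is counted as overhead.
    scale <= 0: only 1/2^(-scale) of the buffer is usable as window.
    """
    if adv_win_scale <= 0:
        return space >> -adv_win_scale
    return space - (space >> adv_win_scale)


if __name__ == "__main__":
    rmem = read_tcp_rmem()
    # Fall back to an assumed example value when not running on Linux.
    default_space = rmem[1] if rmem else 131072
    for scale in (-1, 1, 2):
        window = win_from_space_legacy(default_space, scale)
        print(f"tcp_rmem[1]={default_space} scale={scale} -> window={window}")
```

The same receive buffer yields quite different initial windows depending on
the scale, which is one reason a synthetic loopback test can move in either
direction when these defaults change.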