Thanks for sharing the data, Oliver! > -----Original Message----- > From: Sang, Oliver <oliver.sang@xxxxxxxxx> > Sent: Wednesday, May 31, 2023 4:46 PM > To: Shakeel Butt <shakeelb@xxxxxxxxxx> > Cc: Zhang, Cathy <cathy.zhang@xxxxxxxxx>; Yin, Fengwei > <fengwei.yin@xxxxxxxxx>; Tang, Feng <feng.tang@xxxxxxxxx>; Eric Dumazet > <edumazet@xxxxxxxxxx>; Linux MM <linux-mm@xxxxxxxxx>; Cgroups > <cgroups@xxxxxxxxxxxxxxx>; Paolo Abeni <pabeni@xxxxxxxxxx>; > davem@xxxxxxxxxxxxx; kuba@xxxxxxxxxx; Brandeburg, Jesse > <jesse.brandeburg@xxxxxxxxx>; Srinivas, Suresh > <suresh.srinivas@xxxxxxxxx>; Chen, Tim C <tim.c.chen@xxxxxxxxx>; You, > Lizhen <lizhen.you@xxxxxxxxx>; eric.dumazet@xxxxxxxxx; > netdev@xxxxxxxxxxxxxxx; Li, Philip <philip.li@xxxxxxxxx>; Liu, Yujie > <yujie.liu@xxxxxxxxx>; Sang, Oliver <oliver.sang@xxxxxxxxx> > Subject: Re: [PATCH net-next 1/2] net: Keep sk->sk_forward_alloc as a proper > size > > hi, Shakeel, > > On Wed, May 17, 2023 at 04:24:47PM +0000, Shakeel Butt wrote: > > On Tue, May 16, 2023 at 01:46:55PM +0800, Oliver Sang wrote: > > > hi Shakeel, > > > > > > On Mon, May 15, 2023 at 12:50:31PM -0700, Shakeel Butt wrote: > > > > +Feng, Yin and Oliver > > > > > > > > > > > > > > > Thanks a lot Cathy for testing. Do you see any performance > improvement for > > > > > > the memcached benchmark with the patch? > > > > > > > > > > Yep, absolutely :- ) RPS (with/without patch) = +1.74 > > > > > > > > Thanks a lot Cathy. > > > > > > > > Feng/Yin/Oliver, can you please test the patch at [1] with other > > > > workloads used by the test robot? Basically I wanted to know if it has > > > > any positive or negative impact on other perf benchmarks. > > > > > > is it possible for you to resend patch with Signed-off-by? > > > without it, test robot will regard the patch as informal, then it cannot feed > > > into auto test process. > > > and could you tell us the base of this patch? it will help us apply it > > > correctly. > > > > > > on the other hand, due to resource restraint, we normally cannot support > > > this type of on-demand test upon a single patch, patch set, or a branch. > > > instead, we try to merge them into so-called hourly-kernels, then > distribute > > > tests and auto-bisects to various platforms. > > > after we applying your patch and merging it to hourly-kernels sccussfully, > > > if it really causes some performance changes, the test robot could spot > out > > > this patch as 'fbc' and we will send report to you. this could happen > within > > > several weeks after applying. > > > but due to the complexity of whole process (also limited resourse, such > like > > > we cannot run all tests on all platforms), we cannot guanrantee capture > all > > > possible performance impacts of this patch. and it's hard for us to > provide > > > a big picture like what's the general performance impact of this patch. > > > this maybe is not exactly what you want. is it ok for you? > > > > > > > > > > Yes, that is fine and thanks for the help. The patch is below: > > we applied below patch upon v6.4-rc2, so far, we didn't spot out > performance > impacts of it to other tests. > > but we found -7.6% regression of netperf.Throughput_Mbps > > testcase: netperf > test machine: 128 threads 4 sockets Intel(R) Xeon(R) Gold 6338 CPU @ > 2.00GHz (Ice Lake) with 256G memory > parameters: > > ip: ipv4 > runtime: 300s > nr_threads: 50% > cluster: cs-localhost > send_size: 10K > test: TCP_SENDFILE > cpufreq_governor: performance > > > To reproduce: > > git clone https://github.com/intel/lkp-tests.git > cd lkp-tests > sudo bin/lkp install job.yaml # job file is attached in this email > bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp > run > sudo bin/lkp run generated-yaml-file > > # if come across any failure that blocks the test, > # please remove ~/.lkp and /lkp dir to run from a clean state. > > > ================================================================ > ========================= > cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/s > end_size/tbox_group/test/testcase: > cs-localhost/gcc-11/performance/ipv4/x86_64-rhel-8.3/50%/debian-11.1- > x86_64-20220510.cgz/300s/10K/lkp-icl-2sp2/TCP_SENDFILE/netperf > > commit: > v6.4-rc2 > 5e32037c50 ("memcg: skip stock refill in irq context") > > v6.4-rc2 5e32037c5065d2058264d41cd4c > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 23165 -7.6% 21414 netperf.Throughput_Mbps > 1482569 -7.6% 1370534 netperf.Throughput_total_Mbps > > detail data as below [1] > > > at the same time, we tested Cathy's patch upon same test, found > a 29.4% improvement of netperf.Throughput_Mbps > just FYI > > > ================================================================ > ========================= > cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/s > end_size/tbox_group/test/testcase: > cs-localhost/gcc-11/performance/ipv4/x86_64-rhel-8.3/50%/debian-11.1- > x86_64-20220510.cgz/300s/10K/lkp-icl-2sp2/TCP_SENDFILE/netperf > > commit: > ed23734c23 ("Merge tag 'net-6.4-rc1' of > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") > 05d72a8bed ("net: Keep sk->sk_forward_alloc as a proper size") > > ed23734c23d2fc1e 05d72a8bedfacfc46f300ab38e0 > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 23218 +29.4% 30043 netperf.Throughput_Mbps > 1485996 +29.4% 1922763 netperf.Throughput_total_Mbps > > detail data as below [2] > > > [1] > > ================================================================ > ========================= > cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/s > end_size/tbox_group/test/testcase: > cs-localhost/gcc-11/performance/ipv4/x86_64-rhel-8.3/50%/debian-11.1- > x86_64-20220510.cgz/300s/10K/lkp-icl-2sp2/TCP_SENDFILE/netperf > > commit: > v6.4-rc2 > 5e32037c50 ("memcg: skip stock refill in irq context") > > v6.4-rc2 5e32037c5065d2058264d41cd4c > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 5106608 -1.3% 5040930 vmstat.system.cs > 246222 ± 4% -21.9% 192291 ± 8% sched_debug.cpu.avg_idle.avg > 269582 ± 6% -24.9% 202436 ± 13% sched_debug.cpu.avg_idle.stddev > 2556 +0.9% 2579 turbostat.Bzy_MHz > 15.01 +0.8 15.76 turbostat.C1% > 30.63 +4.2% 31.90 ± 2% turbostat.RAMWatt > 23165 -7.6% 21414 netperf.Throughput_Mbps > 1482569 -7.6% 1370534 netperf.Throughput_total_Mbps > 670.10 -11.8% 591.36 netperf.time.user_time > 5.429e+09 -7.6% 5.019e+09 netperf.workload > 6.93 +6.4% 7.38 perf-stat.i.MPKI > 4.404e+10 -5.4% 4.167e+10 perf-stat.i.branch-instructions > 0.88 +0.0 0.90 perf-stat.i.branch-miss-rate% > 3.823e+08 -2.7% 3.721e+08 perf-stat.i.branch-misses > 6.54 ± 2% +0.4 6.90 ± 3% perf-stat.i.cache-miss-rate% > 1.05e+08 ± 3% +6.3% 1.117e+08 ± 3% perf-stat.i.cache-misses > 1.29 +5.8% 1.37 perf-stat.i.cpi > 27150 ± 6% +14.9% 31203 ± 5% perf-stat.i.cpu-migrations > 2897 ± 3% -5.7% 2733 ± 3% perf-stat.i.cycles-between-cache- > misses > 0.01 ± 12% +0.0 0.01 perf-stat.i.dTLB-load-miss-rate% > 6712601 ± 12% +7.8% 7237514 perf-stat.i.dTLB-load-misses > 6.874e+10 -5.4% 6.505e+10 perf-stat.i.dTLB-loads > 0.00 ± 5% +0.0 0.00 ± 5% perf-stat.i.dTLB-store-miss-rate% > 940096 ± 5% +15.3% 1083508 ± 5% perf-stat.i.dTLB-store-misses > 3.753e+10 -5.5% 3.547e+10 perf-stat.i.dTLB-stores > 2.332e+11 -5.4% 2.207e+11 perf-stat.i.instructions > 0.77 -5.4% 0.73 perf-stat.i.ipc > 1186 -5.3% 1123 perf-stat.i.metric.M/sec > 706578 ± 8% +33.2% 941322 ± 5% perf-stat.i.node-loads > 2812685 ± 8% +15.6% 3250382 ± 10% perf-stat.i.node-stores > 6.93 +6.4% 7.37 perf-stat.overall.MPKI > 0.87 +0.0 0.89 perf-stat.overall.branch-miss-rate% > 6.50 ± 2% +0.4 6.86 ± 3% perf-stat.overall.cache-miss-rate% > 1.29 +5.8% 1.37 perf-stat.overall.cpi > 2878 ± 3% -5.8% 2711 ± 3% perf-stat.overall.cycles-between- > cache-misses > 0.01 ± 12% +0.0 0.01 perf-stat.overall.dTLB-load-miss-rate% > 0.00 ± 5% +0.0 0.00 ± 5% perf-stat.overall.dTLB-store-miss-rate% > 0.77 -5.5% 0.73 perf-stat.overall.ipc > 12903 +2.4% 13208 perf-stat.overall.path-length > 4.39e+10 -5.4% 4.154e+10 perf-stat.ps.branch-instructions > 3.81e+08 -2.7% 3.708e+08 perf-stat.ps.branch-misses > 1.047e+08 ± 3% +6.3% 1.113e+08 ± 3% perf-stat.ps.cache-misses > 27021 ± 6% +14.9% 31054 ± 5% perf-stat.ps.cpu-migrations > 6672234 ± 12% +7.8% 7195318 perf-stat.ps.dTLB-load-misses > 6.852e+10 -5.4% 6.484e+10 perf-stat.ps.dTLB-loads > 935167 ± 5% +15.3% 1077856 ± 5% perf-stat.ps.dTLB-store-misses > 3.741e+10 -5.5% 3.536e+10 perf-stat.ps.dTLB-stores > 2.324e+11 -5.4% 2.199e+11 perf-stat.ps.instructions > 704145 ± 8% +33.2% 938240 ± 5% perf-stat.ps.node-loads > 2802795 ± 8% +15.5% 3238090 ± 10% perf-stat.ps.node-stores > 7.006e+13 -5.4% 6.629e+13 perf-stat.total.instructions > 11.29 -0.9 10.42 perf-profile.calltrace.cycles- > pp.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg. > sock_recvmsg > 11.22 -0.9 10.35 perf-profile.calltrace.cycles- > pp.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_r > ecvmsg.inet_recvmsg > 29.43 -0.7 28.74 perf-profile.calltrace.cycles- > pp.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvf > rom > 7.04 -0.5 6.51 perf-profile.calltrace.cycles- > pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvms > g_locked.tcp_recvmsg > 7.36 -0.5 6.86 perf-profile.calltrace.cycles- > pp.generic_file_splice_read.splice_direct_to_actor.do_splice_direct.do_sendf > ile.__x64_sys_sendfile64 > 6.56 -0.5 6.06 perf-profile.calltrace.cycles- > pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp > _recvmsg_locked > 6.45 -0.4 6.03 perf-profile.calltrace.cycles- > pp.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_d > irect.do_sendfile > 2.95 -0.3 2.61 ± 7% perf-profile.calltrace.cycles- > pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy > _datagram_iter.tcp_recvmsg_locked > 2.58 ± 2% -0.3 2.29 ± 7% perf-profile.calltrace.cycles- > pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_data > gram_iter.skb_copy_datagram_iter > 3.22 -0.3 2.93 perf-profile.calltrace.cycles- > pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_r > ecvmsg_locked.tcp_recvmsg > 10.00 -0.3 9.75 perf-profile.calltrace.cycles- > pp.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_s > kb.tcp_recvmsg_locked > 10.15 -0.2 9.91 perf-profile.calltrace.cycles- > pp.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_recvmsg_loc > ked.tcp_recvmsg > 2.89 -0.2 2.66 perf-profile.calltrace.cycles- > pp.filemap_get_read_batch.filemap_get_pages.filemap_read.generic_file_sp > lice_read.splice_direct_to_actor > 3.12 -0.2 2.90 perf-profile.calltrace.cycles- > pp.filemap_get_pages.filemap_read.generic_file_splice_read.splice_direct_t > o_actor.do_splice_direct > 10.47 -0.2 10.25 perf-profile.calltrace.cycles- > pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_recvmsg_locked.tcp_recvmsg.i > net_recvmsg > 2.66 -0.2 2.44 perf-profile.calltrace.cycles- > pp.tcp_write_xmit.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_s > endpage > 2.42 -0.2 2.22 perf-profile.calltrace.cycles- > pp.__tcp_transmit_skb.tcp_write_xmit.do_tcp_sendpages.tcp_sendpage.inet > _sendpage > 2.48 -0.2 2.29 ± 7% perf-profile.calltrace.cycles- > pp.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_ > rcv.ip_protocol_deliver_rcu > 2.46 -0.2 2.27 ± 7% perf-profile.calltrace.cycles- > pp.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_ > do_rcv.tcp_v4_rcv > 2.23 -0.2 2.05 perf-profile.calltrace.cycles- > pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.do_tcp_sendpages. > tcp_sendpage > 2.14 -0.2 1.96 perf-profile.calltrace.cycles- > pp.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.d > o_tcp_sendpages > 1.27 -0.1 1.17 perf-profile.calltrace.cycles- > pp.tcp_send_mss.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_se > ndpage > 1.17 -0.1 1.09 perf-profile.calltrace.cycles- > pp.__tcp_push_pending_frames.do_tcp_sendpages.tcp_sendpage.inet_send > page.kernel_sendpage > 1.10 -0.1 1.02 perf-profile.calltrace.cycles- > pp.tcp_write_xmit.__tcp_push_pending_frames.do_tcp_sendpages.tcp_send > page.inet_sendpage > 0.91 -0.1 0.84 perf-profile.calltrace.cycles- > pp.tcp_current_mss.tcp_send_mss.do_tcp_sendpages.tcp_sendpage.inet_se > ndpage > 1.29 -0.1 1.23 perf-profile.calltrace.cycles- > pp.copy_page_to_iter_pipe.filemap_read.generic_file_splice_read.splice_dir > ect_to_actor.do_splice_direct > 0.77 -0.0 0.73 perf-profile.calltrace.cycles- > pp.tcp_stream_alloc_skb.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.in > et_sendpage > 0.81 -0.0 0.77 perf-profile.calltrace.cycles- > pp.activate_task.ttwu_do_activate.sched_ttwu_pending.__sysvec_call_functi > on_single.sysvec_call_function_single > 0.78 -0.0 0.74 perf-profile.calltrace.cycles- > pp.enqueue_task_fair.activate_task.ttwu_do_activate.sched_ttwu_pending._ > _sysvec_call_function_single > 0.55 -0.0 0.53 perf-profile.calltrace.cycles- > pp.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate.sched > _ttwu_pending > 0.93 +0.0 0.96 perf-profile.calltrace.cycles- > pp.try_to_wake_up.__wake_up_common.__wake_up_common_lock.sock_d > ef_readable.tcp_data_queue > 1.05 +0.0 1.08 perf-profile.calltrace.cycles- > pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.tcp_ > data_queue.tcp_rcv_established > 1.10 +0.0 1.13 perf-profile.calltrace.cycles- > pp.__wake_up_common_lock.sock_def_readable.tcp_data_queue.tcp_rcv_e > stablished.tcp_v4_do_rcv > 1.20 +0.0 1.24 perf-profile.calltrace.cycles- > pp.sock_def_readable.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.t > cp_v4_rcv > 15.73 +0.2 15.97 perf-profile.calltrace.cycles- > pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_fini > sh_output2 > 15.13 +0.3 15.38 perf-profile.calltrace.cycles- > pp.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queu > e_xmit > 13.50 +0.3 13.82 perf-profile.calltrace.cycles- > pp.__napi_poll.net_rx_action.__do_softirq.do_softirq.__local_bh_enable_ip > 13.45 +0.3 13.77 perf-profile.calltrace.cycles- > pp.process_backlog.__napi_poll.net_rx_action.__do_softirq.do_softirq > 13.06 +0.3 13.38 perf-profile.calltrace.cycles- > pp.__netif_receive_skb_one_core.process_backlog.__napi_poll.net_rx_actio > n.__do_softirq > 2.23 ± 2% +0.4 2.60 ± 3% perf-profile.calltrace.cycles- > pp.release_sock.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom > 12.08 +0.4 12.46 perf-profile.calltrace.cycles- > pp.ip_local_deliver_finish.__netif_receive_skb_one_core.process_backlog.__ > napi_poll.net_rx_action > 12.02 +0.4 12.41 perf-profile.calltrace.cycles- > pp.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receive_skb_one_ > core.process_backlog.__napi_poll > 1.12 +0.4 1.51 ± 3% perf-profile.calltrace.cycles- > pp.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__relea > se_sock > 1.31 +0.4 1.71 ± 3% perf-profile.calltrace.cycles- > pp.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock > 11.73 +0.4 12.14 perf-profile.calltrace.cycles- > pp.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish.__netif_receiv > e_skb_one_core.process_backlog > 1.34 ± 13% +0.4 1.76 ± 6% perf-profile.calltrace.cycles- > pp.__sk_mem_reduce_allocated.tcp_recvmsg_locked.tcp_recvmsg.inet_recv > msg.sock_recvmsg > 1.73 ± 14% +0.5 2.19 ± 7% perf-profile.calltrace.cycles- > pp.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock.tcp_recv > msg > 1.38 ± 14% +0.5 1.85 ± 7% perf-profile.calltrace.cycles- > pp.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.relea > se_sock > 5.62 +0.5 6.11 perf-profile.calltrace.cycles- > pp.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock.tcp_sen > dpage > 5.61 +0.5 6.10 perf-profile.calltrace.cycles- > pp.tcp_v4_do_rcv.__release_sock.release_sock.tcp_sendpage.inet_sendpage > 8.89 +0.5 9.40 perf-profile.calltrace.cycles- > pp.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish > .__netif_receive_skb_one_core > 8.74 +0.5 9.26 perf-profile.calltrace.cycles- > pp.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip > _local_deliver_finish > 2.86 +0.6 3.46 ± 3% perf-profile.calltrace.cycles- > pp.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protoc > ol_deliver_rcu > 0.58 ± 3% +0.6 1.19 ± 9% perf-profile.calltrace.cycles- > pp.mem_cgroup_charge_skmem.tcp_data_queue.tcp_rcv_established.tcp_v > 4_do_rcv.tcp_v4_rcv > 1.29 ± 15% +0.6 1.94 ± 8% perf-profile.calltrace.cycles- > pp.__sk_mem_reduce_allocated.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_estab > lished.tcp_v4_do_rcv > 7.18 ± 2% +0.7 7.87 ± 2% perf-profile.calltrace.cycles- > pp.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv > _established.tcp_v4_do_rcv > 6.06 +0.7 6.76 ± 2% perf-profile.calltrace.cycles- > pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pendin > g_frames.tcp_rcv_established > 0.35 ± 70% +0.7 1.07 ± 32% perf-profile.calltrace.cycles- > pp.refill_stock.__sk_mem_reduce_allocated.tcp_clean_rtx_queue.tcp_ack.tc > p_rcv_established > 6.02 +0.7 6.75 ± 2% perf-profile.calltrace.cycles- > pp.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_ > do_rcv.__release_sock > 6.05 +0.7 6.78 ± 2% perf-profile.calltrace.cycles- > pp.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_do_rcv.__releas > e_sock.release_sock > 0.39 ± 70% +0.8 1.20 ± 22% perf-profile.calltrace.cycles- > pp.page_counter_try_charge.try_charge_memcg.mem_cgroup_charge_skme > m.tcp_data_queue.tcp_rcv_established > 16.80 +0.8 17.62 perf-profile.calltrace.cycles- > pp.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_s > endpage > 46.63 +0.9 47.53 perf-profile.calltrace.cycles- > pp.do_splice_direct.do_sendfile.__x64_sys_sendfile64.do_syscall_64.entry_S > YSCALL_64_after_hwframe > 0.53 ± 4% +0.9 1.46 ± 9% perf-profile.calltrace.cycles- > pp.page_counter_try_charge.try_charge_memcg.mem_cgroup_charge_skme > m.__sk_mem_raise_allocated.__sk_mem_schedule > 46.04 +1.0 47.00 perf-profile.calltrace.cycles- > pp.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64 > .do_syscall_64 > 0.00 +1.0 0.98 ± 33% perf-profile.calltrace.cycles- > pp.page_counter_uncharge.drain_stock.refill_stock.__sk_mem_reduce_alloc > ated.tcp_clean_rtx_queue > 0.00 +1.0 0.99 ± 33% perf-profile.calltrace.cycles- > pp.drain_stock.refill_stock.__sk_mem_reduce_allocated.tcp_clean_rtx_queu > e.tcp_ack > 9.51 +1.2 10.67 ± 2% perf-profile.calltrace.cycles- > pp.release_sock.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_sendp > age > 8.17 +1.2 9.34 ± 2% perf-profile.calltrace.cycles- > pp.__release_sock.release_sock.tcp_sendpage.inet_sendpage.kernel_sendp > age > 10.68 +1.3 11.98 perf-profile.calltrace.cycles- > pp.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_s > endpage > 0.96 ± 15% +1.4 2.34 ± 11% perf-profile.calltrace.cycles- > pp.try_charge_memcg.mem_cgroup_charge_skmem.tcp_data_queue.tcp_rc > v_established.tcp_v4_do_rcv > 7.84 +1.5 9.30 perf-profile.calltrace.cycles- > pp.tcp_wmem_schedule.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.in > et_sendpage > 7.60 +1.5 9.08 perf-profile.calltrace.cycles- > pp.__sk_mem_schedule.tcp_wmem_schedule.tcp_build_frag.do_tcp_sendp > ages.tcp_sendpage > 36.91 +1.5 38.40 perf-profile.calltrace.cycles- > pp.generic_splice_sendpage.direct_splice_actor.splice_direct_to_actor.do_sp > lice_direct.do_sendfile > 37.04 +1.5 38.53 perf-profile.calltrace.cycles- > pp.direct_splice_actor.splice_direct_to_actor.do_splice_direct.do_sendfile._ > _x64_sys_sendfile64 > 7.41 +1.5 8.91 perf-profile.calltrace.cycles- > pp.__sk_mem_raise_allocated.__sk_mem_schedule.tcp_wmem_schedule.tc > p_build_frag.do_tcp_sendpages > 36.49 +1.5 38.02 perf-profile.calltrace.cycles- > pp.__splice_from_pipe.generic_splice_sendpage.direct_splice_actor.splice_d > irect_to_actor.do_splice_direct > 1.47 ± 3% +1.6 3.11 ± 7% perf-profile.calltrace.cycles- > pp.try_charge_memcg.mem_cgroup_charge_skmem.__sk_mem_raise_alloca > ted.__sk_mem_schedule.tcp_wmem_schedule > 34.61 +1.7 36.26 perf-profile.calltrace.cycles- > pp.pipe_to_sendpage.__splice_from_pipe.generic_splice_sendpage.direct_s > plice_actor.splice_direct_to_actor > 34.29 +1.7 35.97 perf-profile.calltrace.cycles- > pp.sock_sendpage.pipe_to_sendpage.__splice_from_pipe.generic_splice_se > ndpage.direct_splice_actor > 34.10 +1.7 35.79 perf-profile.calltrace.cycles- > pp.kernel_sendpage.sock_sendpage.pipe_to_sendpage.__splice_from_pipe. > generic_splice_sendpage > 33.73 +1.7 35.46 perf-profile.calltrace.cycles- > pp.inet_sendpage.kernel_sendpage.sock_sendpage.pipe_to_sendpage.__spli > ce_from_pipe > 33.24 +1.8 35.02 perf-profile.calltrace.cycles- > pp.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_sendpage.pipe_to_s > endpage > 4.46 ± 2% +2.0 6.42 ± 2% perf-profile.calltrace.cycles- > pp.mem_cgroup_charge_skmem.__sk_mem_raise_allocated.__sk_mem_sch > edule.tcp_wmem_schedule.tcp_build_frag > 11.28 -0.9 10.40 perf-profile.children.cycles- > pp.__skb_datagram_iter > 11.30 -0.9 10.42 perf-profile.children.cycles- > pp.skb_copy_datagram_iter > 29.47 -0.7 28.77 perf-profile.children.cycles- > pp.tcp_recvmsg_locked > 7.09 -0.5 6.56 perf-profile.children.cycles-pp._copy_to_iter > 6.70 -0.5 6.20 perf-profile.children.cycles-pp.copyout > 7.44 -0.5 6.94 perf-profile.children.cycles- > pp.generic_file_splice_read > 6.58 -0.4 6.15 perf-profile.children.cycles-pp.filemap_read > 3.26 -0.3 2.97 perf-profile.children.cycles- > pp.simple_copy_to_iter > 3.16 -0.3 2.88 perf-profile.children.cycles- > pp.__check_object_size > 2.93 -0.2 2.70 perf-profile.children.cycles- > pp.filemap_get_read_batch > 2.65 ± 2% -0.2 2.42 perf-profile.children.cycles- > pp.check_heap_object > 3.16 -0.2 2.93 perf-profile.children.cycles- > pp.filemap_get_pages > 1.32 -0.1 1.22 perf-profile.children.cycles-pp.tcp_send_mss > 1.33 -0.1 1.23 perf-profile.children.cycles-pp.touch_atime > 1.22 ± 2% -0.1 1.12 perf-profile.children.cycles- > pp.security_file_permission > 5.62 -0.1 5.53 perf-profile.children.cycles- > pp.lock_sock_nested > 1.08 -0.1 1.00 perf-profile.children.cycles- > pp.atime_needs_update > 1.08 -0.1 1.00 perf-profile.children.cycles- > pp.tcp_current_mss > 0.96 ± 3% -0.1 0.88 perf-profile.children.cycles- > pp.apparmor_file_permission > 1.35 -0.1 1.28 perf-profile.children.cycles- > pp.copy_page_to_iter_pipe > 0.57 ± 3% -0.1 0.51 perf-profile.children.cycles- > pp._copy_from_user > 0.52 -0.1 0.46 ± 2% perf-profile.children.cycles- > pp.__fsnotify_parent > 1.06 -0.1 1.01 perf-profile.children.cycles- > pp.__inet_lookup_established > 0.41 -0.0 0.36 perf-profile.children.cycles- > pp.tcp_rate_check_app_limited > 0.52 ± 2% -0.0 0.48 ± 2% perf-profile.children.cycles- > pp.netperf_sendfile > 0.74 -0.0 0.70 perf-profile.children.cycles- > pp.__cond_resched > 0.48 -0.0 0.43 perf-profile.children.cycles- > pp.tcp_event_new_data_sent > 0.64 -0.0 0.60 perf-profile.children.cycles-pp.__fget_light > 0.97 -0.0 0.93 perf-profile.children.cycles-pp.__alloc_skb > 0.60 ± 3% -0.0 0.55 ± 3% perf-profile.children.cycles-pp.ip_rcv > 0.78 -0.0 0.74 perf-profile.children.cycles- > pp.tcp_stream_alloc_skb > 0.38 -0.0 0.34 ± 2% perf-profile.children.cycles- > pp.page_cache_pipe_buf_confirm > 0.59 ± 2% -0.0 0.55 ± 2% perf-profile.children.cycles- > pp.__entry_text_start > 0.23 ± 5% -0.0 0.20 ± 2% perf-profile.children.cycles-pp.xas_load > 0.48 -0.0 0.44 perf-profile.children.cycles-pp.sk_reset_timer > 0.42 -0.0 0.39 perf-profile.children.cycles- > pp.entry_SYSRETQ_unsafe_stack > 0.74 ± 2% -0.0 0.71 perf-profile.children.cycles-pp.__kfree_skb > 0.69 -0.0 0.65 perf-profile.children.cycles-pp.read_tsc > 0.45 -0.0 0.42 ± 2% perf-profile.children.cycles- > pp.current_time > 0.57 -0.0 0.54 perf-profile.children.cycles- > pp.kmem_cache_alloc_node > 0.40 ± 2% -0.0 0.38 ± 2% perf-profile.children.cycles- > pp.__virt_addr_valid > 0.81 -0.0 0.78 perf-profile.children.cycles- > pp.enqueue_task_fair > 0.43 -0.0 0.40 perf-profile.children.cycles-pp.__mod_timer > 0.38 -0.0 0.36 perf-profile.children.cycles- > pp.tcp_established_options > 0.21 ± 2% -0.0 0.18 ± 2% perf-profile.children.cycles- > pp.sockfd_lookup_light > 0.35 -0.0 0.32 ± 2% perf-profile.children.cycles- > pp.__put_user_8 > 0.30 ± 3% -0.0 0.27 perf-profile.children.cycles- > pp.aa_file_perm > 0.48 -0.0 0.46 ± 2% perf-profile.children.cycles- > pp.__tcp_send_ack > 0.49 ± 2% -0.0 0.47 perf-profile.children.cycles- > pp.kmem_cache_free > 0.28 ± 3% -0.0 0.26 ± 4% perf-profile.children.cycles- > pp.ip_rcv_finish_core > 0.11 ± 6% -0.0 0.09 ± 5% perf-profile.children.cycles-pp.xas_start > 0.24 -0.0 0.22 ± 3% perf-profile.children.cycles-pp.tcp_tso_segs > 0.25 -0.0 0.23 perf-profile.children.cycles- > pp.copy_page_to_iter > 0.30 -0.0 0.28 ± 2% perf-profile.children.cycles- > pp.__netif_receive_skb_core > 0.24 -0.0 0.22 ± 2% perf-profile.children.cycles-pp.sanity > 0.78 -0.0 0.76 perf-profile.children.cycles- > pp.page_cache_pipe_buf_release > 0.28 ± 3% -0.0 0.26 perf-profile.children.cycles- > pp.tcp_schedule_loss_probe > 0.27 -0.0 0.26 perf-profile.children.cycles-pp.rcu_all_qs > 0.30 -0.0 0.28 perf-profile.children.cycles- > pp.syscall_return_via_sysret > 0.23 -0.0 0.22 ± 2% perf-profile.children.cycles- > pp.set_next_entity > 0.16 ± 3% -0.0 0.15 ± 5% perf-profile.children.cycles- > pp.skb_release_head_state > 0.15 ± 2% -0.0 0.14 ± 2% perf-profile.children.cycles- > pp.folio_mark_accessed > 0.08 -0.0 0.07 ± 5% perf-profile.children.cycles-pp.aa_sk_perm > 0.20 ± 2% -0.0 0.18 ± 2% perf-profile.children.cycles- > pp._raw_spin_unlock_bh > 0.07 -0.0 0.06 perf-profile.children.cycles-pp.rb_next > 0.05 +0.0 0.06 perf-profile.children.cycles-pp.skb_push > 0.07 +0.0 0.08 perf-profile.children.cycles- > pp.cpuidle_governor_latency_req > 0.33 +0.0 0.34 perf-profile.children.cycles- > pp.prepare_task_switch > 0.07 +0.0 0.08 ± 5% perf-profile.children.cycles- > pp.switch_fpu_return > 0.11 ± 6% +0.0 0.12 ± 4% perf-profile.children.cycles- > pp.resched_curr > 0.14 ± 3% +0.0 0.15 ± 3% perf-profile.children.cycles- > pp.check_preempt_curr > 0.21 +0.0 0.23 ± 2% perf-profile.children.cycles-pp.ip_output > 0.49 ± 2% +0.0 0.51 ± 2% perf-profile.children.cycles- > pp._raw_spin_lock > 0.59 +0.0 0.62 perf-profile.children.cycles- > pp._raw_spin_lock_irqsave > 0.76 ± 3% +0.1 0.90 ± 4% perf-profile.children.cycles- > pp.mem_cgroup_uncharge_skmem > 0.31 ± 2% +0.2 0.47 ± 10% perf-profile.children.cycles- > pp.propagate_protected_usage > 84.35 +0.2 84.55 perf-profile.children.cycles- > pp.do_syscall_64 > 16.48 +0.2 16.68 perf-profile.children.cycles- > pp.__local_bh_enable_ip > 15.96 +0.2 16.20 perf-profile.children.cycles-pp.do_softirq > 15.84 +0.2 16.09 perf-profile.children.cycles-pp.__do_softirq > 15.20 +0.3 15.46 perf-profile.children.cycles- > pp.net_rx_action > 17.63 +0.3 17.89 perf-profile.children.cycles- > pp.__dev_queue_xmit > 18.00 +0.3 18.29 perf-profile.children.cycles- > pp.ip_finish_output2 > 18.93 +0.3 19.22 perf-profile.children.cycles- > pp.__ip_queue_xmit > 20.12 +0.3 20.43 perf-profile.children.cycles- > pp.__tcp_transmit_skb > 12.38 +0.3 12.69 perf-profile.children.cycles- > pp.tcp_write_xmit > 13.56 +0.3 13.87 perf-profile.children.cycles-pp.__napi_poll > 13.51 +0.3 13.83 perf-profile.children.cycles- > pp.process_backlog > 13.12 +0.3 13.44 perf-profile.children.cycles- > pp.__netif_receive_skb_one_core > 12.12 +0.4 12.51 perf-profile.children.cycles- > pp.ip_local_deliver_finish > 12.08 +0.4 12.47 perf-profile.children.cycles- > pp.ip_protocol_deliver_rcu > 11.84 +0.4 12.24 perf-profile.children.cycles-pp.tcp_v4_rcv > 3.87 +0.5 4.34 perf-profile.children.cycles-pp.tcp_ack > 2.89 +0.5 3.40 ± 2% perf-profile.children.cycles- > pp.tcp_clean_rtx_queue > 9.78 +0.5 10.31 perf-profile.children.cycles- > pp.__tcp_push_pending_frames > 1.54 ± 4% +0.7 2.26 ± 7% perf-profile.children.cycles- > pp.refill_stock > 1.26 ± 5% +0.7 1.99 ± 8% perf-profile.children.cycles- > pp.drain_stock > 1.24 ± 5% +0.7 1.96 ± 8% perf-profile.children.cycles- > pp.page_counter_uncharge > 17.03 +0.8 17.85 perf-profile.children.cycles- > pp.do_tcp_sendpages > 46.66 +0.9 47.56 perf-profile.children.cycles- > pp.do_splice_direct > 2.92 ± 2% +0.9 3.86 ± 3% perf-profile.children.cycles- > pp.__sk_mem_reduce_allocated > 46.08 +0.9 47.03 perf-profile.children.cycles- > pp.splice_direct_to_actor > 4.41 +1.0 5.43 ± 4% perf-profile.children.cycles- > pp.tcp_data_queue > 10.88 +1.3 12.18 perf-profile.children.cycles- > pp.tcp_build_frag > 16.59 +1.4 17.98 perf-profile.children.cycles- > pp.tcp_v4_do_rcv > 16.36 +1.4 17.77 perf-profile.children.cycles- > pp.tcp_rcv_established > 7.93 +1.5 9.40 perf-profile.children.cycles- > pp.tcp_wmem_schedule > 1.52 ± 4% +1.5 2.98 ± 8% perf-profile.children.cycles- > pp.page_counter_try_charge > 36.96 +1.5 38.45 perf-profile.children.cycles- > pp.generic_splice_sendpage > 37.07 +1.5 38.56 perf-profile.children.cycles- > pp.direct_splice_actor > 7.75 +1.5 9.24 perf-profile.children.cycles- > pp.__sk_mem_schedule > 7.59 +1.5 9.10 perf-profile.children.cycles- > pp.__sk_mem_raise_allocated > 36.59 +1.5 38.12 perf-profile.children.cycles- > pp.__splice_from_pipe > 11.95 +1.5 13.48 ± 2% perf-profile.children.cycles- > pp.release_sock > 10.33 +1.6 11.89 ± 2% perf-profile.children.cycles- > pp.__release_sock > 34.67 +1.7 36.32 perf-profile.children.cycles- > pp.pipe_to_sendpage > 34.34 +1.7 36.02 perf-profile.children.cycles- > pp.sock_sendpage > 34.15 +1.7 35.84 perf-profile.children.cycles- > pp.kernel_sendpage > 33.84 +1.7 35.56 perf-profile.children.cycles- > pp.inet_sendpage > 33.40 +1.8 35.16 perf-profile.children.cycles- > pp.tcp_sendpage > 3.31 ± 4% +2.6 5.93 ± 7% perf-profile.children.cycles- > pp.try_charge_memcg > 6.82 +3.0 9.82 ± 3% perf-profile.children.cycles- > pp.mem_cgroup_charge_skmem > 6.66 -0.5 6.15 perf-profile.self.cycles-pp.copyout > 2.88 -0.4 2.44 ± 2% perf-profile.self.cycles- > pp.__sk_mem_raise_allocated > 2.69 -0.2 2.50 perf-profile.self.cycles- > pp.filemap_get_read_batch > 2.14 ± 2% -0.2 1.95 ± 2% perf-profile.self.cycles- > pp.check_heap_object > 2.01 -0.1 1.88 perf-profile.self.cycles-pp.tcp_build_frag > 1.30 -0.1 1.22 perf-profile.self.cycles-pp.filemap_read > 1.04 -0.1 0.96 perf-profile.self.cycles-pp.do_sendfile > 0.70 -0.1 0.63 ± 2% perf-profile.self.cycles- > pp.__splice_from_pipe > 0.52 -0.1 0.46 ± 2% perf-profile.self.cycles- > pp.sendfile_tcp_stream > 0.75 -0.1 0.70 perf-profile.self.cycles-pp.do_tcp_sendpages > 0.55 ± 2% -0.1 0.50 ± 2% perf-profile.self.cycles- > pp._copy_from_user > 0.42 ± 4% -0.1 0.36 ± 2% perf-profile.self.cycles-pp.sendfile > 0.67 ± 3% -0.1 0.62 ± 2% perf-profile.self.cycles- > pp.apparmor_file_permission > 1.11 -0.0 1.06 perf-profile.self.cycles- > pp.copy_page_to_iter_pipe > 0.54 ± 2% -0.0 0.49 perf-profile.self.cycles- > pp.entry_SYSCALL_64_after_hwframe > 0.48 -0.0 0.43 ± 2% perf-profile.self.cycles- > pp.__fsnotify_parent > 0.80 ± 2% -0.0 0.75 perf-profile.self.cycles- > pp.__skb_datagram_iter > 0.81 -0.0 0.76 perf-profile.self.cycles-pp.tcp_write_xmit > 0.95 -0.0 0.91 perf-profile.self.cycles- > pp.__inet_lookup_established > 0.62 -0.0 0.58 perf-profile.self.cycles-pp.__fget_light > 0.36 -0.0 0.32 perf-profile.self.cycles- > pp.tcp_rate_check_app_limited > 0.34 -0.0 0.30 perf-profile.self.cycles-pp.inet_sendpage > 0.47 -0.0 0.43 ± 2% perf-profile.self.cycles-pp.netperf_sendfile > 0.49 ± 5% -0.0 0.45 perf-profile.self.cycles-pp.net_rx_action > 0.48 -0.0 0.44 perf-profile.self.cycles- > pp.atime_needs_update > 0.67 -0.0 0.63 perf-profile.self.cycles-pp.tcp_v4_rcv > 0.43 ± 3% -0.0 0.40 perf-profile.self.cycles-pp.do_syscall_64 > 0.41 -0.0 0.37 perf-profile.self.cycles- > pp.entry_SYSRETQ_unsafe_stack > 0.36 ± 2% -0.0 0.32 ± 2% perf-profile.self.cycles- > pp.page_cache_pipe_buf_confirm > 0.46 -0.0 0.42 perf-profile.self.cycles- > pp.__local_bh_enable_ip > 0.48 ± 2% -0.0 0.45 perf-profile.self.cycles-pp.tcp_sendpage > 0.45 -0.0 0.42 perf-profile.self.cycles-pp.tcp_current_mss > 0.31 ± 2% -0.0 0.28 perf-profile.self.cycles-pp.kernel_sendpage > 0.34 -0.0 0.31 ± 2% perf-profile.self.cycles-pp.__put_user_8 > 0.66 -0.0 0.63 perf-profile.self.cycles-pp.read_tsc > 0.40 -0.0 0.37 ± 2% perf-profile.self.cycles- > pp.__check_object_size > 0.33 -0.0 0.30 perf-profile.self.cycles- > pp.generic_splice_sendpage > 0.31 -0.0 0.28 ± 2% perf-profile.self.cycles-pp.tcp_send_mss > 0.66 -0.0 0.64 perf-profile.self.cycles-pp.tcp_ack > 0.28 ± 2% -0.0 0.25 ± 2% perf-profile.self.cycles- > pp.__sys_recvfrom > 0.44 -0.0 0.42 ± 2% perf-profile.self.cycles-pp.__cond_resched > 0.39 -0.0 0.36 perf-profile.self.cycles-pp._copy_to_iter > 0.34 ± 2% -0.0 0.32 ± 2% perf-profile.self.cycles- > pp.tcp_established_options > 0.24 ± 2% -0.0 0.21 ± 4% perf-profile.self.cycles- > pp.tcp_wmem_schedule > 0.48 ± 2% -0.0 0.46 perf-profile.self.cycles- > pp.kmem_cache_free > 0.33 -0.0 0.31 perf-profile.self.cycles-pp.pipe_to_sendpage > 0.11 ± 6% -0.0 0.09 perf-profile.self.cycles- > pp.check_stack_object > 0.36 -0.0 0.34 ± 3% perf-profile.self.cycles-pp.release_sock > 0.26 -0.0 0.24 ± 2% perf-profile.self.cycles- > pp.security_file_permission > 0.23 -0.0 0.21 ± 3% perf-profile.self.cycles-pp.tcp_tso_segs > 0.44 -0.0 0.42 perf-profile.self.cycles- > pp.kmem_cache_alloc_node > 0.31 -0.0 0.29 ± 2% perf-profile.self.cycles-pp.current_time > 0.20 ± 3% -0.0 0.18 ± 2% perf-profile.self.cycles- > pp.do_splice_direct > 0.25 ± 4% -0.0 0.23 ± 2% perf-profile.self.cycles-pp.aa_file_perm > 0.25 ± 3% -0.0 0.23 ± 2% perf-profile.self.cycles-pp.touch_atime > 0.19 -0.0 0.17 ± 2% perf-profile.self.cycles-pp.process_backlog > 0.11 ± 6% -0.0 0.09 ± 5% perf-profile.self.cycles- > pp.__get_task_ioprio > 0.21 -0.0 0.19 perf-profile.self.cycles-pp.sanity > 0.06 -0.0 0.04 ± 44% perf-profile.self.cycles-pp.aa_sk_perm > 0.33 ± 2% -0.0 0.31 ± 2% perf-profile.self.cycles- > pp.splice_direct_to_actor > 0.30 -0.0 0.28 perf-profile.self.cycles- > pp.syscall_return_via_sysret > 0.22 -0.0 0.20 ± 2% perf-profile.self.cycles- > pp.copy_page_to_iter > 0.16 ± 2% -0.0 0.14 ± 3% perf-profile.self.cycles- > pp.tcp_stream_alloc_skb > 0.11 ± 3% -0.0 0.10 ± 5% perf-profile.self.cycles- > pp.ip_protocol_deliver_rcu > 0.09 ± 6% -0.0 0.07 ± 6% perf-profile.self.cycles-pp.xas_start > 0.15 ± 2% -0.0 0.14 ± 3% perf-profile.self.cycles- > pp.__sk_mem_schedule > 0.65 -0.0 0.63 perf-profile.self.cycles-pp.tcp_rcv_established > 0.25 -0.0 0.23 perf-profile.self.cycles-pp.__mod_timer > 0.15 -0.0 0.14 ± 3% perf-profile.self.cycles- > pp.tcp_tx_timestamp > 0.30 -0.0 0.28 ± 2% perf-profile.self.cycles- > pp.__netif_receive_skb_core > 0.75 -0.0 0.74 perf-profile.self.cycles- > pp.page_cache_pipe_buf_release > 0.18 ± 2% -0.0 0.17 ± 2% perf-profile.self.cycles- > pp.sock_sendpage > 0.13 ± 2% -0.0 0.12 perf-profile.self.cycles- > pp._raw_spin_unlock_bh > 0.12 ± 3% -0.0 0.11 perf-profile.self.cycles- > pp.folio_mark_accessed > 0.12 -0.0 0.11 ± 3% perf-profile.self.cycles- > pp.simple_copy_to_iter > 0.06 -0.0 0.05 perf-profile.self.cycles- > pp.splice_from_pipe_next > 0.11 -0.0 0.10 perf-profile.self.cycles- > pp.exit_to_user_mode_prepare > 0.25 +0.0 0.26 perf-profile.self.cycles-pp.__switch_to > 0.06 ± 8% +0.0 0.07 ± 6% perf-profile.self.cycles- > pp.switch_fpu_return > 0.44 ± 2% +0.0 0.46 perf-profile.self.cycles-pp._raw_spin_lock > 0.33 +0.0 0.36 ± 2% perf-profile.self.cycles-pp.__schedule > 0.58 +0.0 0.61 perf-profile.self.cycles- > pp._raw_spin_lock_irqsave > 0.34 ± 3% +0.0 0.38 perf-profile.self.cycles- > pp.__x64_sys_sendfile64 > 0.16 ± 8% +0.0 0.20 ± 2% perf-profile.self.cycles-pp.do_splice_to > 0.65 ± 2% +0.1 0.73 ± 2% perf-profile.self.cycles- > pp.__sk_mem_reduce_allocated > 0.71 ± 3% +0.1 0.84 ± 5% perf-profile.self.cycles- > pp.mem_cgroup_uncharge_skmem > 0.30 ± 2% +0.2 0.47 ± 10% perf-profile.self.cycles- > pp.propagate_protected_usage > 3.34 ± 3% +0.4 3.72 ± 5% perf-profile.self.cycles- > pp.mem_cgroup_charge_skmem > 1.08 ± 6% +0.7 1.74 ± 8% perf-profile.self.cycles- > pp.page_counter_uncharge > 1.72 ± 3% +1.2 2.87 ± 7% perf-profile.self.cycles- > pp.try_charge_memcg > 1.36 ± 5% +1.4 2.73 ± 8% perf-profile.self.cycles- > pp.page_counter_try_charge > > > > [2] > > ================================================================ > ========================= > cluster/compiler/cpufreq_governor/ip/kconfig/nr_threads/rootfs/runtime/s > end_size/tbox_group/test/testcase: > cs-localhost/gcc-11/performance/ipv4/x86_64-rhel-8.3/50%/debian-11.1- > x86_64-20220510.cgz/300s/10K/lkp-icl-2sp2/TCP_SENDFILE/netperf > > commit: > ed23734c23 ("Merge tag 'net-6.4-rc1' of > git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net") > 05d72a8bed ("net: Keep sk->sk_forward_alloc as a proper size") > > ed23734c23d2fc1e 05d72a8bedfacfc46f300ab38e0 > ---------------- --------------------------- > %stddev %change %stddev > \ | \ > 5.95e+09 -12.7% 5.193e+09 cpuidle..time > 3328 ± 22% +96.7% 6547 ± 21% numa- > vmstat.node2.nr_slab_reclaimable > 13.95 -2.0 11.93 mpstat.cpu.all.idle% > 2.69 +0.6 3.31 mpstat.cpu.all.usr% > 5106176 -6.6% 4769081 vmstat.system.cs > 2629481 -7.3% 2436543 vmstat.system.in > 11284480 ± 9% +23.7% 13957802 ± 11% meminfo.DirectMap2M > 1726173 ± 2% -17.6% 1422506 ± 2% meminfo.Mapped > 7247621 +11.2% 8061423 meminfo.Shmem > 13314 ± 22% +96.7% 26192 ± 21% numa- > meminfo.node2.KReclaimable > 13314 ± 22% +96.7% 26192 ± 21% numa- > meminfo.node2.SReclaimable > 71128 ± 5% +28.0% 91013 ± 8% numa-meminfo.node2.Slab > 15.26 -1.9 13.33 turbostat.C1% > 10.41 -15.8% 8.77 turbostat.CPU%c1 > 0.26 +11.5% 0.29 turbostat.IPC > 30.71 -3.2% 29.72 turbostat.RAMWatt > 7854382 ± 2% +10.3% 8664074 ± 2% > sched_debug.cfs_rq:/.min_vruntime.min > 708120 ± 2% -15.5% 598098 ± 3% > sched_debug.cfs_rq:/.min_vruntime.stddev > 708203 ± 2% -15.5% 598191 ± 3% > sched_debug.cfs_rq:/.spread0.stddev > 5317 ± 2% -11.2% 4722 ± 5% sched_debug.cpu.avg_idle.min > 10037310 ± 3% -15.9% 8440803 ± 2% > sched_debug.cpu.nr_switches.max > 1290083 ± 2% -22.0% 1006686 ± 3% > sched_debug.cpu.nr_switches.stddev > 23218 +29.4% 30043 netperf.Throughput_Mbps > 1485996 +29.4% 1922763 netperf.Throughput_total_Mbps > 160215 ± 3% +107.9% 333022 ± 15% > netperf.time.involuntary_context_switches > 5567 +2.5% 5707 netperf.time.percent_of_cpu_this_job_got > 16093 +1.2% 16286 netperf.time.system_time > 669.70 +34.0% 897.24 netperf.time.user_time > 35419 ± 3% +160.8% 92374 ± 5% > netperf.time.voluntary_context_switches > 5.442e+09 +29.4% 7.041e+09 netperf.workload > 2481590 +8.1% 2681600 proc-vmstat.nr_file_pages > 1892119 +10.6% 2092306 proc-vmstat.nr_inactive_anon > 431915 ± 2% -17.9% 354649 ± 2% proc-vmstat.nr_mapped > 3064 -4.5% 2927 proc-vmstat.nr_page_table_pages > 1813072 +11.0% 2013082 proc-vmstat.nr_shmem > 35384 +1.3% 35861 proc-vmstat.nr_slab_reclaimable > 1892119 +10.6% 2092306 proc-vmstat.nr_zone_inactive_anon > 491137 ± 2% -20.0% 393067 ± 17% proc- > vmstat.numa_hint_faults_local > 5593417 +10.7% 6193714 proc-vmstat.numa_hit > 5431644 +10.5% 6001135 proc-vmstat.numa_local > 44132 ± 3% +18.1% 52128 ± 6% proc-vmstat.pgactivate > 5733229 +9.9% 6302633 proc-vmstat.pgalloc_normal > 7.00 -22.1% 5.45 perf-stat.i.MPKI > 4.405e+10 +13.7% 5.007e+10 perf-stat.i.branch-instructions > 0.87 -0.1 0.78 perf-stat.i.branch-miss-rate% > 3.795e+08 +1.6% 3.854e+08 perf-stat.i.branch-misses > 6.39 -3.3 3.09 ± 7% perf-stat.i.cache-miss-rate% > 1.038e+08 ± 2% -57.7% 43877506 ± 7% perf-stat.i.cache-misses > 1.633e+09 -12.0% 1.438e+09 perf-stat.i.cache-references > 5163294 -6.8% 4814691 perf-stat.i.context-switches > 1.29 -10.0% 1.16 perf-stat.i.cpi > 3.016e+11 +1.8% 3.072e+11 perf-stat.i.cpu-cycles > 27516 ± 3% -34.8% 17931 perf-stat.i.cpu-migrations > 2930 ± 2% +153.5% 7428 ± 7% perf-stat.i.cycles-between-cache- > misses > 0.01 -0.0 0.01 ± 13% perf-stat.i.dTLB-load-miss-rate% > 7226907 -11.0% 6428694 ± 13% perf-stat.i.dTLB-load-misses > 6.872e+10 +13.4% 7.791e+10 perf-stat.i.dTLB-loads > 0.00 ± 3% -0.0 0.00 ± 2% perf-stat.i.dTLB-store-miss-rate% > 954320 ± 3% -33.0% 639153 ± 2% perf-stat.i.dTLB-store-misses > 3.753e+10 +12.5% 4.221e+10 perf-stat.i.dTLB-stores > 2.332e+11 +13.2% 2.639e+11 perf-stat.i.instructions > 0.78 +11.1% 0.86 perf-stat.i.ipc > 2.36 +1.8% 2.40 perf-stat.i.metric.GHz > 263.06 ± 2% -45.6% 143.14 ± 5% perf-stat.i.metric.K/sec > 1186 +13.0% 1340 perf-stat.i.metric.M/sec > 95.18 +2.5 97.70 perf-stat.i.node-load-miss-rate% > 15047143 ± 3% -50.7% 7421607 ± 7% perf-stat.i.node-load-misses > 736992 ± 4% -79.2% 153436 ± 5% perf-stat.i.node-loads > 76.94 -13.8 63.13 ± 5% perf-stat.i.node-store-miss-rate% > 8866276 -61.9% 3375324 ± 7% perf-stat.i.node-store-misses > 2808107 ± 7% -34.1% 1851536 ± 14% perf-stat.i.node-stores > 7.00 -22.2% 5.45 perf-stat.overall.MPKI > 0.86 -0.1 0.77 perf-stat.overall.branch-miss-rate% > 6.36 -3.3 3.05 ± 7% perf-stat.overall.cache-miss-rate% > 1.29 -10.0% 1.16 perf-stat.overall.cpi > 2907 ± 2% +142.1% 7040 ± 7% perf-stat.overall.cycles-between- > cache-misses > 0.01 -0.0 0.01 ± 13% perf-stat.overall.dTLB-load-miss-rate% > 0.00 ± 3% -0.0 0.00 ± 2% perf-stat.overall.dTLB-store-miss-rate% > 0.77 +11.1% 0.86 perf-stat.overall.ipc > 95.33 +2.6 97.97 perf-stat.overall.node-load-miss-rate% > 75.97 -11.3 64.69 ± 4% perf-stat.overall.node-store-miss-rate% > 12891 -12.6% 11262 perf-stat.overall.path-length > 4.39e+10 +13.7% 4.99e+10 perf-stat.ps.branch-instructions > 3.782e+08 +1.6% 3.841e+08 perf-stat.ps.branch-misses > 1.034e+08 ± 2% -57.7% 43735005 ± 7% perf-stat.ps.cache-misses > 1.627e+09 -11.9% 1.433e+09 perf-stat.ps.cache-references > 5145798 -6.8% 4798160 perf-stat.ps.context-switches > 3.006e+11 +1.8% 3.062e+11 perf-stat.ps.cpu-cycles > 27426 ± 3% -34.8% 17883 perf-stat.ps.cpu-migrations > 7190273 -11.0% 6397079 ± 13% perf-stat.ps.dTLB-load-misses > 6.849e+10 +13.4% 7.765e+10 perf-stat.ps.dTLB-loads > 950808 ± 3% -33.0% 637446 ± 2% perf-stat.ps.dTLB-store-misses > 3.741e+10 +12.5% 4.207e+10 perf-stat.ps.dTLB-stores > 2.324e+11 +13.2% 2.63e+11 perf-stat.ps.instructions > 14992384 ± 3% -50.7% 7391904 ± 7% perf-stat.ps.node-load-misses > 734606 ± 4% -79.2% 153010 ± 5% perf-stat.ps.node-loads > 8837267 -61.9% 3364441 ± 7% perf-stat.ps.node-store-misses > 2799494 ± 7% -34.1% 1845425 ± 14% perf-stat.ps.node-stores > 7.015e+13 +13.0% 7.93e+13 perf-stat.total.instructions > 7.88 -6.8 1.06 ± 2% perf-profile.calltrace.cycles- > pp.tcp_wmem_schedule.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.in > et_sendpage > 7.64 -6.8 0.84 perf-profile.calltrace.cycles- > pp.__sk_mem_schedule.tcp_wmem_schedule.tcp_build_frag.do_tcp_sendp > ages.tcp_sendpage > 7.45 -6.7 0.76 ± 2% perf-profile.calltrace.cycles- > pp.__sk_mem_raise_allocated.__sk_mem_schedule.tcp_wmem_schedule.tc > p_build_frag.do_tcp_sendpages > 10.74 -6.3 4.41 perf-profile.calltrace.cycles- > pp.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_s > endpage > 33.39 -6.1 27.33 perf-profile.calltrace.cycles- > pp.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_sendpage.pipe_to_s > endpage > 33.88 -6.0 27.93 perf-profile.calltrace.cycles- > pp.inet_sendpage.kernel_sendpage.sock_sendpage.pipe_to_sendpage.__spli > ce_from_pipe > 34.25 -5.9 28.39 perf-profile.calltrace.cycles- > pp.kernel_sendpage.sock_sendpage.pipe_to_sendpage.__splice_from_pipe. > generic_splice_sendpage > 34.43 -5.8 28.61 perf-profile.calltrace.cycles- > pp.sock_sendpage.pipe_to_sendpage.__splice_from_pipe.generic_splice_se > ndpage.direct_splice_actor > 34.75 -5.8 29.00 perf-profile.calltrace.cycles- > pp.pipe_to_sendpage.__splice_from_pipe.generic_splice_sendpage.direct_s > plice_actor.splice_direct_to_actor > 36.66 -5.3 31.34 perf-profile.calltrace.cycles- > pp.__splice_from_pipe.generic_splice_sendpage.direct_splice_actor.splice_d > irect_to_actor.do_splice_direct > 37.08 -5.2 31.85 perf-profile.calltrace.cycles- > pp.generic_splice_sendpage.direct_splice_actor.splice_direct_to_actor.do_sp > lice_direct.do_sendfile > 37.20 -5.2 32.00 perf-profile.calltrace.cycles- > pp.direct_splice_actor.splice_direct_to_actor.do_splice_direct.do_sendfile._ > _x64_sys_sendfile64 > 16.95 -5.1 11.89 perf-profile.calltrace.cycles- > pp.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_s > endpage > 8.23 -2.6 5.67 ± 2% perf-profile.calltrace.cycles- > pp.__release_sock.release_sock.tcp_sendpage.inet_sendpage.kernel_sendp > age > 46.36 -2.5 43.86 perf-profile.calltrace.cycles- > pp.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_sys_sendfile64 > .do_syscall_64 > 46.96 -2.4 44.58 perf-profile.calltrace.cycles- > pp.do_splice_direct.do_sendfile.__x64_sys_sendfile64.do_syscall_64.entry_S > YSCALL_64_after_hwframe > 9.59 -2.3 7.24 ± 2% perf-profile.calltrace.cycles- > pp.release_sock.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_sendp > age > 2.87 -2.1 0.76 perf-profile.calltrace.cycles- > pp.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protoc > ol_deliver_rcu > 51.58 -2.0 49.62 perf-profile.calltrace.cycles- > pp.entry_SYSCALL_64_after_hwframe.sendfile.sendfile_tcp_stream.main.__li > bc_start_main > 49.43 -1.9 47.48 perf-profile.calltrace.cycles- > pp.do_sendfile.__x64_sys_sendfile64.do_syscall_64.entry_SYSCALL_64_after > _hwframe.sendfile > 51.31 -1.9 49.37 perf-profile.calltrace.cycles- > pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.sendfile.sendfile_tcp_st > ream.main > 6.07 -1.8 4.22 ± 2% perf-profile.calltrace.cycles- > pp.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_do_rcv.__releas > e_sock.release_sock > 6.04 -1.8 4.20 ± 2% perf-profile.calltrace.cycles- > pp.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_ > do_rcv.__release_sock > 52.41 -1.8 50.64 perf-profile.calltrace.cycles- > pp.sendfile.sendfile_tcp_stream.main.__libc_start_main > 50.66 -1.7 48.91 perf-profile.calltrace.cycles- > pp.__x64_sys_sendfile64.do_syscall_64.entry_SYSCALL_64_after_hwframe.s > endfile.sendfile_tcp_stream > 1.99 -1.5 0.48 ± 44% perf-profile.calltrace.cycles- > pp.tcp_v4_do_rcv.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg > 53.77 -1.5 52.28 perf-profile.calltrace.cycles- > pp.sendfile_tcp_stream.main.__libc_start_main > 1.88 ± 2% -1.5 0.42 ± 44% perf-profile.calltrace.cycles- > pp.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock.tcp_recv > msg > 5.64 -1.5 4.19 ± 2% perf-profile.calltrace.cycles- > pp.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit._ > _tcp_push_pending_frames > 6.14 -1.4 4.71 ± 2% perf-profile.calltrace.cycles- > pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pendin > g_frames.tcp_rcv_established > 5.67 -1.4 4.24 ± 2% perf-profile.calltrace.cycles- > pp.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock.tcp_sen > dpage > 2.08 -1.4 0.68 ± 8% perf-profile.calltrace.cycles- > pp.__release_sock.release_sock.tcp_recvmsg.inet_recvmsg.sock_recvmsg > 5.66 -1.4 4.28 ± 2% perf-profile.calltrace.cycles- > pp.tcp_v4_do_rcv.__release_sock.release_sock.tcp_sendpage.inet_sendpage > 2.22 -1.4 0.84 ± 8% perf-profile.calltrace.cycles- > pp.release_sock.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom > 7.37 -1.3 6.07 ± 3% perf-profile.calltrace.cycles- > pp.__tcp_transmit_skb.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv > _established.tcp_v4_do_rcv > 12.84 -1.2 11.64 perf-profile.calltrace.cycles- > pp.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_ > enter_state.cpuidle_enter > 7.52 -1.1 6.41 ± 2% perf-profile.calltrace.cycles- > pp.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_s > kb.tcp_write_xmit > 11.36 -1.0 10.31 perf-profile.calltrace.cycles- > pp.start_secondary.secondary_startup_64_no_verify > 11.35 -1.0 10.30 perf-profile.calltrace.cycles- > pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify > 11.47 -1.0 10.43 perf-profile.calltrace.cycles- > pp.secondary_startup_64_no_verify > 11.32 -1.0 10.28 perf-profile.calltrace.cycles- > pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_ve > rify > 9.96 -0.9 9.02 perf-profile.calltrace.cycles- > pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_s > tartup_64_no_verify > 9.10 -0.9 8.24 perf-profile.calltrace.cycles- > pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondar > y > 9.03 -0.9 8.18 perf-profile.calltrace.cycles- > pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_ > entry > 8.78 -0.8 7.95 perf-profile.calltrace.cycles- > pp.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_i > dle > 1.03 -0.6 0.43 ± 44% perf-profile.calltrace.cycles- > pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.tcp_ > data_queue.tcp_rcv_established > 1.19 -0.6 0.59 ± 2% perf-profile.calltrace.cycles- > pp.sock_def_readable.tcp_data_queue.tcp_rcv_established.tcp_v4_do_rcv.t > cp_v4_rcv > 1.32 -0.6 0.75 ± 2% perf-profile.calltrace.cycles- > pp.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__release_sock.release_sock > 1.08 -0.5 0.54 perf-profile.calltrace.cycles- > pp.__wake_up_common_lock.sock_def_readable.tcp_data_queue.tcp_rcv_e > stablished.tcp_v4_do_rcv > 1.12 -0.5 0.59 ± 3% perf-profile.calltrace.cycles- > pp.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.__relea > se_sock > 2.46 -0.5 2.00 ± 8% perf-profile.calltrace.cycles- > pp.wait_woken.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg > 2.24 -0.4 1.80 ± 8% perf-profile.calltrace.cycles- > pp.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locked.tcp_rec > vmsg > 2.19 -0.4 1.75 ± 8% perf-profile.calltrace.cycles- > pp.schedule.schedule_timeout.wait_woken.sk_wait_data.tcp_recvmsg_locke > d > 2.08 -0.4 1.65 ± 7% perf-profile.calltrace.cycles- > pp.__schedule.schedule.schedule_timeout.wait_woken.sk_wait_data > 3.07 -0.4 2.65 perf-profile.calltrace.cycles- > pp.sk_wait_data.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvm > sg > 1.69 -0.4 1.32 ± 8% perf-profile.calltrace.cycles- > pp.tcp_clean_rtx_queue.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_ > rcv > 3.56 -0.3 3.27 perf-profile.calltrace.cycles- > pp.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter.cpuidle > _idle_call > 8.87 -0.3 8.62 perf-profile.calltrace.cycles- > pp.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip_local_deliver_finish > .__netif_receive_skb_one_core > 2.17 -0.2 1.96 ± 8% perf-profile.calltrace.cycles- > pp.tcp_ack.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_delive > r_rcu > 8.73 -0.2 8.51 perf-profile.calltrace.cycles- > pp.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_protocol_deliver_rcu.ip > _local_deliver_finish > 0.69 -0.2 0.53 ± 44% perf-profile.calltrace.cycles- > pp.dequeue_task_fair.__schedule.schedule.schedule_timeout.wait_woken > 0.60 -0.1 0.46 ± 44% perf-profile.calltrace.cycles- > pp.dequeue_entity.dequeue_task_fair.__schedule.schedule.schedule_timeo > ut > 2.34 -0.1 2.23 perf-profile.calltrace.cycles- > pp.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_h > alt.acpi_idle_enter.cpuidle_enter_state > 0.99 -0.1 0.92 perf-profile.calltrace.cycles- > pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_star > tup_64_no_verify > 1.78 -0.1 1.70 perf-profile.calltrace.cycles- > pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_cal > l_function_single.acpi_safe_halt.acpi_idle_enter > 0.93 -0.1 0.85 perf-profile.calltrace.cycles- > pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary > 0.60 -0.1 0.54 perf-profile.calltrace.cycles- > pp.menu_select.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary > 1.21 -0.1 1.16 perf-profile.calltrace.cycles- > pp.sched_ttwu_pending.__sysvec_call_function_single.sysvec_call_function_ > single.asm_sysvec_call_function_single.acpi_safe_halt > 0.95 -0.0 0.90 perf-profile.calltrace.cycles- > pp.ttwu_do_activate.sched_ttwu_pending.__sysvec_call_function_single.sys > vec_call_function_single.asm_sysvec_call_function_single > 0.69 -0.0 0.66 perf-profile.calltrace.cycles- > pp.napi_consume_skb.net_rx_action.__do_softirq.do_softirq.__local_bh_en > able_ip > 0.78 -0.0 0.75 perf-profile.calltrace.cycles- > pp.enqueue_task_fair.activate_task.ttwu_do_activate.sched_ttwu_pending._ > _sysvec_call_function_single > 0.53 +0.0 0.55 perf-profile.calltrace.cycles- > pp.enqueue_entity.enqueue_task_fair.activate_task.ttwu_do_activate.sched > _ttwu_pending > 0.58 +0.0 0.61 ± 2% perf-profile.calltrace.cycles- > pp.__alloc_skb.tcp_stream_alloc_skb.tcp_build_frag.do_tcp_sendpages.tcp_ > sendpage > 0.79 +0.1 0.90 ± 2% perf-profile.calltrace.cycles- > pp.tcp_stream_alloc_skb.tcp_build_frag.do_tcp_sendpages.tcp_sendpage.in > et_sendpage > 0.77 +0.1 0.90 ± 3% perf-profile.calltrace.cycles- > pp.page_cache_pipe_buf_release.__splice_from_pipe.generic_splice_sendpa > ge.direct_splice_actor.splice_direct_to_actor > 1.04 +0.1 1.18 ± 2% perf-profile.calltrace.cycles- > pp._raw_spin_lock_bh.release_sock.tcp_sendpage.inet_sendpage.kernel_se > ndpage > 0.96 +0.2 1.12 ± 2% perf-profile.calltrace.cycles- > pp.tcp_current_mss.tcp_send_mss.do_tcp_sendpages.tcp_sendpage.inet_se > ndpage > 0.71 +0.2 0.89 perf-profile.calltrace.cycles- > pp.do_splice_to.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_s > ys_sendfile64 > 0.41 ± 50% +0.2 0.64 ± 2% perf-profile.calltrace.cycles- > pp._copy_from_user.__x64_sys_sendfile64.do_syscall_64.entry_SYSCALL_64 > _after_hwframe.sendfile > 0.41 ± 50% +0.2 0.65 perf-profile.calltrace.cycles- > pp.security_file_permission.do_sendfile.__x64_sys_sendfile64.do_syscall_64 > .entry_SYSCALL_64_after_hwframe > 1.32 +0.2 1.56 perf-profile.calltrace.cycles- > pp.tcp_send_mss.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_se > ndpage > 15.63 +0.3 15.90 perf-profile.calltrace.cycles- > pp.__do_softirq.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_fini > sh_output2 > 1.10 +0.3 1.37 ± 2% perf-profile.calltrace.cycles- > pp.tcp_write_xmit.__tcp_push_pending_frames.do_tcp_sendpages.tcp_send > page.inet_sendpage > 15.79 +0.3 16.06 perf-profile.calltrace.cycles- > pp.do_softirq.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2._ > _ip_queue_xmit > 1.18 +0.3 1.45 ± 2% perf-profile.calltrace.cycles- > pp.__tcp_push_pending_frames.do_tcp_sendpages.tcp_sendpage.inet_send > page.kernel_sendpage > 15.88 +0.3 16.16 perf-profile.calltrace.cycles- > pp.__local_bh_enable_ip.__dev_queue_xmit.ip_finish_output2.__ip_queue_ > xmit.__tcp_transmit_skb > 0.31 ± 81% +0.3 0.60 ± 2% perf-profile.calltrace.cycles- > pp.touch_atime.splice_direct_to_actor.do_splice_direct.do_sendfile.__x64_s > ys_sendfile64 > 1.29 +0.3 1.60 perf-profile.calltrace.cycles- > pp.copy_page_to_iter_pipe.filemap_read.generic_file_splice_read.splice_dir > ect_to_actor.do_splice_direct > 2.14 +0.3 2.48 ± 2% perf-profile.calltrace.cycles- > pp.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.d > o_tcp_sendpages > 2.23 +0.4 2.60 perf-profile.calltrace.cycles- > pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_write_xmit.do_tcp_sendpages. > tcp_sendpage > 2.42 +0.4 2.86 ± 2% perf-profile.calltrace.cycles- > pp.__tcp_transmit_skb.tcp_write_xmit.do_tcp_sendpages.tcp_sendpage.inet > _sendpage > 2.66 +0.5 3.20 ± 2% perf-profile.calltrace.cycles- > pp.tcp_write_xmit.do_tcp_sendpages.tcp_sendpage.inet_sendpage.kernel_s > endpage > 0.00 +0.5 0.54 ± 2% perf-profile.calltrace.cycles- > pp.__fget_light.do_sendfile.__x64_sys_sendfile64.do_syscall_64.entry_SYSC > ALL_64_after_hwframe > 0.00 +0.6 0.56 ± 2% perf-profile.calltrace.cycles- > pp.__entry_text_start.sendfile.sendfile_tcp_stream.main.__libc_start_main > 4.35 +0.6 4.96 ± 2% perf-profile.calltrace.cycles- > pp.native_queued_spin_lock_slowpath._raw_spin_lock_bh.lock_sock_neste > d.tcp_sendpage.inet_sendpage > 0.00 +0.7 0.74 ± 3% perf-profile.calltrace.cycles- > pp.try_to_wake_up.__wake_up_common.__wake_up_common_lock.sock_d > ef_readable.tcp_rcv_established > 5.15 +0.8 5.93 ± 2% perf-profile.calltrace.cycles- > pp._raw_spin_lock_bh.lock_sock_nested.tcp_sendpage.inet_sendpage.kerne > l_sendpage > 0.00 +0.8 0.84 ± 3% perf-profile.calltrace.cycles- > pp.__wake_up_common.__wake_up_common_lock.sock_def_readable.tcp_ > rcv_established.tcp_v4_do_rcv > 2.47 +0.8 3.31 perf-profile.calltrace.cycles- > pp.tcp_write_xmit.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_ > do_rcv.tcp_v4_rcv > 2.49 +0.8 3.34 perf-profile.calltrace.cycles- > pp.__tcp_push_pending_frames.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_ > rcv.ip_protocol_deliver_rcu > 5.49 +0.9 6.34 ± 2% perf-profile.calltrace.cycles- > pp.lock_sock_nested.tcp_sendpage.inet_sendpage.kernel_sendpage.sock_se > ndpage > 0.00 +0.9 0.88 ± 3% perf-profile.calltrace.cycles- > pp.__wake_up_common_lock.sock_def_readable.tcp_rcv_established.tcp_v > 4_do_rcv.tcp_v4_rcv > 2.61 ± 2% +0.9 3.53 ± 3% perf-profile.calltrace.cycles- > pp.check_heap_object.__check_object_size.simple_copy_to_iter.__skb_data > gram_iter.skb_copy_datagram_iter > 0.00 +0.9 0.94 ± 2% perf-profile.calltrace.cycles- > pp.sock_def_readable.tcp_rcv_established.tcp_v4_do_rcv.tcp_v4_rcv.ip_pro > tocol_deliver_rcu > 2.98 +1.0 4.00 ± 2% perf-profile.calltrace.cycles- > pp.__check_object_size.simple_copy_to_iter.__skb_datagram_iter.skb_copy > _datagram_iter.tcp_recvmsg_locked > 2.91 +1.0 3.94 perf-profile.calltrace.cycles- > pp.filemap_get_read_batch.filemap_get_pages.filemap_read.generic_file_sp > lice_read.splice_direct_to_actor > 10.13 +1.0 11.17 perf-profile.calltrace.cycles- > pp.__tcp_transmit_skb.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_ > recvmsg > 3.14 +1.1 4.21 perf-profile.calltrace.cycles- > pp.filemap_get_pages.filemap_read.generic_file_splice_read.splice_direct_t > o_actor.do_splice_direct > 3.24 +1.1 4.32 ± 2% perf-profile.calltrace.cycles- > pp.simple_copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_r > ecvmsg_locked.tcp_recvmsg > 10.38 +1.3 11.66 perf-profile.calltrace.cycles- > pp.__ip_queue_xmit.__tcp_transmit_skb.tcp_recvmsg_locked.tcp_recvmsg.i > net_recvmsg > 10.07 +1.3 11.41 perf-profile.calltrace.cycles- > pp.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_skb.tcp_recvmsg_loc > ked.tcp_recvmsg > 9.94 +1.3 11.28 perf-profile.calltrace.cycles- > pp.__dev_queue_xmit.ip_finish_output2.__ip_queue_xmit.__tcp_transmit_s > kb.tcp_recvmsg_locked > 6.53 +1.6 8.18 perf-profile.calltrace.cycles- > pp.copyout._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp > _recvmsg_locked > 7.02 +1.8 8.79 perf-profile.calltrace.cycles- > pp._copy_to_iter.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvms > g_locked.tcp_recvmsg > 31.73 +1.9 33.63 perf-profile.calltrace.cycles- > pp.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recv > from > 31.85 +1.9 33.77 perf-profile.calltrace.cycles- > pp.inet_recvmsg.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_sysc > all_64 > 6.52 +1.9 8.44 perf-profile.calltrace.cycles- > pp.filemap_read.generic_file_splice_read.splice_direct_to_actor.do_splice_d > irect.do_sendfile > 32.06 +1.9 34.00 perf-profile.calltrace.cycles- > pp.sock_recvmsg.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_ > SYSCALL_64_after_hwframe > 32.54 +2.0 34.54 perf-profile.calltrace.cycles- > pp.__sys_recvfrom.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_af > ter_hwframe.recv > 32.63 +2.0 34.64 perf-profile.calltrace.cycles- > pp.__x64_sys_recvfrom.do_syscall_64.entry_SYSCALL_64_after_hwframe.rec > v.process_requests > 33.81 +2.0 35.82 perf-profile.calltrace.cycles- > pp.recv.process_requests.spawn_child.accept_connection.accept_connectio > ns > 32.95 +2.0 34.96 perf-profile.calltrace.cycles- > pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.recv.process_requests.s > pawn_child > 33.11 +2.0 35.14 perf-profile.calltrace.cycles- > pp.entry_SYSCALL_64_after_hwframe.recv.process_requests.spawn_child.ac > cept_connection > 7.44 +2.1 9.57 perf-profile.calltrace.cycles- > pp.generic_file_splice_read.splice_direct_to_actor.do_splice_direct.do_sendf > ile.__x64_sys_sendfile64 > 11.23 +3.1 14.38 perf-profile.calltrace.cycles- > pp.__skb_datagram_iter.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_r > ecvmsg.inet_recvmsg > 11.30 +3.2 14.48 perf-profile.calltrace.cycles- > pp.skb_copy_datagram_iter.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg. > sock_recvmsg > 29.26 +3.2 32.47 perf-profile.calltrace.cycles- > pp.tcp_recvmsg_locked.tcp_recvmsg.inet_recvmsg.sock_recvmsg.__sys_recvf > rom > 7.77 -6.8 0.94 perf-profile.children.cycles- > pp.__sk_mem_schedule > 7.95 -6.8 1.12 perf-profile.children.cycles- > pp.tcp_wmem_schedule > 7.62 -6.7 0.88 perf-profile.children.cycles- > pp.__sk_mem_raise_allocated > 10.92 -6.3 4.63 perf-profile.children.cycles-pp.tcp_build_frag > 6.86 ± 2% -6.2 0.62 ± 2% perf-profile.children.cycles- > pp.mem_cgroup_charge_skmem > 33.63 -5.9 27.72 perf-profile.children.cycles-pp.tcp_sendpage > 34.07 -5.8 28.26 perf-profile.children.cycles- > pp.inet_sendpage > 34.39 -5.7 28.65 perf-profile.children.cycles- > pp.kernel_sendpage > 34.58 -5.7 28.88 perf-profile.children.cycles- > pp.sock_sendpage > 34.90 -5.6 29.28 perf-profile.children.cycles- > pp.pipe_to_sendpage > 36.86 -5.2 31.69 perf-profile.children.cycles- > pp.__splice_from_pipe > 37.23 -5.1 32.14 perf-profile.children.cycles- > pp.generic_splice_sendpage > 37.33 -5.1 32.26 perf-profile.children.cycles- > pp.direct_splice_actor > 17.14 -4.9 12.22 perf-profile.children.cycles- > pp.do_tcp_sendpages > 10.36 -3.9 6.49 perf-profile.children.cycles- > pp.__release_sock > 4.40 -3.6 0.78 perf-profile.children.cycles- > pp.tcp_data_queue > 11.99 -3.6 8.40 perf-profile.children.cycles-pp.release_sock > 3.34 ± 4% -3.1 0.26 ± 2% perf-profile.children.cycles- > pp.try_charge_memcg > 16.59 -3.0 13.62 perf-profile.children.cycles- > pp.tcp_v4_do_rcv > 16.37 -2.9 13.46 perf-profile.children.cycles- > pp.tcp_rcv_established > 46.40 -2.5 43.91 perf-profile.children.cycles- > pp.splice_direct_to_actor > 2.93 -2.5 0.46 perf-profile.children.cycles- > pp.__sk_mem_reduce_allocated > 46.99 -2.4 44.62 perf-profile.children.cycles- > pp.do_splice_direct > 49.52 -1.9 47.59 perf-profile.children.cycles-pp.do_sendfile > 50.71 -1.7 48.97 perf-profile.children.cycles- > pp.__x64_sys_sendfile64 > 1.54 ± 5% -1.5 0.06 ± 6% perf-profile.children.cycles- > pp.page_counter_try_charge > 1.56 ± 3% -1.4 0.16 ± 2% perf-profile.children.cycles- > pp.refill_stock > 1.29 ± 4% -1.2 0.06 perf-profile.children.cycles- > pp.drain_stock > 1.26 ± 4% -1.2 0.05 perf-profile.children.cycles- > pp.page_counter_uncharge > 52.89 -1.2 51.68 perf-profile.children.cycles-pp.sendfile > 11.36 -1.0 10.31 perf-profile.children.cycles- > pp.start_secondary > 11.47 -1.0 10.43 perf-profile.children.cycles- > pp.secondary_startup_64_no_verify > 11.47 -1.0 10.43 perf-profile.children.cycles- > pp.cpu_startup_entry > 11.45 -1.0 10.41 perf-profile.children.cycles-pp.do_idle > 53.93 -1.0 52.92 perf-profile.children.cycles- > pp.sendfile_tcp_stream > 3.85 -0.9 2.91 perf-profile.children.cycles-pp.tcp_ack > 10.07 -0.9 9.13 perf-profile.children.cycles- > pp.cpuidle_idle_call > 9.19 -0.9 8.34 perf-profile.children.cycles-pp.cpuidle_enter > 9.13 -0.8 8.28 perf-profile.children.cycles- > pp.cpuidle_enter_state > 2.87 -0.8 2.03 perf-profile.children.cycles- > pp.tcp_clean_rtx_queue > 8.84 -0.8 8.02 perf-profile.children.cycles-pp.acpi_safe_halt > 8.87 -0.8 8.04 perf-profile.children.cycles-pp.acpi_idle_enter > 7.77 -0.7 7.11 perf-profile.children.cycles- > pp.asm_sysvec_call_function_single > 9.79 -0.6 9.14 perf-profile.children.cycles- > pp.__tcp_push_pending_frames > 0.75 ± 4% -0.6 0.14 ± 3% perf-profile.children.cycles- > pp.mem_cgroup_uncharge_skmem > 3.07 -0.4 2.63 perf-profile.children.cycles-pp.__schedule > 3.09 -0.4 2.67 perf-profile.children.cycles-pp.sk_wait_data > 2.47 -0.4 2.08 perf-profile.children.cycles-pp.wait_woken > 2.25 -0.4 1.87 perf-profile.children.cycles- > pp.schedule_timeout > 2.20 -0.4 1.83 perf-profile.children.cycles-pp.schedule > 1.10 ± 2% -0.3 0.82 ± 3% perf-profile.children.cycles- > pp.pick_next_task_fair > 0.73 ± 4% -0.2 0.48 ± 6% perf-profile.children.cycles- > pp.newidle_balance > 0.28 ± 12% -0.2 0.09 ± 5% perf-profile.children.cycles- > pp.cgroup_rstat_updated > 2.39 -0.1 2.28 perf-profile.children.cycles- > pp.sysvec_call_function_single > 0.30 ± 4% -0.1 0.20 ± 5% perf-profile.children.cycles- > pp.load_balance > 1.01 -0.1 0.93 perf-profile.children.cycles-pp.schedule_idle > 1.68 -0.1 1.60 perf-profile.children.cycles- > pp.sock_def_readable > 1.82 -0.1 1.74 perf-profile.children.cycles- > pp.__sysvec_call_function_single > 0.22 ± 5% -0.1 0.14 ± 5% perf-profile.children.cycles- > pp.find_busiest_group > 1.51 -0.1 1.44 perf-profile.children.cycles- > pp.__wake_up_common_lock > 0.20 ± 6% -0.1 0.13 ± 5% perf-profile.children.cycles- > pp.update_sd_lb_stats > 1.27 -0.1 1.21 perf-profile.children.cycles- > pp.try_to_wake_up > 1.43 -0.1 1.37 perf-profile.children.cycles- > pp.__wake_up_common > 0.14 ± 3% -0.1 0.09 ± 7% perf-profile.children.cycles- > pp.update_blocked_averages > 0.61 -0.1 0.56 perf-profile.children.cycles-pp.menu_select > 0.70 -0.1 0.64 perf-profile.children.cycles- > pp.dequeue_task_fair > 0.15 ± 4% -0.1 0.10 ± 5% perf-profile.children.cycles- > pp.update_sg_lb_stats > 0.63 -0.0 0.58 perf-profile.children.cycles- > pp.dequeue_entity > 1.25 -0.0 1.20 perf-profile.children.cycles- > pp.sched_ttwu_pending > 0.24 ± 2% -0.0 0.19 ± 3% perf-profile.children.cycles- > pp.tcp_check_space > 0.98 -0.0 0.94 perf-profile.children.cycles- > pp.ttwu_do_activate > 0.06 -0.0 0.02 ± 99% perf-profile.children.cycles- > pp.irqentry_exit > 0.30 -0.0 0.27 ± 2% perf-profile.children.cycles- > pp.native_irq_return_iret > 0.52 -0.0 0.48 perf-profile.children.cycles- > pp.ttwu_queue_wakelist > 0.43 -0.0 0.40 perf-profile.children.cycles- > pp.native_sched_clock > 0.08 ± 5% -0.0 0.06 ± 9% perf-profile.children.cycles- > pp.raw_spin_rq_lock_nested > 0.22 -0.0 0.20 ± 2% perf-profile.children.cycles- > pp.__switch_to_asm > 0.48 -0.0 0.45 perf-profile.children.cycles- > pp.sched_clock_cpu > 0.27 -0.0 0.24 perf-profile.children.cycles-pp.__switch_to > 0.21 ± 2% -0.0 0.18 ± 4% perf-profile.children.cycles- > pp.___perf_sw_event > 0.11 ± 3% -0.0 0.09 perf-profile.children.cycles- > pp.ct_kernel_exit_state > 0.19 ± 2% -0.0 0.17 ± 2% perf-profile.children.cycles- > pp.native_apic_msr_eoi_write > 0.29 -0.0 0.27 perf-profile.children.cycles-pp.update_curr > 0.06 -0.0 0.04 ± 44% perf-profile.children.cycles- > pp.update_irq_load_avg > 0.14 ± 2% -0.0 0.12 perf-profile.children.cycles- > pp.update_rq_clock_task > 0.11 ± 4% -0.0 0.09 ± 7% perf-profile.children.cycles- > pp.resched_curr > 0.13 ± 4% -0.0 0.11 ± 4% perf-profile.children.cycles- > pp.check_preempt_curr > 0.17 ± 2% -0.0 0.15 ± 2% perf-profile.children.cycles- > pp.__x2apic_send_IPI_dest > 0.17 ± 2% -0.0 0.15 ± 3% perf-profile.children.cycles- > pp.__update_load_avg_se > 0.12 ± 4% -0.0 0.10 ± 3% perf-profile.children.cycles- > pp.finish_task_switch > 0.25 -0.0 0.23 ± 2% perf-profile.children.cycles- > pp.set_next_entity > 0.09 -0.0 0.08 perf-profile.children.cycles- > pp.__wrgsbase_inactive > 0.06 -0.0 0.05 perf-profile.children.cycles-pp.ct_idle_exit > 0.10 +0.0 0.11 perf-profile.children.cycles- > pp.tcp_chrono_stop > 0.07 ± 5% +0.0 0.08 perf-profile.children.cycles-pp.rb_next > 0.05 ± 7% +0.0 0.06 ± 7% perf-profile.children.cycles- > pp.__fdget > 0.08 ± 5% +0.0 0.09 ± 4% perf-profile.children.cycles- > pp.tcp_rearm_rto > 0.06 ± 8% +0.0 0.07 perf-profile.children.cycles-pp.rb_first > 1.08 +0.0 1.10 perf-profile.children.cycles- > pp.dev_hard_start_xmit > 0.11 ± 4% +0.0 0.13 ± 2% perf-profile.children.cycles- > pp.inet_ehashfn > 0.07 ± 6% +0.0 0.09 ± 4% perf-profile.children.cycles- > pp.demo_interval_tick > 0.12 ± 3% +0.0 0.14 ± 3% perf-profile.children.cycles- > pp.netif_skb_features > 0.28 ± 2% +0.0 0.30 perf-profile.children.cycles- > pp.ip_local_out > 0.09 +0.0 0.10 ± 4% perf-profile.children.cycles- > pp.tcp_queue_rcv > 0.05 +0.0 0.06 ± 7% perf-profile.children.cycles- > pp.__tcp_ack_snd_check > 0.16 ± 3% +0.0 0.18 ± 2% perf-profile.children.cycles- > pp.ip_send_check > 0.07 ± 7% +0.0 0.08 ± 4% perf-profile.children.cycles- > pp.tcp_rtt_estimator > 0.06 ± 8% +0.0 0.07 ± 5% perf-profile.children.cycles- > pp.iov_iter_pipe > 0.24 ± 3% +0.0 0.26 perf-profile.children.cycles- > pp.tcp_rcv_space_adjust > 0.25 +0.0 0.26 perf-profile.children.cycles- > pp.__update_load_avg_cfs_rq > 0.15 +0.0 0.17 ± 5% perf-profile.children.cycles- > pp.ipv4_dst_check > 0.06 ± 7% +0.0 0.08 ± 5% perf-profile.children.cycles- > pp.splice_from_pipe_next > 0.12 ± 3% +0.0 0.14 ± 3% perf-profile.children.cycles- > pp.tcp_update_skb_after_send > 0.60 +0.0 0.62 perf-profile.children.cycles- > pp._raw_spin_lock_irqsave > 0.08 +0.0 0.10 perf-profile.children.cycles- > pp.__list_add_valid > 0.11 ± 3% +0.0 0.13 ± 2% perf-profile.children.cycles- > pp.__get_task_ioprio > 0.36 +0.0 0.38 perf-profile.children.cycles- > pp.enqueue_to_backlog > 0.11 +0.0 0.13 ± 2% perf-profile.children.cycles- > pp.syscall_enter_from_user_mode > 0.12 +0.0 0.14 ± 2% perf-profile.children.cycles-pp.tcp_push > 0.21 +0.0 0.23 ± 3% perf-profile.children.cycles- > pp.exit_to_user_mode_prepare > 0.10 ± 5% +0.0 0.12 ± 5% perf-profile.children.cycles- > pp.xas_start > 0.10 +0.0 0.12 ± 3% perf-profile.children.cycles- > pp.tcp_update_pacing_rate > 0.06 +0.0 0.08 ± 5% perf-profile.children.cycles- > pp.tcp_event_data_recv > 0.12 ± 4% +0.0 0.14 perf-profile.children.cycles- > pp.tcp_downgrade_zcopy_pure > 0.17 ± 3% +0.0 0.20 ± 2% perf-profile.children.cycles- > pp.syscall_exit_to_user_mode_prepare > 0.20 ± 2% +0.0 0.22 ± 4% perf-profile.children.cycles- > pp.sockfd_lookup_light > 0.10 ± 4% +0.0 0.13 perf-profile.children.cycles- > pp.is_vmalloc_addr > 0.10 ± 4% +0.0 0.13 ± 6% perf-profile.children.cycles- > pp.make_vfsgid > 0.10 ± 3% +0.0 0.13 ± 2% perf-profile.children.cycles- > pp.make_vfsuid > 0.39 +0.0 0.42 perf-profile.children.cycles- > pp.netif_rx_internal > 0.28 ± 2% +0.0 0.30 ± 3% perf-profile.children.cycles- > pp.recv_tcp_stream > 0.13 +0.0 0.16 ± 4% perf-profile.children.cycles- > pp.check_stack_object > 0.13 ± 3% +0.0 0.16 ± 2% perf-profile.children.cycles- > pp.tcp_release_cb > 0.12 ± 3% +0.0 0.15 ± 2% perf-profile.children.cycles- > pp.demo_stream_interval > 0.26 ± 2% +0.0 0.29 ± 2% perf-profile.children.cycles- > pp.tcp_add_backlog > 0.11 ± 3% +0.0 0.14 ± 2% perf-profile.children.cycles- > pp.tcp_ack_update_rtt > 0.21 +0.0 0.24 ± 2% perf-profile.children.cycles- > pp.ip_rcv_core > 0.18 ± 2% +0.0 0.21 ± 3% perf-profile.children.cycles- > pp.__sk_dst_check > 0.07 +0.0 0.10 ± 3% perf-profile.children.cycles- > pp.__tcp_cleanup_rbuf > 0.41 +0.0 0.44 perf-profile.children.cycles-pp.__netif_rx > 0.17 ± 2% +0.0 0.20 perf-profile.children.cycles- > pp.__tcp_select_window > 0.14 ± 3% +0.0 0.17 ± 3% perf-profile.children.cycles- > pp.tcp_mtu_probe > 0.34 +0.0 0.37 perf-profile.children.cycles- > pp.kmalloc_reserve > 0.09 ± 4% +0.0 0.12 ± 4% perf-profile.children.cycles- > pp.lock_timer_base > 0.17 ± 2% +0.0 0.21 perf-profile.children.cycles- > pp.tcp_tx_timestamp > 0.15 +0.0 0.19 ± 3% perf-profile.children.cycles- > pp.folio_mark_accessed > 0.20 ± 2% +0.0 0.24 perf-profile.children.cycles- > pp._raw_spin_unlock_bh > 0.40 +0.0 0.44 perf-profile.children.cycles- > pp.tcp_mstamp_refresh > 0.15 ± 3% +0.0 0.20 ± 5% perf-profile.children.cycles- > pp.inet_send_prepare > 0.37 +0.0 0.42 perf-profile.children.cycles-pp.__skb_clone > 0.14 ± 2% +0.0 0.18 ± 5% perf-profile.children.cycles- > pp.ktime_get_coarse_real_ts64 > 0.27 +0.0 0.32 perf-profile.children.cycles- > pp.validate_xmit_skb > 0.18 +0.0 0.22 ± 2% perf-profile.children.cycles- > pp.fsnotify_perm > 0.17 ± 3% +0.0 0.22 ± 2% perf-profile.children.cycles- > pp.skb_clone > 0.19 ± 2% +0.0 0.24 ± 2% perf-profile.children.cycles- > pp.rw_verify_area > 0.23 ± 2% +0.1 0.28 ± 2% perf-profile.children.cycles- > pp.xas_load > 0.00 +0.1 0.05 ± 8% perf-profile.children.cycles- > pp.tcp_rbtree_insert > 0.28 ± 2% +0.1 0.34 perf-profile.children.cycles- > pp.tcp_schedule_loss_probe > 0.24 +0.1 0.30 perf-profile.children.cycles-pp.sanity > 0.32 ± 2% +0.1 0.38 ± 2% perf-profile.children.cycles- > pp.dst_release > 0.58 +0.1 0.65 perf-profile.children.cycles- > pp.kmem_cache_alloc_node > 0.31 +0.1 0.37 perf-profile.children.cycles- > pp.syscall_return_via_sysret > 0.24 +0.1 0.31 perf-profile.children.cycles-pp.tcp_tso_segs > 0.25 +0.1 0.32 perf-profile.children.cycles- > pp.copy_page_to_iter > 0.48 +0.1 0.55 perf-profile.children.cycles- > pp._raw_spin_lock > 0.28 +0.1 0.35 perf-profile.children.cycles-pp.rcu_all_qs > 0.32 ± 4% +0.1 0.39 ± 2% perf-profile.children.cycles- > pp.sock_put > 0.50 +0.1 0.57 perf-profile.children.cycles- > pp.kmem_cache_free > 0.34 ± 2% +0.1 0.42 perf-profile.children.cycles- > pp.__put_user_8 > 0.29 ± 2% +0.1 0.37 ± 2% perf-profile.children.cycles- > pp.aa_file_perm > 0.49 +0.1 0.57 perf-profile.children.cycles- > pp.syscall_exit_to_user_mode > 0.69 +0.1 0.77 perf-profile.children.cycles-pp.read_tsc > 0.16 ± 4% +0.1 0.25 ± 4% perf-profile.children.cycles- > pp.skb_release_head_state > 0.38 +0.1 0.47 perf-profile.children.cycles- > pp.tcp_established_options > 0.41 +0.1 0.50 ± 3% perf-profile.children.cycles- > pp.__virt_addr_valid > 0.48 +0.1 0.58 perf-profile.children.cycles- > pp.__tcp_send_ack > 0.42 +0.1 0.52 perf-profile.children.cycles- > pp.entry_SYSRETQ_unsafe_stack > 0.99 +0.1 1.09 perf-profile.children.cycles-pp.__alloc_skb > 0.52 +0.1 0.64 perf-profile.children.cycles- > pp.netperf_sendfile > 0.43 +0.1 0.55 perf-profile.children.cycles-pp.__mod_timer > 0.46 +0.1 0.58 perf-profile.children.cycles- > pp.tcp_event_new_data_sent > 0.80 +0.1 0.92 perf-profile.children.cycles- > pp.tcp_stream_alloc_skb > 0.47 +0.1 0.60 perf-profile.children.cycles-pp.sk_reset_timer > 0.46 ± 2% +0.1 0.58 ± 2% perf-profile.children.cycles- > pp.current_time > 0.51 +0.1 0.64 perf-profile.children.cycles- > pp.__fsnotify_parent > 0.59 +0.1 0.73 perf-profile.children.cycles- > pp.__entry_text_start > 0.54 +0.1 0.68 perf-profile.children.cycles- > pp._copy_from_user > 0.41 +0.1 0.54 perf-profile.children.cycles- > pp.tcp_rate_check_app_limited > 0.39 ± 2% +0.1 0.52 perf-profile.children.cycles- > pp.page_cache_pipe_buf_confirm > 0.62 +0.1 0.76 perf-profile.children.cycles-pp.__fget_light > 0.79 +0.1 0.94 ± 2% perf-profile.children.cycles- > pp.page_cache_pipe_buf_release > 1.02 +0.2 1.19 ± 4% perf-profile.children.cycles-pp.ktime_get > 0.97 +0.2 1.14 perf-profile.children.cycles- > pp.napi_consume_skb > 0.78 +0.2 0.95 perf-profile.children.cycles- > pp.__cond_resched > 1.14 +0.2 1.32 perf-profile.children.cycles- > pp.tcp_current_mss > 0.74 +0.2 0.93 perf-profile.children.cycles-pp.do_splice_to > 0.76 +0.2 0.96 perf-profile.children.cycles-pp.__kfree_skb > 0.94 ± 2% +0.3 1.19 perf-profile.children.cycles- > pp.apparmor_file_permission > 1.38 +0.3 1.65 perf-profile.children.cycles-pp.tcp_send_mss > 1.09 ± 2% +0.3 1.36 perf-profile.children.cycles- > pp.atime_needs_update > 1.44 +0.3 1.72 perf-profile.children.cycles- > pp.skb_release_data > 15.09 +0.3 15.40 perf-profile.children.cycles- > pp.net_rx_action > 1.20 +0.3 1.51 perf-profile.children.cycles- > pp.security_file_permission > 1.34 +0.3 1.67 perf-profile.children.cycles-pp.touch_atime > 1.35 +0.3 1.69 perf-profile.children.cycles- > pp.copy_page_to_iter_pipe > 15.73 +0.3 16.08 perf-profile.children.cycles-pp.__do_softirq > 17.54 +0.4 17.89 perf-profile.children.cycles- > pp.__dev_queue_xmit > 15.84 +0.4 16.20 perf-profile.children.cycles-pp.do_softirq > 17.91 +0.4 18.27 perf-profile.children.cycles- > pp.ip_finish_output2 > 18.82 +0.4 19.20 perf-profile.children.cycles- > pp.__ip_queue_xmit > 20.03 +0.4 20.41 perf-profile.children.cycles- > pp.__tcp_transmit_skb > 84.38 +0.4 84.81 perf-profile.children.cycles- > pp.do_syscall_64 > 16.35 +0.5 16.81 perf-profile.children.cycles- > pp.__local_bh_enable_ip > 84.91 +0.5 85.42 perf-profile.children.cycles- > pp.entry_SYSCALL_64_after_hwframe > 4.52 +0.7 5.21 perf-profile.children.cycles- > pp.native_queued_spin_lock_slowpath > 2.68 +0.9 3.62 ± 3% perf-profile.children.cycles- > pp.check_heap_object > 5.69 +1.0 6.64 perf-profile.children.cycles- > pp.lock_sock_nested > 6.77 +1.0 7.80 perf-profile.children.cycles- > pp._raw_spin_lock_bh > 3.19 +1.1 4.26 ± 2% perf-profile.children.cycles- > pp.__check_object_size > 2.94 +1.1 4.01 perf-profile.children.cycles- > pp.filemap_get_read_batch > 3.29 +1.1 4.38 ± 2% perf-profile.children.cycles- > pp.simple_copy_to_iter > 3.16 +1.1 4.29 perf-profile.children.cycles- > pp.filemap_get_pages > 6.68 +1.7 8.36 perf-profile.children.cycles-pp.copyout > 7.06 +1.8 8.85 perf-profile.children.cycles-pp._copy_to_iter > 31.77 +1.9 33.68 perf-profile.children.cycles-pp.tcp_recvmsg > 31.86 +1.9 33.78 perf-profile.children.cycles-pp.inet_recvmsg > 32.07 +1.9 34.01 perf-profile.children.cycles- > pp.sock_recvmsg > 32.56 +2.0 34.56 perf-profile.children.cycles- > pp.__sys_recvfrom > 32.65 +2.0 34.65 perf-profile.children.cycles- > pp.__x64_sys_recvfrom > 33.95 +2.0 35.97 perf-profile.children.cycles-pp.recv > 6.63 +2.0 8.66 perf-profile.children.cycles-pp.filemap_read > 34.18 +2.0 36.23 perf-profile.children.cycles- > pp.accept_connections > 34.18 +2.0 36.23 perf-profile.children.cycles- > pp.accept_connection > 34.18 +2.0 36.23 perf-profile.children.cycles-pp.spawn_child > 34.18 +2.0 36.23 perf-profile.children.cycles- > pp.process_requests > 7.51 +2.2 9.74 perf-profile.children.cycles- > pp.generic_file_splice_read > 11.31 +3.2 14.48 perf-profile.children.cycles- > pp.skb_copy_datagram_iter > 11.29 +3.2 14.46 perf-profile.children.cycles- > pp.__skb_datagram_iter > 29.29 +3.2 32.51 perf-profile.children.cycles- > pp.tcp_recvmsg_locked > 3.33 ± 3% -3.0 0.32 ± 2% perf-profile.self.cycles- > pp.mem_cgroup_charge_skmem > 2.89 -2.6 0.29 perf-profile.self.cycles- > pp.__sk_mem_raise_allocated > 1.72 ± 4% -1.5 0.18 ± 3% perf-profile.self.cycles- > pp.try_charge_memcg > 1.38 ± 5% -1.3 0.05 ± 7% perf-profile.self.cycles- > pp.page_counter_try_charge > 1.10 ± 4% -1.1 0.04 ± 44% perf-profile.self.cycles- > pp.page_counter_uncharge > 5.95 -0.6 5.30 perf-profile.self.cycles-pp.acpi_safe_halt > 0.69 ± 4% -0.6 0.12 ± 3% perf-profile.self.cycles- > pp.mem_cgroup_uncharge_skmem > 0.64 ± 2% -0.5 0.16 ± 3% perf-profile.self.cycles- > pp.__sk_mem_reduce_allocated > 0.25 ± 13% -0.2 0.08 ± 8% perf-profile.self.cycles- > pp.cgroup_rstat_updated > 0.27 ± 2% -0.2 0.10 ± 3% perf-profile.self.cycles-pp.refill_stock > 0.67 ± 2% -0.1 0.55 perf-profile.self.cycles-pp.tcp_ack > 0.15 ± 3% -0.1 0.05 perf-profile.self.cycles- > pp.__sk_mem_schedule > 0.28 ± 6% -0.1 0.20 ± 5% perf-profile.self.cycles- > pp.newidle_balance > 0.22 -0.0 0.17 ± 2% perf-profile.self.cycles- > pp.tcp_check_space > 0.24 ± 3% -0.0 0.20 ± 2% perf-profile.self.cycles- > pp.enqueue_task_fair > 0.12 ± 3% -0.0 0.08 ± 8% perf-profile.self.cycles- > pp.update_sg_lb_stats > 0.06 -0.0 0.02 ± 99% perf-profile.self.cycles- > pp.update_irq_load_avg > 0.30 -0.0 0.27 ± 2% perf-profile.self.cycles- > pp.native_irq_return_iret > 0.34 -0.0 0.30 ± 2% perf-profile.self.cycles-pp.__schedule > 0.11 -0.0 0.08 ± 4% perf-profile.self.cycles- > pp.ct_kernel_exit_state > 0.22 ± 3% -0.0 0.19 perf-profile.self.cycles- > pp.__switch_to_asm > 0.41 -0.0 0.38 perf-profile.self.cycles-pp.native_sched_clock > 0.19 ± 2% -0.0 0.16 ± 2% perf-profile.self.cycles- > pp.native_apic_msr_eoi_write > 0.22 ± 2% -0.0 0.19 perf-profile.self.cycles-pp.menu_select > 0.26 -0.0 0.23 ± 2% perf-profile.self.cycles-pp.__switch_to > 0.22 ± 2% -0.0 0.20 ± 3% perf-profile.self.cycles- > pp.loopback_xmit > 0.18 ± 3% -0.0 0.16 ± 4% perf-profile.self.cycles- > pp.___perf_sw_event > 0.11 ± 4% -0.0 0.09 ± 7% perf-profile.self.cycles- > pp.resched_curr > 0.13 -0.0 0.11 ± 4% perf-profile.self.cycles-pp.do_idle > 0.08 ± 6% -0.0 0.06 perf-profile.self.cycles- > pp.pick_next_task_fair > 0.17 ± 2% -0.0 0.15 ± 2% perf-profile.self.cycles- > pp.__x2apic_send_IPI_dest > 0.14 ± 3% -0.0 0.13 perf-profile.self.cycles-pp.__release_sock > 0.15 ± 3% -0.0 0.13 ± 3% perf-profile.self.cycles- > pp.__update_load_avg_se > 0.10 ± 3% -0.0 0.09 perf-profile.self.cycles-pp.dequeue_entity > 0.08 ± 4% -0.0 0.07 perf-profile.self.cycles- > pp.cpuidle_idle_call > 0.17 -0.0 0.16 ± 2% perf-profile.self.cycles- > pp.sock_def_readable > 0.07 -0.0 0.06 perf-profile.self.cycles-pp.cpuidle_enter_state > 0.07 -0.0 0.06 perf-profile.self.cycles-pp.__sock_wfree > 0.11 -0.0 0.10 perf-profile.self.cycles- > pp.ttwu_queue_wakelist > 0.10 -0.0 0.09 perf-profile.self.cycles- > pp.asm_sysvec_call_function_single > 0.09 -0.0 0.08 perf-profile.self.cycles- > pp.update_rq_clock_task > 0.09 -0.0 0.08 perf-profile.self.cycles- > pp.__wrgsbase_inactive > 0.08 -0.0 0.07 perf-profile.self.cycles-pp.finish_task_switch > 0.06 -0.0 0.05 perf-profile.self.cycles-pp.cpuidle_enter > 0.14 +0.0 0.15 perf-profile.self.cycles- > pp.enqueue_to_backlog > 0.07 +0.0 0.08 perf-profile.self.cycles-pp.tcp_v4_fill_cb > 0.05 +0.0 0.06 perf-profile.self.cycles-pp.iov_iter_pipe > 0.06 +0.0 0.07 perf-profile.self.cycles-pp.__sk_dst_check > 0.06 +0.0 0.07 ± 5% perf-profile.self.cycles- > pp.demo_interval_tick > 0.06 +0.0 0.07 ± 5% perf-profile.self.cycles-pp.rb_next > 0.07 ± 5% +0.0 0.08 perf-profile.self.cycles-pp.tcp_rearm_rto > 0.12 ± 3% +0.0 0.13 perf-profile.self.cycles-pp.tcp_wfree > 0.18 ± 2% +0.0 0.20 ± 3% perf-profile.self.cycles- > pp.process_backlog > 0.07 ± 5% +0.0 0.08 ± 5% perf-profile.self.cycles- > pp.tcp_chrono_stop > 0.05 +0.0 0.06 ± 7% perf-profile.self.cycles- > pp.sk_filter_trim_cap > 0.23 +0.0 0.24 perf-profile.self.cycles- > pp.__update_load_avg_cfs_rq > 0.06 ± 6% +0.0 0.07 ± 5% perf-profile.self.cycles- > pp.splice_from_pipe_next > 0.08 ± 5% +0.0 0.10 ± 6% perf-profile.self.cycles- > pp.tcp_event_new_data_sent > 0.11 ± 6% +0.0 0.13 ± 5% perf-profile.self.cycles- > pp.exit_to_user_mode_prepare > 0.15 ± 4% +0.0 0.16 ± 3% perf-profile.self.cycles- > pp.syscall_exit_to_user_mode_prepare > 0.10 ± 4% +0.0 0.12 perf-profile.self.cycles-pp.tcp_push > 0.10 +0.0 0.12 ± 4% perf-profile.self.cycles- > pp.direct_splice_actor > 0.10 +0.0 0.12 ± 4% perf-profile.self.cycles-pp.inet_ehashfn > 0.19 +0.0 0.21 ± 2% perf-profile.self.cycles-pp.recv > 0.07 +0.0 0.09 ± 5% perf-profile.self.cycles- > pp.demo_stream_interval > 0.14 ± 3% +0.0 0.16 ± 4% perf-profile.self.cycles- > pp.tcp_add_backlog > 0.15 ± 2% +0.0 0.17 ± 3% perf-profile.self.cycles- > pp.ip_send_check > 0.14 +0.0 0.16 ± 5% perf-profile.self.cycles-pp.ipv4_dst_check > 0.08 +0.0 0.10 ± 3% perf-profile.self.cycles-pp.make_vfsuid > 0.06 +0.0 0.08 ± 4% perf-profile.self.cycles- > pp.tcp_rtt_estimator > 0.10 ± 4% +0.0 0.12 ± 4% perf-profile.self.cycles- > pp.syscall_enter_from_user_mode > 0.10 ± 4% +0.0 0.12 ± 4% perf-profile.self.cycles- > pp.__get_task_ioprio > 0.09 ± 4% +0.0 0.11 ± 4% perf-profile.self.cycles- > pp.inet_recvmsg > 0.10 ± 6% +0.0 0.12 perf-profile.self.cycles- > pp.tcp_schedule_loss_probe > 0.04 ± 50% +0.0 0.06 perf-profile.self.cycles-pp.rb_first > 0.07 +0.0 0.09 perf-profile.self.cycles-pp.__list_add_valid > 0.08 ± 5% +0.0 0.10 ± 6% perf-profile.self.cycles-pp.xas_start > 0.08 ± 5% +0.0 0.10 ± 3% perf-profile.self.cycles- > pp.make_vfsgid > 0.06 ± 6% +0.0 0.08 ± 4% perf-profile.self.cycles- > pp.tcp_event_data_recv > 0.13 ± 3% +0.0 0.16 ± 3% perf-profile.self.cycles- > pp.tcp_recvmsg > 0.10 ± 4% +0.0 0.12 ± 4% perf-profile.self.cycles- > pp.check_stack_object > 0.08 ± 5% +0.0 0.10 perf-profile.self.cycles- > pp.is_vmalloc_addr > 0.09 ± 5% +0.0 0.11 ± 3% perf-profile.self.cycles- > pp.tcp_downgrade_zcopy_pure > 0.10 ± 3% +0.0 0.12 ± 4% perf-profile.self.cycles- > pp.tcp_release_cb > 0.22 ± 2% +0.0 0.24 perf-profile.self.cycles- > pp.do_splice_direct > 0.10 ± 7% +0.0 0.12 ± 3% perf-profile.self.cycles- > pp.ip_protocol_deliver_rcu > 0.10 ± 4% +0.0 0.12 ± 3% perf-profile.self.cycles- > pp.tcp_update_pacing_rate > 0.30 +0.0 0.32 ± 2% perf-profile.self.cycles-pp.__alloc_skb > 0.58 +0.0 0.61 perf-profile.self.cycles- > pp._raw_spin_lock_irqsave > 0.23 ± 2% +0.0 0.26 ± 2% perf-profile.self.cycles- > pp.recv_tcp_stream > 0.13 +0.0 0.16 ± 3% perf-profile.self.cycles- > pp._raw_spin_unlock_bh > 0.10 ± 4% +0.0 0.13 ± 8% perf-profile.self.cycles- > pp.inet_send_prepare > 0.20 +0.0 0.23 ± 3% perf-profile.self.cycles-pp.ip_rcv_core > 0.13 ± 3% +0.0 0.15 ± 6% perf-profile.self.cycles- > pp.tcp_mtu_probe > 0.13 ± 3% +0.0 0.16 ± 2% perf-profile.self.cycles-pp.xas_load > 0.06 ± 7% +0.0 0.09 ± 5% perf-profile.self.cycles- > pp.__tcp_cleanup_rbuf > 0.13 ± 3% +0.0 0.16 ± 2% perf-profile.self.cycles- > pp.validate_xmit_skb > 0.12 +0.0 0.15 ± 3% perf-profile.self.cycles- > pp.folio_mark_accessed > 0.16 +0.0 0.19 ± 3% perf-profile.self.cycles- > pp.__tcp_select_window > 0.27 +0.0 0.30 perf-profile.self.cycles-pp.__sys_recvfrom > 0.15 ± 2% +0.0 0.18 ± 2% perf-profile.self.cycles- > pp.tcp_tx_timestamp > 0.15 ± 2% +0.0 0.18 ± 2% perf-profile.self.cycles- > pp.do_splice_to > 0.31 +0.0 0.34 perf-profile.self.cycles-pp.__skb_clone > 0.12 +0.0 0.15 ± 3% perf-profile.self.cycles- > pp.simple_copy_to_iter > 0.14 ± 3% +0.0 0.18 ± 2% perf-profile.self.cycles- > pp.rw_verify_area > 0.18 +0.0 0.22 ± 2% perf-profile.self.cycles-pp.sock_sendpage > 0.12 ± 3% +0.0 0.16 perf-profile.self.cycles- > pp.syscall_exit_to_user_mode > 0.24 +0.0 0.28 ± 2% perf-profile.self.cycles-pp.__mod_timer > 0.09 ± 5% +0.0 0.12 ± 6% perf-profile.self.cycles- > pp.__tcp_send_ack > 0.16 +0.0 0.20 perf-profile.self.cycles-pp.fsnotify_perm > 0.11 +0.0 0.15 ± 7% perf-profile.self.cycles- > pp.ktime_get_coarse_real_ts64 > 0.17 ± 2% +0.0 0.21 perf-profile.self.cycles-pp.skb_clone > 0.20 ± 2% +0.0 0.24 ± 5% perf-profile.self.cycles- > pp.__entry_text_start > 0.39 ± 2% +0.0 0.44 perf-profile.self.cycles- > pp.__ip_queue_xmit > 0.17 ± 2% +0.0 0.22 ± 3% perf-profile.self.cycles- > pp.generic_file_splice_read > 0.18 ± 3% +0.0 0.23 ± 6% perf-profile.self.cycles- > pp.lock_sock_nested > 0.47 ± 2% +0.0 0.52 ± 2% perf-profile.self.cycles- > pp.tcp_recvmsg_locked > 0.22 ± 2% +0.0 0.27 perf-profile.self.cycles- > pp.filemap_get_pages > 0.00 +0.1 0.05 perf-profile.self.cycles-pp.tcp_options_write > 0.00 +0.1 0.05 perf-profile.self.cycles-pp.tcp_rbtree_insert > 0.00 +0.1 0.05 perf-profile.self.cycles- > pp.skb_network_protocol > 0.20 +0.1 0.25 perf-profile.self.cycles-pp.rcu_all_qs > 0.00 +0.1 0.05 ± 7% perf-profile.self.cycles- > pp.__tcp_ack_snd_check > 0.43 +0.1 0.48 perf-profile.self.cycles-pp._raw_spin_lock > 0.25 +0.1 0.30 ± 2% perf-profile.self.cycles-pp.touch_atime > 0.46 +0.1 0.52 ± 2% perf-profile.self.cycles-pp.net_rx_action > 0.16 ± 2% +0.1 0.22 ± 3% perf-profile.self.cycles- > pp.tcp_stream_alloc_skb > 0.23 ± 2% +0.1 0.28 perf-profile.self.cycles- > pp.copy_page_to_iter > 0.33 ± 2% +0.1 0.39 perf-profile.self.cycles- > pp.splice_direct_to_actor > 0.31 +0.1 0.37 ± 2% perf-profile.self.cycles-pp.dst_release > 0.21 +0.1 0.27 perf-profile.self.cycles-pp.sanity > 0.26 +0.1 0.32 perf-profile.self.cycles- > pp.security_file_permission > 0.25 ± 2% +0.1 0.31 perf-profile.self.cycles-pp.aa_file_perm > 1.04 +0.1 1.11 ± 3% perf-profile.self.cycles-pp.do_sendfile > 0.43 ± 2% +0.1 0.50 perf-profile.self.cycles- > pp.kmem_cache_alloc_node > 0.40 +0.1 0.47 perf-profile.self.cycles-pp.do_syscall_64 > 0.30 ± 2% +0.1 0.37 ± 2% perf-profile.self.cycles- > pp.syscall_return_via_sysret > 0.31 ± 4% +0.1 0.38 ± 2% perf-profile.self.cycles-pp.sock_put > 0.49 +0.1 0.56 perf-profile.self.cycles-pp.kmem_cache_free > 0.22 +0.1 0.29 perf-profile.self.cycles-pp.tcp_tso_segs > 0.64 +0.1 0.71 perf-profile.self.cycles-pp.tcp_v4_rcv > 0.34 ± 2% +0.1 0.41 perf-profile.self.cycles- > pp.generic_splice_sendpage > 0.32 ± 2% +0.1 0.39 perf-profile.self.cycles- > pp.kernel_sendpage > 0.33 +0.1 0.40 perf-profile.self.cycles-pp.__put_user_8 > 0.57 +0.1 0.64 perf-profile.self.cycles- > pp.tcp_clean_rtx_queue > 0.34 +0.1 0.42 perf-profile.self.cycles-pp.inet_sendpage > 0.67 +0.1 0.74 perf-profile.self.cycles-pp.read_tsc > 0.34 +0.1 0.42 perf-profile.self.cycles- > pp.tcp_established_options > 0.33 +0.1 0.41 perf-profile.self.cycles-pp.pipe_to_sendpage > 0.31 +0.1 0.38 perf-profile.self.cycles-pp.tcp_send_mss > 0.32 ± 3% +0.1 0.40 perf-profile.self.cycles-pp.current_time > 0.33 ± 2% +0.1 0.42 ± 12% perf-profile.self.cycles-pp.ktime_get > 0.36 ± 2% +0.1 0.45 ± 2% perf-profile.self.cycles- > pp.release_sock > 0.38 +0.1 0.46 ± 2% perf-profile.self.cycles- > pp.__virt_addr_valid > 0.55 +0.1 0.64 perf-profile.self.cycles- > pp.entry_SYSCALL_64_after_hwframe > 0.71 +0.1 0.80 perf-profile.self.cycles-pp.__dev_queue_xmit > 0.41 +0.1 0.50 perf-profile.self.cycles- > pp.entry_SYSRETQ_unsafe_stack > 0.45 +0.1 0.54 perf-profile.self.cycles- > pp.__local_bh_enable_ip > 0.64 +0.1 0.74 perf-profile.self.cycles- > pp.tcp_rcv_established > 0.41 +0.1 0.50 perf-profile.self.cycles- > pp.__check_object_size > 0.46 +0.1 0.56 perf-profile.self.cycles-pp.netperf_sendfile > 0.47 +0.1 0.57 perf-profile.self.cycles-pp.__cond_resched > 0.39 +0.1 0.49 perf-profile.self.cycles-pp._copy_to_iter > 0.51 +0.1 0.62 ± 2% perf-profile.self.cycles- > pp.sendfile_tcp_stream > 0.42 +0.1 0.53 perf-profile.self.cycles-pp.sendfile > 0.48 ± 2% +0.1 0.58 perf-profile.self.cycles- > pp.atime_needs_update > 0.46 +0.1 0.57 perf-profile.self.cycles-pp.tcp_current_mss > 0.49 ± 2% +0.1 0.60 perf-profile.self.cycles-pp.tcp_sendpage > 0.95 +0.1 1.06 ± 2% perf-profile.self.cycles- > pp.__tcp_transmit_skb > 0.35 +0.1 0.48 perf-profile.self.cycles- > pp.tcp_rate_check_app_limited > 0.47 +0.1 0.60 perf-profile.self.cycles-pp.__fsnotify_parent > 0.60 +0.1 0.74 perf-profile.self.cycles-pp.__fget_light > 0.36 +0.1 0.49 perf-profile.self.cycles- > pp.page_cache_pipe_buf_confirm > 0.77 +0.1 0.90 ± 2% perf-profile.self.cycles- > pp.page_cache_pipe_buf_release > 0.53 +0.1 0.66 perf-profile.self.cycles-pp._copy_from_user > 0.71 +0.2 0.86 perf-profile.self.cycles- > pp.__splice_from_pipe > 0.65 ± 2% +0.2 0.83 perf-profile.self.cycles- > pp.apparmor_file_permission > 0.81 +0.2 1.00 perf-profile.self.cycles-pp.tcp_write_xmit > 0.77 +0.2 0.97 perf-profile.self.cycles-pp.do_tcp_sendpages > 0.81 +0.2 1.05 perf-profile.self.cycles- > pp.__skb_datagram_iter > 1.00 +0.2 1.25 perf-profile.self.cycles-pp.skb_release_data > 1.11 +0.3 1.38 perf-profile.self.cycles- > pp.copy_page_to_iter_pipe > 2.20 +0.3 2.52 perf-profile.self.cycles-pp._raw_spin_lock_bh > 1.34 +0.4 1.69 perf-profile.self.cycles-pp.filemap_read > 2.01 +0.4 2.40 perf-profile.self.cycles-pp.tcp_build_frag > 4.49 +0.7 5.18 perf-profile.self.cycles- > pp.native_queued_spin_lock_slowpath > 2.17 ± 2% +0.8 2.99 ± 3% perf-profile.self.cycles- > pp.check_heap_object > 2.71 +1.0 3.73 perf-profile.self.cycles- > pp.filemap_get_read_batch > 6.63 +1.7 8.29 perf-profile.self.cycles-pp.copyout > > > > Disclaimer: > Results have been estimated based on internal Intel analysis and are > provided > for informational purposes only. Any difference in system hardware or > software > design or configuration may affect actual performance. > > > -- > 0-DAY CI Kernel Test Service > https://github.com/intel/lkp-tests/wiki > > > > > > > > From 93b3b4c5f356a5090551519522cfd5740ae7e774 Mon Sep 17 00:00:00 > 2001 > > From: Shakeel Butt <shakeelb@xxxxxxxxxx> > > Date: Tue, 16 May 2023 20:30:26 +0000 > > Subject: [PATCH] memcg: skip stock refill in irq context > > > > The linux kernel processes incoming packets in softirq on a given CPU > > and those packets may belong to different jobs. This is very normal on > > large systems running multiple workloads. With memcg enabled, network > > memory for such packets is charged to the corresponding memcgs of the > > jobs. > > > > Memcg charging can be a costly operation and the memcg code > implements > > a per-cpu memcg charge caching optimization to reduce the cost of > > charging. More specifically, the kernel charges the given memcg for more > > memory than requested and keep the remaining charge in a local per-cpu > > cache. The insight behind this heuristic is that there will be more > > charge requests for that memcg in near future. This optimization works > > well when a specific job runs on a CPU for long time and majority of the > > charging requests happen in process context. However the kernel's > > incoming packet processing does not work well with this optimization. > > > > Recently Cathy Zhang has shown [1] that memcg charge flushing within the > > memcg charge path can become a performance bottleneck for the memcg > > charging of network traffic. > > > > Perf profile: > > > > 8.98% mc-worker [kernel.vmlinux] [k] page_counter_cancel > > | > > --8.97%--page_counter_cancel > > | > > --8.97%--page_counter_uncharge > > drain_stock > > __refill_stock > > refill_stock > > | > > --8.91%--try_charge_memcg > > mem_cgroup_charge_skmem > > | > > --8.91%--__sk_mem_raise_allocated > > __sk_mem_schedule > > | > > |--5.41%-- > tcp_try_rmem_schedule > > | tcp_data_queue > > | tcp_rcv_established > > | tcp_v4_do_rcv > > | tcp_v4_rcv > > > > The simplest way to solve this issue is to not refill the memcg charge > > stock in the irq context. Since networking is the main source of memcg > > charging in the irq context, other users will not be impacted. In > > addition, this will preseve the memcg charge cache of the application > > running on that CPU. > > > > There are also potential side effects. What if all the packets belong to > > the same application and memcg? More specifically, users can use Receive > > Flow Steering (RFS) to make sure the kernel process the packets of the > > application on the CPU where the application is running. This change may > > cause the kernel to do slowpath memcg charging more often in irq > > context. > > > > Link: > https://lore.kernel.org/all/IA0PR11MB73557DEAB912737FD61D2873FC749@ > IA0PR11MB7355.namprd11.prod.outlook.com [1] > > Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx> > > --- > > mm/memcontrol.c | 8 ++++++++ > > 1 file changed, 8 insertions(+) > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index 5abffe6f8389..2635aae82b3e 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -2652,6 +2652,14 @@ static int try_charge_memcg(struct > mem_cgroup *memcg, gfp_t gfp_mask, > > bool raised_max_event = false; > > unsigned long pflags; > > > > + /* > > + * Skip the refill in irq context as it may flush the charge cache of > > + * the process running on the CPUs or the kernel may have to process > > + * incoming packets for different memcgs. > > + */ > > + if (!in_task()) > > + batch = nr_pages; > > + > > retry: > > if (consume_stock(memcg, nr_pages)) > > return 0; > > -- > > 2.40.1.606.ga4b1b128d6-goog > >