Re: [linux-next:master] [mm] 8f33a2ff30: stress-ng.resched.ops_per_sec -10.3% regression

Hello.

> 
> Hello,
> 
> kernel test robot noticed a -10.3% regression of stress-ng.resched.ops_per_sec on:
> 
> 
> commit: 8f33a2ff307248c3e55a7696f60b3658b28edb57 ("mm: vmalloc: set nr_nodes based on CPUs in a system")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> testcase: stress-ng
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
> 
> 	nr_threads: 100%
> 	testtime: 60s
> 	test: resched
> 	cpufreq_governor: performance
> 
> 
> In addition to that, the commit also has significant impact on the following tests:
> 
> +------------------+-------------------------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.pthread.ops_per_sec 23.0% improvement                                |
> | test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
> | test parameters  | cpufreq_governor=performance                                                              |
> |                  | nr_threads=100%                                                                           |
> |                  | test=pthread                                                                              |
> |                  | testtime=60s                                                                              |
> +------------------+-------------------------------------------------------------------------------------------+
> | testcase: change | stress-ng: stress-ng.fstat.ops_per_sec 14.2% improvement                                  |
> | test machine     | 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory |
> | test parameters  | cpufreq_governor=performance                                                              |
> |                  | disk=1HDD                                                                                 |
> |                  | fs=xfs                                                                                    |
> |                  | nr_threads=100%                                                                           |
> |                  | test=fstat                                                                                |
> |                  | testtime=60s                                                                              |
> +------------------+-------------------------------------------------------------------------------------------+
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202402292306.8520763a-oliver.sang@xxxxxxxxx
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240229/202402292306.8520763a-oliver.sang@xxxxxxxxx
> 
> =========================================================================================
> compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/100%/debian-12-x86_64-20240206.cgz/lkp-icl-2sp7/resched/stress-ng/60s
> 
> commit: 
>   8e1d743f2c ("mm: vmalloc: support multiple nodes in vmallocinfo")
>   8f33a2ff30 ("mm: vmalloc: set nr_nodes based on CPUs in a system")
>
Commit 8e1d743f2c ("mm: vmalloc: support multiple nodes in vmallocinfo")
has nothing to do with this test.

> 
> 8e1d743f2c2671aa 8f33a2ff307248c3e55a7696f60 
> ---------------- --------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>       7.48            -0.8        6.73        mpstat.cpu.all.nice%
>   10439977           -10.4%    9351864        vmstat.system.cs
>   14670714 ±  3%     +18.1%   17330709 ±  5%  numa-numastat.node0.local_node
>   14688319 ±  3%     +18.1%   17348214 ±  5%  numa-numastat.node0.numa_hit
>   14538034 ±  3%     +15.7%   16824234 ±  4%  numa-numastat.node1.local_node
>   14556613 ±  3%     +15.6%   16834659 ±  4%  numa-numastat.node1.numa_hit
>   14685240 ±  3%     +18.0%   17334251 ±  5%  numa-vmstat.node0.numa_hit
>   14667635 ±  3%     +18.1%   17316745 ±  5%  numa-vmstat.node0.numa_local
>   14551744 ±  3%     +15.6%   16815047 ±  4%  numa-vmstat.node1.numa_hit
>   14533165 ±  3%     +15.6%   16804623 ±  4%  numa-vmstat.node1.numa_local
>  9.153e+08           -10.3%  8.208e+08        stress-ng.resched.ops
>   15220752           -10.3%   13651349        stress-ng.resched.ops_per_sec
>  6.584e+08           -10.8%  5.871e+08        stress-ng.time.involuntary_context_switches
>
I tested the "resched" use case on my setup to check the commit:

8f33a2ff30 ("mm: vmalloc: set nr_nodes based on CPUs in a system")

n=0; while [ $n -lt 20 ]; do stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --resched 64; n=$(( $n + 1 )); done

1) Single-socket system, 32 CPUs, 64 threads, 128G of memory:

 (revert 8f33a2ff30)         (with 8f33a2ff30)
  resched   bogo ops/s      resched   bogo ops/s     resched diff %
1105043856   18404843      1110469441  18491268          -0.49
1094766811   18231572      1117884383  18616359          -2.11
1103621287   18376740      1105661054  18411893          -0.18
1079532022   17973123      1101247950  18337844          -2.01
1099874899   18316050      1089695381  18144556           0.93
1076430974   17921542      1074824321  17899317           0.15
1071025136   17835263      1097552346  18276981          -2.48
1092038983   18182772      1103594553  18377955          -1.06
1099140652   18299703      1080602374  17994387           1.69
1100454122   18324364      1094512741  18227744           0.54
1092551777   18195189      1099387884  18305866          -0.63
1098877800   18297198      1095319518  18240721           0.32
1103042823   18366819      1086364199  18090137           1.51
1083722244   18046970      1073436871  17876677           0.95
1101988080   18350823      1080819704  17996891           1.92
1086171084   18087685      1080936227  17998387           0.48
1106178491   18419226      1078155643  17953565           2.53
1084124963   18053216      1087789728  18111601          -0.34
1076017418   17916972      1090240538  18153644          -1.32
1091438151   18174424      1094233215  18221998          -0.26

No difference: the per-run diff stays within roughly +/-2.5% and changes
sign from run to run.
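
For reference, the last column can be recomputed from the two "resched"
counts. This is only a minimal sketch: it assumes the diff is taken
relative to the reverted kernel, which matches the rows above, and the
helper name diff_pct is made up for illustration:

```shell
#!/bin/sh
# diff_pct REVERT_OPS PATCHED_OPS   (hypothetical helper, not part of stress-ng)
# Prints the relative difference in percent against the reverted kernel.
# A negative value means the kernel with 8f33a2ff30 completed more ops.
diff_pct() {
    LC_ALL=C awk -v base="$1" -v patched="$2" \
        'BEGIN { printf "%.2f\n", (base - patched) / base * 100 }'
}

# First and fifth rows of the table above.
diff_pct 1105043856 1110469441   # prints -0.49
diff_pct 1099874899 1089695381   # prints 0.93
```

With this sign convention a real regression caused by 8f33a2ff30 would show
up as consistently positive values, which is not what the 20 runs show.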

2) Simulated NUMA system matching your configuration: two nodes with
   16 CPUs each, 64 threads in total:

I am not posting the results here since there is no difference either.

--
Uladzislau Rezki
