Re: [lkp-robot] [MD] 0ffbb1adf8: aim7.jobs-per-min -10.6% regression

Hi, Xiao Ni

Sorry for the late response.

On 04/24, Xiao Ni wrote:
>Hi all
>
>It's the first time I've received such a report, so I took some time
>reading the lkp and aim7 manuals. I also reserved a server running Fedora
>and ran a test following the steps in this email. It failed like this:
>
>2018-04-25 02:43:19 echo "/fs/md0" > config
>2018-04-25 02:43:19 
>	(
>		echo storageqe-07.lab.bos.redhat.com
>		echo sync_disk_rw
>
>		echo 1
>		echo 600
>		echo 2
>		echo 600
>		echo 1
>	) | ./multitask -t
>
>AIM Multiuser Benchmark - Suite VII v1.1, January 22, 1996
>Copyright (c) 1996 - 2001 Caldera International, Inc.
>All Rights Reserved.
>
>Machine's name                                              : Machine's configuration                                     : Number of iterations to run [1 to 10]                       : 
>Information for iteration #1
>Starting number of operation loads [1 to 10000]             : 1) Run to crossover
>2) Run to specific operation load           Enter [1 or 2]: Maximum number of operation loads to simulate [600 to 10000]: Operation load increment [1 to 100]                         : 
>Using disk directory </fs/md0>
>HZ is <100>
>AIM Multiuser Benchmark - Suite VII Run Beginning
>
>Tasks    jobs/min  jti  jobs/min/task      real       cpu
>  600/root/lkp-tests/bin/run-local:142:in `system': Interrupt
>	from /root/lkp-tests/bin/run-local:142:in `<main>'
>

It seems there are some flaws in our reproduction script; we'll look into it.
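
In the meantime, one possible workaround is to drive the benchmark binary
directly and answer its prompts the way your transcript above does. A rough
sketch (the hostname, directory, and load values are simply the ones from
your run; adjust as needed):

	echo "/fs/md0" > config                        # disk directory the benchmark uses
	(
		echo storageqe-07.lab.bos.redhat.com   # Machine's name
		echo sync_disk_rw                      # Machine's configuration
		echo 1                                 # Number of iterations to run
		echo 600                               # Starting number of operation loads
		echo 2                                 # 2 = run to specific operation load
		echo 600                               # Maximum number of operation loads
		echo 1                                 # Operation load increment
	) | ./multitask -t

This is the same sequence the lkp wrapper fed to multitask in your run, just
executed by hand so you can see exactly where it stops.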

>So now I can't understand the information in this report.
>What does "-10.6% regression of aim7.jobs-per-min" mean? And
>what is aim7.jobs-per-min used for? Could anyone give some
>suggestions? What should I do to resolve such a problem?
>

Here "-10.6% regression of aim7.jobs-per-min" means that the value of aim7.jobs-per-min
measured for commit 0ffbb1adf8 is 10.6% lower than that of its base commit v4.16
(the 0day bot captured your patch from the mailing list and applied it on top of v4.16).
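
For reference, you can recompute the percentage from the two jobs/min averages
shown in the comparison table further down in the quoted report (1632 for v4.16,
1458 for your patch):

	# (new - old) / old * 100, using the rounded averages shown in the table
	awk 'BEGIN { old = 1632; new = 1458; printf "%+.1f%%\n", (new - old) / old * 100 }'
	# prints -10.7% here; the report's -10.6% presumably comes from the
	# unrounded averages, so a small rounding difference is expected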

aim7.jobs-per-min is obtained from the raw output of the aim7 test, as shown below:

AIM Multiuser Benchmark - Suite VII v1.1, January 22, 1996
Copyright (c) 1996 - 2001 Caldera International, Inc.
All Rights Reserved.

Machine's name                                              : Machine's configuration                                     : Number of iterations to run [1 to 10]                       : 
Information for iteration #1
Starting number of operation loads [1 to 10000]             : 1) Run to crossover
2) Run to specific operation load           Enter [1 or 2]: Maximum number of operation loads to simulate [600 to 10000]: Operation load increment [1 to 100]                         : 
Using disk directory </fs/md0>
HZ is <100>
AIM Multiuser Benchmark - Suite VII Run Beginning

Tasks    jobs/min  jti  jobs/min/task      real       cpu
  600     1466.27   99         2.4438   2455.21  92829.76   Fri Apr 20 10:28:19 2018

AIM Multiuser Benchmark - Suite VII
   Testing over


aim7.jobs-per-min is the main KPI for aim7 tests; the other numbers listed in the
comparison are less important and are collected by various monitors (vmstat, mpstat,
etc.) running in the background. We hope they help you evaluate your patch more
thoroughly.
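
If you reproduce the test by hand and want similar background data, a minimal
sketch (the 5-second interval and log file names are just examples, not
necessarily what lkp itself uses):

	# start background monitors, run the workload, then stop the monitors
	vmstat 5 > vmstat.log 2>&1 &
	VMSTAT_PID=$!
	mpstat 5 > mpstat.log 2>&1 &
	MPSTAT_PID=$!

	bin/lkp run job.yaml        # or your own aim7 invocation

	kill "$VMSTAT_PID" "$MPSTAT_PID"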

Thanks,
Xiaolong


>Best Regards
>Xiao
>
>----- Original Message -----
>> From: "kernel test robot" <xiaolong.ye@xxxxxxxxx>
>> To: "Xiao Ni" <xni@xxxxxxxxxx>
>> Cc: linux-raid@xxxxxxxxxxxxxxx, shli@xxxxxxxxxx, "ming lei" <ming.lei@xxxxxxxxxx>, ncroxon@xxxxxxxxxx,
>> neilb@xxxxxxxx, lkp@xxxxxx
>> Sent: Monday, April 23, 2018 8:41:43 AM
>> Subject: [lkp-robot] [MD]  0ffbb1adf8:  aim7.jobs-per-min -10.6% regression
>> 
>> 
>> Greeting,
>> 
>> FYI, we noticed a -10.6% regression of aim7.jobs-per-min due to commit:
>> 
>> 
>> commit: 0ffbb1adf8b448568b44fe44c5fcdcf485040365 ("MD: fix lock contention for flush bios")
>> url:
>> https://github.com/0day-ci/linux/commits/Xiao-Ni/MD-fix-lock-contention-for-flush-bios/20180411-040300
>> 
>> 
>> in testcase: aim7
>> on test machine: 40 threads Intel(R) Xeon(R) CPU E5-2690 v2 @ 3.00GHz with 384G memory
>> with following parameters:
>> 
>> 	disk: 4BRD_12G
>> 	md: RAID1
>> 	fs: xfs
>> 	test: sync_disk_rw
>> 	load: 600
>> 	cpufreq_governor: performance
>> 
>> test-description: AIM7 is a traditional UNIX system-level benchmark suite
>> used to test and measure the performance of multiuser systems.
>> test-url: https://sourceforge.net/projects/aimbench/files/aim-suite7/
>> 
>> 
>> 
>> Details are as below:
>> -------------------------------------------------------------------------------------------------->
>> 
>> 
>> To reproduce:
>> 
>>         git clone https://github.com/intel/lkp-tests.git
>>         cd lkp-tests
>>         bin/lkp install job.yaml  # job file is attached in this email
>>         bin/lkp run     job.yaml
>> 
>> =========================================================================================
>> compiler/cpufreq_governor/disk/fs/kconfig/load/md/rootfs/tbox_group/test/testcase:
>>   gcc-7/performance/4BRD_12G/xfs/x86_64-rhel-7.2/600/RAID1/debian-x86_64-2016-08-31.cgz/lkp-ivb-ep01/sync_disk_rw/aim7
>> 
>> commit:
>>   v4.16
>>   0ffbb1adf8 ("MD: fix lock contention for flush bios")
>> 
>>            v4.16 0ffbb1adf8b448568b44fe44c5
>> ---------------- --------------------------
>>          %stddev     %change         %stddev
>>              \          |                \
>>       1632 ±  2%     -10.6%       1458        aim7.jobs-per-min
>>       2207 ±  2%     +11.8%       2468        aim7.time.elapsed_time
>>       2207 ±  2%     +11.8%       2468        aim7.time.elapsed_time.max
>>   51186515           -51.5%   24800655        aim7.time.involuntary_context_switches
>>     146259 ±  8%     +31.7%     192669 ±  2%  aim7.time.minor_page_faults
>>      80457 ±  2%     +15.9%      93267        aim7.time.system_time
>>      50.25 ±  2%     +11.8%      56.17        aim7.time.user_time
>>  7.257e+08 ±  2%      +7.3%  7.787e+08        aim7.time.voluntary_context_switches
>>     520491 ± 61%     +53.8%     800775        interrupts.CAL:Function_call_interrupts
>>       2463 ± 18%     +31.7%       3246 ± 15%  numa-vmstat.node0.nr_mapped
>>       4.06 ±  2%      -0.5        3.51        mpstat.cpu.idle%
>>       0.24 ±  6%      -0.1        0.15        mpstat.cpu.iowait%
>>    8829533           +16.2%   10256984        softirqs.SCHED
>>   33149109 ±  2%     +12.3%   37229216        softirqs.TIMER
>>    4724795 ± 33%     +38.5%    6544104        cpuidle.C1E.usage
>>  7.151e+08 ± 40%     -37.8%  4.449e+08        cpuidle.C6.time
>>    3881055 ±122%     -85.7%     553608 ±  2%  cpuidle.C6.usage
>>      61107 ±  2%     -10.7%      54566        vmstat.io.bo
>>       2.60 ± 18%     -51.9%       1.25 ± 34%  vmstat.procs.b
>>     305.10           -16.1%     256.00        vmstat.procs.r
>>     404271           -11.3%     358644        vmstat.system.cs
>>     167773           -16.5%     140121        vmstat.system.in
>>     115358 ±  9%     +39.9%     161430 ±  3%  proc-vmstat.numa_hint_faults
>>      62520 ± 10%     +47.6%      92267 ±  4%  proc-vmstat.numa_hint_faults_local
>>      20893 ± 10%     +29.0%      26948 ±  2%  proc-vmstat.numa_pages_migrated
>>     116983 ±  9%     +39.5%     163161 ±  3%  proc-vmstat.numa_pte_updates
>>    5504935 ±  3%     +12.3%    6179561        proc-vmstat.pgfault
>>      20893 ± 10%     +29.0%      26948 ±  2%  proc-vmstat.pgmigrate_success
>>       2.68 ±  3%      -0.2        2.44        turbostat.C1%
>>    4724733 ± 33%     +38.5%    6544028        turbostat.C1E
>>    3879529 ±122%     -85.8%     552056 ±  2%  turbostat.C6
>>       0.82 ± 43%      -0.4        0.45        turbostat.C6%
>>       3.62 ±  2%     -15.1%       3.08        turbostat.CPU%c1
>>     176728 ±  2%     +13.3%     200310        turbostat.SMI
>>  9.893e+12 ± 65%     +57.6%  1.559e+13        perf-stat.branch-instructions
>>  3.022e+10 ± 65%     +46.6%   4.43e+10        perf-stat.branch-misses
>>      11.31 ± 65%      +5.5       16.78        perf-stat.cache-miss-rate%
>>  2.821e+10 ± 65%     +50.4%  4.243e+10        perf-stat.cache-misses
>>  1.796e+14 ± 65%     +58.4%  2.845e+14        perf-stat.cpu-cycles
>>   26084177 ± 65%    +176.3%   72073346        perf-stat.cpu-migrations
>>  1.015e+13 ± 65%     +57.3%  1.597e+13        perf-stat.dTLB-loads
>>  1.125e+12 ± 65%     +45.6%  1.638e+12        perf-stat.dTLB-stores
>>  4.048e+13 ± 65%     +57.3%  6.367e+13        perf-stat.instructions
>>       4910 ± 65%     +57.5%       7734        perf-stat.instructions-per-iTLB-miss
>>    3847673 ± 65%     +57.4%    6057388        perf-stat.minor-faults
>>  1.403e+10 ± 65%     +51.8%   2.13e+10        perf-stat.node-load-misses
>>  1.557e+10 ± 65%     +51.0%  2.351e+10        perf-stat.node-loads
>>      27.41 ± 65%     +12.1       39.52        perf-stat.node-store-miss-rate%
>>  7.828e+09 ± 65%     +53.2%  1.199e+10        perf-stat.node-store-misses
>>  1.216e+10 ± 65%     +50.9%  1.835e+10        perf-stat.node-stores
>>    3847675 ± 65%     +57.4%    6057392        perf-stat.page-faults
>>    1041337 ±  2%     +12.0%    1166774        sched_debug.cfs_rq:/.exec_clock.avg
>>    1045329 ±  2%     +11.8%    1168250        sched_debug.cfs_rq:/.exec_clock.max
>>    1037380 ±  2%     +12.3%    1165349        sched_debug.cfs_rq:/.exec_clock.min
>>       3670 ± 16%     -70.1%       1098 ± 50%  sched_debug.cfs_rq:/.exec_clock.stddev
>>     234.08 ±  3%     -22.4%     181.73        sched_debug.cfs_rq:/.load_avg.avg
>>      33.65 ±  2%     -59.3%      13.70 ±  3%  sched_debug.cfs_rq:/.load_avg.min
>>   36305683 ±  2%     +12.4%   40814603        sched_debug.cfs_rq:/.min_vruntime.avg
>>   37771587 ±  2%     +11.1%   41960294        sched_debug.cfs_rq:/.min_vruntime.max
>>   34884765 ±  2%     +13.6%   39635549        sched_debug.cfs_rq:/.min_vruntime.min
>>    1277146 ±  9%     -21.8%     998594 ±  4%  sched_debug.cfs_rq:/.min_vruntime.stddev
>>       1.99 ±  5%     -16.8%       1.65 ±  3%  sched_debug.cfs_rq:/.nr_running.max
>>      19.18 ±  5%     +20.3%      23.08        sched_debug.cfs_rq:/.nr_spread_over.avg
>>       8.47 ± 18%     +23.8%      10.49 ± 12%  sched_debug.cfs_rq:/.nr_spread_over.min
>>      22.06 ±  7%     -18.2%      18.05 ±  6%  sched_debug.cfs_rq:/.runnable_load_avg.avg
>>       0.08 ± 38%     -50.5%       0.04 ± 20%  sched_debug.cfs_rq:/.spread.avg
>>    1277131 ±  9%     -21.8%     998584 ±  4%  sched_debug.cfs_rq:/.spread0.stddev
>>     177058 ±  7%     -23.5%     135429        sched_debug.cpu.avg_idle.max
>>      33859 ±  6%     -24.6%      25519        sched_debug.cpu.avg_idle.stddev
>>    1119705 ±  2%     +11.0%    1242378        sched_debug.cpu.clock.avg
>>    1119726 ±  2%     +11.0%    1242399        sched_debug.cpu.clock.max
>>    1119680 ±  2%     +11.0%    1242353        sched_debug.cpu.clock.min
>>    1119705 ±  2%     +11.0%    1242378        sched_debug.cpu.clock_task.avg
>>    1119726 ±  2%     +11.0%    1242399        sched_debug.cpu.clock_task.max
>>    1119680 ±  2%     +11.0%    1242353        sched_debug.cpu.clock_task.min
>>       3.47 ± 11%     +28.4%       4.45 ±  2%  sched_debug.cpu.cpu_load[1].min
>>      28.92 ±  4%      -7.8%      26.66 ±  3%  sched_debug.cpu.cpu_load[2].avg
>>       5.65 ±  7%     +37.9%       7.80 ±  8%  sched_debug.cpu.cpu_load[2].min
>>      29.97 ±  3%      -7.9%      27.60 ±  3%  sched_debug.cpu.cpu_load[3].avg
>>       8.22 ±  4%     +31.9%      10.83 ±  7%  sched_debug.cpu.cpu_load[3].min
>>      10.50 ±  5%     +24.0%      13.02 ±  6%  sched_debug.cpu.cpu_load[4].min
>>       2596 ±  4%      -9.2%       2358 ±  2%  sched_debug.cpu.curr->pid.avg
>>       4463 ±  4%     -13.5%       3862 ±  3%  sched_debug.cpu.curr->pid.stddev
>>    1139237 ±  2%     +11.5%    1269690        sched_debug.cpu.nr_load_updates.avg
>>    1146166 ±  2%     +11.2%    1274264        sched_debug.cpu.nr_load_updates.max
>>    1133061 ±  2%     +11.4%    1262298        sched_debug.cpu.nr_load_updates.min
>>       3943 ±  9%     -32.4%       2666 ± 23%  sched_debug.cpu.nr_load_updates.stddev
>>       8814 ±  7%     +17.8%      10386 ±  2%  sched_debug.cpu.nr_uninterruptible.max
>>      -3613           +32.3%      -4782        sched_debug.cpu.nr_uninterruptible.min
>>       2999 ±  4%     +13.1%       3391        sched_debug.cpu.nr_uninterruptible.stddev
>>      42794 ± 15%     -76.8%       9921 ± 90%  sched_debug.cpu.sched_goidle.stddev
>>     652177           +14.4%     745857        sched_debug.cpu.ttwu_local.avg
>>     684397           +17.7%     805440 ±  2%  sched_debug.cpu.ttwu_local.max
>>     622628           +12.0%     697353 ±  2%  sched_debug.cpu.ttwu_local.min
>>      16189 ± 33%    +128.0%      36916 ± 47%  sched_debug.cpu.ttwu_local.stddev
>>    1119677 ±  2%     +11.0%    1242351        sched_debug.cpu_clk
>>    1119677 ±  2%     +11.0%    1242351        sched_debug.ktime
>>    1120113 ±  2%     +11.0%    1242771        sched_debug.sched_clk
>>                                                                                 
>>                                                                                                                                                                 
>>                                  aim7.jobs-per-min
>>                                                                                 
>>   1750 +-+------------------------------------------------------------------+
>>        |                                                                    |   
>>   1700 +-+                         +             .+.+                .+.+   |
>>   1650 +-+                        + +          .+    :             .+    :  |
>>        |  .+.+.+.    +      .+.+.+   +.       +      :    +.      +      :  |
>>   1600 +-+       +  : + .+.+           +..+. +        +. +  +.+. +        +.|
>>        |          + :  +                    +           +       +           |
>>   1550 +-+         +                                                        |
>>        |                                                                    |   
>>   1500 +-+                                                                  |
>>   1450 +-O     O     O               O O      O O                           |
>>        O   O O     O   O     O O O O      O O     O O                       |
>>   1400 +-+       O       O O                                                |
>>        |                                                                    |   
>>   1350 +-+------------------------------------------------------------------+
>>                                                                                 
>>                                                                                 
>> [*] bisect-good sample
>> [O] bisect-bad  sample
>> 
>> 
>> 
>> Disclaimer:
>> Results have been estimated based on internal Intel analysis and are provided
>> for informational purposes only. Any difference in system hardware or
>> software
>> design or configuration may affect actual performance.
>> 
>> 
>> Thanks,
>> Xiaolong
>> 


