Re: [jlayton:mgtime] [xfs] 4edee232ed: fio.write_iops -34.9% regression

On Fri, 2024-06-14 at 14:24 +0800, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a -34.9% regression of fio.write_iops on:
> 
> 
> commit: 4edee232ed5d0abb9f24af7af55e3a9aa271f993 ("xfs: switch to multigrain timestamps")
> https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git mgtime
> 
> testcase: fio-basic
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
> 
> 	runtime: 300s
> 	disk: 1HDD
> 	fs: xfs
> 	nr_task: 1
> 	test_size: 128G
> 	rw: write
> 	bs: 4k
> 	ioengine: falloc
> 	cpufreq_governor: performance
> 
> 
> 
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add the following tags:
> | Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> | Closes: https://lore.kernel.org/oe-lkp/202406141453.7a44f956-oliver.sang@xxxxxxxxx
> 
> 
> Details are as below:
> -------------------------------------------------------------------------------------------------->
> 
> 
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240614/202406141453.7a44f956-oliver.sang@xxxxxxxxx
> 
> =========================================================================================
> bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
>   4k/gcc-13/performance/1HDD/xfs/falloc/x86_64-rhel-8.3/1/debian-12-x86_64-20240206.cgz/300s/write/lkp-icl-2sp9/128G/fio-basic
> 
> commit: 
>   61651220e0 ("fs: have setattr_copy handle multigrain timestamps appropriately")
>   4edee232ed ("xfs: switch to multigrain timestamps")
> 
> 61651220e0b91087 4edee232ed5d0abb9f24af7af55 
> ---------------- --------------------------- 
>          %stddev     %change         %stddev
>              \          |                \  
>       0.97 ±  3%     -30.7%       0.67 ±  2%  iostat.cpu.user
>  2.996e+09           +51.5%   4.54e+09        cpuidle..time
>     222280 ±  4%     +44.7%     321595 ±  4%  cpuidle..usage
>       0.01 ±  5%      -0.0        0.01 ±  6%  mpstat.cpu.all.irq%
>       0.97 ±  3%      -0.3        0.66 ±  2%  mpstat.cpu.all.usr%
>      88.86           +27.3%     113.13        uptime.boot
>       5387           +28.4%       6916        uptime.idle
>       2.98 ±  3%     -10.9%       2.65 ±  2%  vmstat.procs.r
>       3475 ± 10%     -18.6%       2830 ±  6%  vmstat.system.cs
>       4.65 ± 43%      -2.7        1.97 ±143%  perf-profile.calltrace.cycles-pp._free_event.perf_event_release_kernel.perf_release.__fput.task_work_run
>       4.65 ± 43%      -2.7        1.97 ±143%  perf-profile.children.cycles-pp._free_event
>       3.33 ± 76%      -2.4        0.90 ±141%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
>       3.33 ± 76%      -2.4        0.90 ±141%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
>     769.93            +9.4%     842.10        proc-vmstat.nr_active_anon
>       3936            +2.1%       4020        proc-vmstat.nr_shmem
>     769.93            +9.4%     842.10        proc-vmstat.nr_zone_active_anon
>     269328           +20.8%     325325 ± 11%  proc-vmstat.numa_hit
>     203054 ±  2%     +27.6%     259008 ± 14%  proc-vmstat.numa_local
>     297923           +16.3%     346459        proc-vmstat.pgalloc_normal
>     181868 ±  2%     +30.2%     236868        proc-vmstat.pgfault
>     173268 ±  3%     +27.2%     220312        proc-vmstat.pgfree
>       9141 ±  7%     +23.5%      11288 ±  4%  proc-vmstat.pgreuse
>       0.02 ± 26%      +0.1        0.10 ±  6%  fio.latency_10us%
>      99.87            -8.4       91.43        fio.latency_2us%
>       0.11 ± 20%      +8.4        8.47        fio.latency_4us%
>      46.16           +53.3%      70.78        fio.time.elapsed_time
>      46.16           +53.3%      70.78        fio.time.elapsed_time.max
>      35.68           +66.7%      59.50        fio.time.system_time
>       4940           +52.6%       7538        fio.time.voluntary_context_switches
>       2857           -34.9%       1859        fio.write_bw_MBps
>       1176           +64.4%       1933        fio.write_clat_90%_ns
>       1200           +83.1%       2197        fio.write_clat_95%_ns
>       1528           +46.6%       2240        fio.write_clat_99%_ns
>       1167           +62.2%       1893        fio.write_clat_mean_ns
>     731537           -34.9%     476002        fio.write_iops

I've been trying for several days to reproduce this, but have been
unable to so far. Is fio.write_iops the same value as "write.iops" in
the json output? That's been my assumption, but I wanted to check that
first.

That said, I'm only getting ~500k iops at best in this test on the rig
I have, so it's possible I need faster hardware to make the regression
show up.
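
For completeness, here's roughly how I've been mapping the job
parameters above onto a plain fio run (a sketch only; the mount point
and job name are placeholders, and the lkp harness may set options
beyond these):

#!/usr/bin/env python3
# Sketch: drive fio with the parameters from the report above.
# /mnt/xfs is a placeholder for an xfs mount on the test disk; the
# actual 0day job script may differ.
import subprocess

cmd = [
    "fio",
    "--name=write-test",     # arbitrary job name
    "--bs=4k",               # bs: 4k
    "--rw=write",            # rw: write
    "--ioengine=falloc",     # ioengine: falloc
    "--runtime=300",         # runtime: 300s (test is size-bound here)
    "--size=128G",           # test_size: 128G
    "--numjobs=1",           # nr_task: 1
    "--directory=/mnt/xfs",  # placeholder mount point
    "--output-format=json",
    "--output=fio-output.json",
]
subprocess.run(cmd, check=True)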


>       0.06 ±  6%     -25.5%       0.04 ±  5%  perf-stat.i.MPKI
>       0.91 ±  3%      -0.2        0.67 ±  3%  perf-stat.i.branch-miss-rate%
>   27659069 ±  3%     -28.0%   19920836 ±  4%  perf-stat.i.branch-misses
>     822504 ±  5%     -25.2%     615111 ±  6%  perf-stat.i.cache-misses
>    7527159 ±  6%     -26.9%    5499750 ±  3%  perf-stat.i.cache-references
>       3394 ± 11%     -18.8%       2756 ±  7%  perf-stat.i.context-switches
>       0.46 ±  2%     -13.0%       0.40        perf-stat.i.cpi
>  5.727e+09 ±  2%     -12.3%   5.02e+09        perf-stat.i.cpu-cycles
>      74.31            -3.0%      72.05        perf-stat.i.cpu-migrations
>       2.31           +13.1%       2.61        perf-stat.i.ipc
>       2905 ±  2%      -7.2%       2695 ±  2%  perf-stat.i.minor-faults
>       2905 ±  2%      -7.2%       2695 ±  2%  perf-stat.i.page-faults
>       0.07 ±  6%     -25.7%       0.05 ±  5%  perf-stat.overall.MPKI
>       1.18 ±  3%      -0.3        0.87 ±  2%  perf-stat.overall.branch-miss-rate%
>       0.48 ±  2%     -12.9%       0.42        perf-stat.overall.cpi
>       6992 ±  6%     +17.1%       8190 ±  5%  perf-stat.overall.cycles-between-cache-misses
>       2.09 ±  2%     +14.7%       2.40        perf-stat.overall.ipc
>      16640           +53.3%      25504        perf-stat.overall.path-length
>   27090197 ±  3%     -27.4%   19666246 ±  4%  perf-stat.ps.branch-misses
>     805963 ±  5%     -24.6%     607413 ±  6%  perf-stat.ps.cache-misses
>    7402971 ±  6%     -26.4%    5446622 ±  3%  perf-stat.ps.cache-references
>       3329 ± 11%     -18.2%       2723 ±  7%  perf-stat.ps.context-switches
>  5.616e+09 ±  2%     -11.7%  4.956e+09        perf-stat.ps.cpu-cycles
>       2843 ±  2%      -6.5%       2657 ±  2%  perf-stat.ps.minor-faults
>       2843 ±  2%      -6.5%       2657 ±  2%  perf-stat.ps.page-faults
>  5.584e+11           +53.3%  8.558e+11        perf-stat.total.instructions
> 
> 
> 
> 
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
> 
> 

Thanks!
-- 
Jeff Layton <jlayton@xxxxxxxxxx>



