hi, Jeff Layton, On Thu, Jun 20, 2024 at 04:16:30PM -0400, Jeff Layton wrote: > On Fri, 2024-06-14 at 14:24 +0800, kernel test robot wrote: > > > > > > Hello, > > > > kernel test robot noticed a -34.9% regression of fio.write_iops on: > > > > > > commit: 4edee232ed5d0abb9f24af7af55e3a9aa271f993 ("xfs: switch to multigrain timestamps") > > https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git mgtime > > > > testcase: fio-basic > > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory > > parameters: > > > > runtime: 300s > > disk: 1HDD > > fs: xfs > > nr_task: 1 > > test_size: 128G > > rw: write > > bs: 4k > > ioengine: falloc > > cpufreq_governor: performance > > > > > > > > > > If you fix the issue in a separate patch/commit (i.e. not just a new version of > > the same patch/commit), kindly add following tags > > > Reported-by: kernel test robot <oliver.sang@xxxxxxxxx> > > > Closes: https://lore.kernel.org/oe-lkp/202406141453.7a44f956-oliver.sang@xxxxxxxxx > > > > > > Details are as below: > > --------------------------------------------------------------------------------------------------> > > > > > > The kernel config and materials to reproduce are available at: > > https://download.01.org/0day-ci/archive/20240614/202406141453.7a44f956-oliver.sang@xxxxxxxxx > > > > ========================================================================================= > > bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase: > > 4k/gcc-13/performance/1HDD/xfs/falloc/x86_64-rhel-8.3/1/debian-12-x86_64-20240206.cgz/300s/write/lkp-icl-2sp9/128G/fio-basic > > > > commit: > > 61651220e0 ("fs: have setattr_copy handle multigrain timestamps appropriately") > > 4edee232ed ("xfs: switch to multigrain timestamps") > > > > 61651220e0b91087 4edee232ed5d0abb9f24af7af55 > > ---------------- --------------------------- > > %stddev %change %stddev > > \ | \ > > 0.97 ± 3% -30.7% 0.67 ± 2% iostat.cpu.user > > 2.996e+09 +51.5% 4.54e+09 cpuidle..time > > 222280 ± 4% +44.7% 321595 ± 4% cpuidle..usage > > 0.01 ± 5% -0.0 0.01 ± 6% mpstat.cpu.all.irq% > > 0.97 ± 3% -0.3 0.66 ± 2% mpstat.cpu.all.usr% > > 88.86 +27.3% 113.13 uptime.boot > > 5387 +28.4% 6916 uptime.idle > > 2.98 ± 3% -10.9% 2.65 ± 2% vmstat.procs.r > > 3475 ± 10% -18.6% 2830 ± 6% vmstat.system.cs > > 4.65 ± 43% -2.7 1.97 ±143% perf-profile.calltrace.cycles-pp._free_event.perf_event_release_kernel.perf_release.__fput.task_work_run > > 4.65 ± 43% -2.7 1.97 ±143% perf-profile.children.cycles-pp._free_event > > 3.33 ± 76% -2.4 0.90 ±141% perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt > > 3.33 ± 76% -2.4 0.90 ±141% perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt > > 769.93 +9.4% 842.10 proc-vmstat.nr_active_anon > > 3936 +2.1% 4020 proc-vmstat.nr_shmem > > 769.93 +9.4% 842.10 proc-vmstat.nr_zone_active_anon > > 269328 +20.8% 325325 ± 11% proc-vmstat.numa_hit > > 203054 ± 2% +27.6% 259008 ± 14% proc-vmstat.numa_local > > 297923 +16.3% 346459 proc-vmstat.pgalloc_normal > > 181868 ± 2% +30.2% 236868 proc-vmstat.pgfault > > 173268 ± 3% +27.2% 220312 proc-vmstat.pgfree > > 9141 ± 7% +23.5% 11288 ± 4% proc-vmstat.pgreuse > > 0.02 ± 26% +0.1 0.10 ± 6% fio.latency_10us% > > 99.87 -8.4 91.43 fio.latency_2us% > > 0.11 ± 20% +8.4 8.47 fio.latency_4us% > > 46.16 +53.3% 70.78 fio.time.elapsed_time > > 46.16 +53.3% 70.78 fio.time.elapsed_time.max > > 35.68 +66.7% 59.50 fio.time.system_time > > 4940 +52.6% 7538 fio.time.voluntary_context_switches > > 2857 -34.9% 1859 fio.write_bw_MBps > > 1176 +64.4% 1933 fio.write_clat_90%_ns > > 1200 +83.1% 2197 fio.write_clat_95%_ns > > 1528 +46.6% 2240 fio.write_clat_99%_ns > > 1167 +62.2% 1893 fio.write_clat_mean_ns > > 731537 -34.9% 476002 fio.write_iops > > I've been trying for several days to reproduce this, and have been > unable so far. Is this the same value as "write.iops" in the json > output? That's been my assumption, but I wanted to check that first. right. then we calculate the average from 6 runs. I attached fio.output from one run for 4edee232ed as example, witin it: "write" : { "io_bytes" : 137438953472, "io_kbytes" : 134217728, "bw_bytes" : 1950180255, "bw" : 1904472, "iops" : 476118.226321, <----- "runtime" : 70475, "total_ios" : 33554432, "short_ios" : 0, "drop_ios" : 0, ... > > That said, I'm only getting ~500k iops at best in this test with the > rig I have, so it's possible I need something faster to show it. > > > > 0.06 ± 6% -25.5% 0.04 ± 5% perf-stat.i.MPKI > > 0.91 ± 3% -0.2 0.67 ± 3% perf-stat.i.branch-miss-rate% > > 27659069 ± 3% -28.0% 19920836 ± 4% perf-stat.i.branch-misses > > 822504 ± 5% -25.2% 615111 ± 6% perf-stat.i.cache-misses > > 7527159 ± 6% -26.9% 5499750 ± 3% perf-stat.i.cache-references > > 3394 ± 11% -18.8% 2756 ± 7% perf-stat.i.context-switches > > 0.46 ± 2% -13.0% 0.40 perf-stat.i.cpi > > 5.727e+09 ± 2% -12.3% 5.02e+09 perf-stat.i.cpu-cycles > > 74.31 -3.0% 72.05 perf-stat.i.cpu-migrations > > 2.31 +13.1% 2.61 perf-stat.i.ipc > > 2905 ± 2% -7.2% 2695 ± 2% perf-stat.i.minor-faults > > 2905 ± 2% -7.2% 2695 ± 2% perf-stat.i.page-faults > > 0.07 ± 6% -25.7% 0.05 ± 5% perf-stat.overall.MPKI > > 1.18 ± 3% -0.3 0.87 ± 2% perf-stat.overall.branch-miss-rate% > > 0.48 ± 2% -12.9% 0.42 perf-stat.overall.cpi > > 6992 ± 6% +17.1% 8190 ± 5% perf-stat.overall.cycles-between-cache-misses > > 2.09 ± 2% +14.7% 2.40 perf-stat.overall.ipc > > 16640 +53.3% 25504 perf-stat.overall.path-length > > 27090197 ± 3% -27.4% 19666246 ± 4% perf-stat.ps.branch-misses > > 805963 ± 5% -24.6% 607413 ± 6% perf-stat.ps.cache-misses > > 7402971 ± 6% -26.4% 5446622 ± 3% perf-stat.ps.cache-references > > 3329 ± 11% -18.2% 2723 ± 7% perf-stat.ps.context-switches > > 5.616e+09 ± 2% -11.7% 4.956e+09 perf-stat.ps.cpu-cycles > > 2843 ± 2% -6.5% 2657 ± 2% perf-stat.ps.minor-faults > > 2843 ± 2% -6.5% 2657 ± 2% perf-stat.ps.page-faults > > 5.584e+11 +53.3% 8.558e+11 perf-stat.total.instructions > > > > > > > > > > Disclaimer: > > Results have been estimated based on internal Intel analysis and are provided > > for informational purposes only. Any difference in system hardware or software > > design or configuration may affect actual performance. > > > > > > Thanks! > -- > Jeff Layton <jlayton@xxxxxxxxxx>
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1 { "fio version" : "fio-3.33", "timestamp" : 1718334968, "timestamp_ms" : 1718334968213, "time" : "Fri Jun 14 03:16:08 2024", "global options" : { "bs" : "4k", "ioengine" : "falloc", "iodepth" : "32", "size" : "137438953472", "direct" : "0", "runtime" : "300", "file_service_type" : "roundrobin", "random_distribution" : "random", "pre_read" : "0", "nrfiles" : "1", "filesize" : "137438953472", "invalidate" : "1", "fallocate" : "posix", "io_size" : "137438953472" }, "jobs" : [ { "jobname" : "task_0", "groupid" : 0, "error" : 0, "eta" : 0, "elapsed" : 71, "job options" : { "rw" : "write", "directory" : "/fs/sdb1", "numjobs" : "1" }, "read" : { "io_bytes" : 0, "io_kbytes" : 0, "bw_bytes" : 0, "bw" : 0, "iops" : 0.000000, "runtime" : 0, "total_ios" : 0, "short_ios" : 0, "drop_ios" : 0, "slat_ns" : { "min" : 0, "max" : 0, "mean" : 0.000000, "stddev" : 0.000000, "N" : 0 }, "clat_ns" : { "min" : 0, "max" : 0, "mean" : 0.000000, "stddev" : 0.000000, "N" : 0 }, "lat_ns" : { "min" : 0, "max" : 0, "mean" : 0.000000, "stddev" : 0.000000, "N" : 0 }, "bw_min" : 0, "bw_max" : 0, "bw_agg" : 0.000000, "bw_mean" : 0.000000, "bw_dev" : 0.000000, "bw_samples" : 0, "iops_min" : 0, "iops_max" : 0, "iops_mean" : 0.000000, "iops_stddev" : 0.000000, "iops_samples" : 0 }, "write" : { "io_bytes" : 137438953472, "io_kbytes" : 134217728, "bw_bytes" : 1950180255, "bw" : 1904472, "iops" : 476118.226321, "runtime" : 70475, "total_ios" : 33554432, "short_ios" : 0, "drop_ios" : 0, "slat_ns" : { "min" : 0, "max" : 0, "mean" : 0.000000, "stddev" : 0.000000, "N" : 0 }, "clat_ns" : { "min" : 1778, "max" : 71068, "mean" : 1894.879822, "stddev" : 149.956185, "N" : 33554432, "percentile" : { "1.000000" : 1816, "5.000000" : 1816, "10.000000" : 1832, "20.000000" : 1832, "30.000000" : 1848, "40.000000" : 1848, "50.000000" : 1864, "60.000000" : 1864, "70.000000" : 1880, "80.000000" : 1896, "90.000000" : 1944, "95.000000" : 2192, "99.000000" : 2256, "99.500000" : 2288, "99.900000" : 4080, "99.950000" : 4512, "99.990000" : 5408 } }, "lat_ns" : { "min" : 1812, "max" : 71102, "mean" : 1929.203544, "stddev" : 150.831850, "N" : 33554432 }, "bw_min" : 1823696, "bw_max" : 1925208, "bw_agg" : 100.000000, "bw_mean" : 1904624.685714, "bw_dev" : 19260.493657, "bw_samples" : 140, "iops_min" : 455924, "iops_max" : 481302, "iops_mean" : 476156.171429, "iops_stddev" : 4815.123414, "iops_samples" : 140 }, "trim" : { "io_bytes" : 0, "io_kbytes" : 0, "bw_bytes" : 0, "bw" : 0, "iops" : 0.000000, "runtime" : 0, "total_ios" : 0, "short_ios" : 0, "drop_ios" : 0, "slat_ns" : { "min" : 0, "max" : 0, "mean" : 0.000000, "stddev" : 0.000000, "N" : 0 }, "clat_ns" : { "min" : 0, "max" : 0, "mean" : 0.000000, "stddev" : 0.000000, "N" : 0 }, "lat_ns" : { "min" : 0, "max" : 0, "mean" : 0.000000, "stddev" : 0.000000, "N" : 0 }, "bw_min" : 0, "bw_max" : 0, "bw_agg" : 0.000000, "bw_mean" : 0.000000, "bw_dev" : 0.000000, "bw_samples" : 0, "iops_min" : 0, "iops_max" : 0, "iops_mean" : 0.000000, "iops_stddev" : 0.000000, "iops_samples" : 0 }, "sync" : { "total_ios" : 0, "lat_ns" : { "min" : 0, "max" : 0, "mean" : 0.000000, "stddev" : 0.000000, "N" : 0 } }, "job_runtime" : 70474, "usr_cpu" : 15.249596, "sys_cpu" : 84.560263, "ctx" : 177, "majf" : 0, "minf" : 11, "iodepth_level" : { "1" : 100.000000, "2" : 0.000000, "4" : 0.000000, "8" : 0.000000, "16" : 0.000000, "32" : 0.000000, ">=64" : 0.000000 }, "iodepth_submit" : { "0" : 0.000000, "4" : 100.000000, "8" : 0.000000, "16" : 0.000000, "32" : 0.000000, "64" : 0.000000, ">=64" : 0.000000 }, "iodepth_complete" : { "0" : 0.000000, "4" : 100.000000, "8" : 0.000000, "16" : 0.000000, "32" : 0.000000, "64" : 0.000000, ">=64" : 0.000000 }, "latency_ns" : { "2" : 0.000000, "4" : 0.000000, "10" : 0.000000, "20" : 0.000000, "50" : 0.000000, "100" : 0.000000, "250" : 0.000000, "500" : 0.000000, "750" : 0.000000, "1000" : 0.000000 }, "latency_us" : { "2" : 91.456613, "4" : 8.435845, "10" : 0.107190, "20" : 0.010000, "50" : 0.010000, "100" : 0.010000, "250" : 0.000000, "500" : 0.000000, "750" : 0.000000, "1000" : 0.000000 }, "latency_ms" : { "2" : 0.000000, "4" : 0.000000, "10" : 0.000000, "20" : 0.000000, "50" : 0.000000, "100" : 0.000000, "250" : 0.000000, "500" : 0.000000, "750" : 0.000000, "1000" : 0.000000, "2000" : 0.000000, ">=2000" : 0.000000 }, "latency_depth" : 32, "latency_target" : 0, "latency_percentile" : 100.000000, "latency_window" : 0 } ], "disk_util" : [ { "name" : "sdb", "read_ios" : 0, "write_ios" : 18, "read_merges" : 0, "write_merges" : 6, "read_ticks" : 0, "write_ticks" : 5662, "in_queue" : 6085, "util" : 1.103687 } ] }