Re: [jlayton:mgtime] [xfs] 4edee232ed: fio.write_iops -34.9% regression

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



hi, Jeff Layton,

On Thu, Jun 20, 2024 at 04:16:30PM -0400, Jeff Layton wrote:
> On Fri, 2024-06-14 at 14:24 +0800, kernel test robot wrote:
> > 
> > 
> > Hello,
> > 
> > kernel test robot noticed a -34.9% regression of fio.write_iops on:
> > 
> > 
> > commit: 4edee232ed5d0abb9f24af7af55e3a9aa271f993 ("xfs: switch to multigrain timestamps")
> > https://git.kernel.org/cgit/linux/kernel/git/jlayton/linux.git mgtime
> > 
> > testcase: fio-basic
> > test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> > parameters:
> > 
> > 	runtime: 300s
> > 	disk: 1HDD
> > 	fs: xfs
> > 	nr_task: 1
> > 	test_size: 128G
> > 	rw: write
> > 	bs: 4k
> > 	ioengine: falloc
> > 	cpufreq_governor: performance
> > 
> > 
> > 
> > 
> > If you fix the issue in a separate patch/commit (i.e. not just a new version of
> > the same patch/commit), kindly add following tags
> > > Reported-by: kernel test robot <oliver.sang@xxxxxxxxx>
> > > Closes: https://lore.kernel.org/oe-lkp/202406141453.7a44f956-oliver.sang@xxxxxxxxx
> > 
> > 
> > Details are as below:
> > -------------------------------------------------------------------------------------------------->
> > 
> > 
> > The kernel config and materials to reproduce are available at:
> > https://download.01.org/0day-ci/archive/20240614/202406141453.7a44f956-oliver.sang@xxxxxxxxx
> > 
> > =========================================================================================
> > bs/compiler/cpufreq_governor/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
> >   4k/gcc-13/performance/1HDD/xfs/falloc/x86_64-rhel-8.3/1/debian-12-x86_64-20240206.cgz/300s/write/lkp-icl-2sp9/128G/fio-basic
> > 
> > commit: 
> >   61651220e0 ("fs: have setattr_copy handle multigrain timestamps appropriately")
> >   4edee232ed ("xfs: switch to multigrain timestamps")
> > 
> > 61651220e0b91087 4edee232ed5d0abb9f24af7af55 
> > ---------------- --------------------------- 
> >          %stddev     %change         %stddev
> >              \          |                \  
> >       0.97 ±  3%     -30.7%       0.67 ±  2%  iostat.cpu.user
> >  2.996e+09           +51.5%   4.54e+09        cpuidle..time
> >     222280 ±  4%     +44.7%     321595 ±  4%  cpuidle..usage
> >       0.01 ±  5%      -0.0        0.01 ±  6%  mpstat.cpu.all.irq%
> >       0.97 ±  3%      -0.3        0.66 ±  2%  mpstat.cpu.all.usr%
> >      88.86           +27.3%     113.13        uptime.boot
> >       5387           +28.4%       6916        uptime.idle
> >       2.98 ±  3%     -10.9%       2.65 ±  2%  vmstat.procs.r
> >       3475 ± 10%     -18.6%       2830 ±  6%  vmstat.system.cs
> >       4.65 ± 43%      -2.7        1.97 ±143%  perf-profile.calltrace.cycles-pp._free_event.perf_event_release_kernel.perf_release.__fput.task_work_run
> >       4.65 ± 43%      -2.7        1.97 ±143%  perf-profile.children.cycles-pp._free_event
> >       3.33 ± 76%      -2.4        0.90 ±141%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
> >       3.33 ± 76%      -2.4        0.90 ±141%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
> >     769.93            +9.4%     842.10        proc-vmstat.nr_active_anon
> >       3936            +2.1%       4020        proc-vmstat.nr_shmem
> >     769.93            +9.4%     842.10        proc-vmstat.nr_zone_active_anon
> >     269328           +20.8%     325325 ± 11%  proc-vmstat.numa_hit
> >     203054 ±  2%     +27.6%     259008 ± 14%  proc-vmstat.numa_local
> >     297923           +16.3%     346459        proc-vmstat.pgalloc_normal
> >     181868 ±  2%     +30.2%     236868        proc-vmstat.pgfault
> >     173268 ±  3%     +27.2%     220312        proc-vmstat.pgfree
> >       9141 ±  7%     +23.5%      11288 ±  4%  proc-vmstat.pgreuse
> >       0.02 ± 26%      +0.1        0.10 ±  6%  fio.latency_10us%
> >      99.87            -8.4       91.43        fio.latency_2us%
> >       0.11 ± 20%      +8.4        8.47        fio.latency_4us%
> >      46.16           +53.3%      70.78        fio.time.elapsed_time
> >      46.16           +53.3%      70.78        fio.time.elapsed_time.max
> >      35.68           +66.7%      59.50        fio.time.system_time
> >       4940           +52.6%       7538        fio.time.voluntary_context_switches
> >       2857           -34.9%       1859        fio.write_bw_MBps
> >       1176           +64.4%       1933        fio.write_clat_90%_ns
> >       1200           +83.1%       2197        fio.write_clat_95%_ns
> >       1528           +46.6%       2240        fio.write_clat_99%_ns
> >       1167           +62.2%       1893        fio.write_clat_mean_ns
> >     731537           -34.9%     476002        fio.write_iops
> 
> I've been trying for several days to reproduce this, and have been
> unable so far. Is this the same value as "write.iops" in the json
> output? That's been my assumption, but I wanted to check that first.

right. then we calculate the average from 6 runs.

I attached fio.output from one run for 4edee232ed as example, witin it:

      "write" : {
        "io_bytes" : 137438953472,
        "io_kbytes" : 134217728,
        "bw_bytes" : 1950180255,
        "bw" : 1904472,
        "iops" : 476118.226321,     <-----
        "runtime" : 70475,
        "total_ios" : 33554432,
        "short_ios" : 0,
        "drop_ios" : 0,
...


> 
> That said, I'm only getting ~500k iops at best in this test with the
> rig I have, so it's possible I need something faster to show it.
> 
> 
> >       0.06 ±  6%     -25.5%       0.04 ±  5%  perf-stat.i.MPKI
> >       0.91 ±  3%      -0.2        0.67 ±  3%  perf-stat.i.branch-miss-rate%
> >   27659069 ±  3%     -28.0%   19920836 ±  4%  perf-stat.i.branch-misses
> >     822504 ±  5%     -25.2%     615111 ±  6%  perf-stat.i.cache-misses
> >    7527159 ±  6%     -26.9%    5499750 ±  3%  perf-stat.i.cache-references
> >       3394 ± 11%     -18.8%       2756 ±  7%  perf-stat.i.context-switches
> >       0.46 ±  2%     -13.0%       0.40        perf-stat.i.cpi
> >  5.727e+09 ±  2%     -12.3%   5.02e+09        perf-stat.i.cpu-cycles
> >      74.31            -3.0%      72.05        perf-stat.i.cpu-migrations
> >       2.31           +13.1%       2.61        perf-stat.i.ipc
> >       2905 ±  2%      -7.2%       2695 ±  2%  perf-stat.i.minor-faults
> >       2905 ±  2%      -7.2%       2695 ±  2%  perf-stat.i.page-faults
> >       0.07 ±  6%     -25.7%       0.05 ±  5%  perf-stat.overall.MPKI
> >       1.18 ±  3%      -0.3        0.87 ±  2%  perf-stat.overall.branch-miss-rate%
> >       0.48 ±  2%     -12.9%       0.42        perf-stat.overall.cpi
> >       6992 ±  6%     +17.1%       8190 ±  5%  perf-stat.overall.cycles-between-cache-misses
> >       2.09 ±  2%     +14.7%       2.40        perf-stat.overall.ipc
> >      16640           +53.3%      25504        perf-stat.overall.path-length
> >   27090197 ±  3%     -27.4%   19666246 ±  4%  perf-stat.ps.branch-misses
> >     805963 ±  5%     -24.6%     607413 ±  6%  perf-stat.ps.cache-misses
> >    7402971 ±  6%     -26.4%    5446622 ±  3%  perf-stat.ps.cache-references
> >       3329 ± 11%     -18.2%       2723 ±  7%  perf-stat.ps.context-switches
> >  5.616e+09 ±  2%     -11.7%  4.956e+09        perf-stat.ps.cpu-cycles
> >       2843 ±  2%      -6.5%       2657 ±  2%  perf-stat.ps.minor-faults
> >       2843 ±  2%      -6.5%       2657 ±  2%  perf-stat.ps.page-faults
> >  5.584e+11           +53.3%  8.558e+11        perf-stat.total.instructions
> > 
> > 
> > 
> > 
> > Disclaimer:
> > Results have been estimated based on internal Intel analysis and are provided
> > for informational purposes only. Any difference in system hardware or software
> > design or configuration may affect actual performance.
> > 
> > 
> 
> Thanks!
> -- 
> Jeff Layton <jlayton@xxxxxxxxxx>
note: both iodepth >= 1 and synchronous I/O engine are selected, queue depth will be capped at 1
{
  "fio version" : "fio-3.33",
  "timestamp" : 1718334968,
  "timestamp_ms" : 1718334968213,
  "time" : "Fri Jun 14 03:16:08 2024",
  "global options" : {
    "bs" : "4k",
    "ioengine" : "falloc",
    "iodepth" : "32",
    "size" : "137438953472",
    "direct" : "0",
    "runtime" : "300",
    "file_service_type" : "roundrobin",
    "random_distribution" : "random",
    "pre_read" : "0",
    "nrfiles" : "1",
    "filesize" : "137438953472",
    "invalidate" : "1",
    "fallocate" : "posix",
    "io_size" : "137438953472"
  },
  "jobs" : [
    {
      "jobname" : "task_0",
      "groupid" : 0,
      "error" : 0,
      "eta" : 0,
      "elapsed" : 71,
      "job options" : {
        "rw" : "write",
        "directory" : "/fs/sdb1",
        "numjobs" : "1"
      },
      "read" : {
        "io_bytes" : 0,
        "io_kbytes" : 0,
        "bw_bytes" : 0,
        "bw" : 0,
        "iops" : 0.000000,
        "runtime" : 0,
        "total_ios" : 0,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "clat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "lat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "bw_min" : 0,
        "bw_max" : 0,
        "bw_agg" : 0.000000,
        "bw_mean" : 0.000000,
        "bw_dev" : 0.000000,
        "bw_samples" : 0,
        "iops_min" : 0,
        "iops_max" : 0,
        "iops_mean" : 0.000000,
        "iops_stddev" : 0.000000,
        "iops_samples" : 0
      },
      "write" : {
        "io_bytes" : 137438953472,
        "io_kbytes" : 134217728,
        "bw_bytes" : 1950180255,
        "bw" : 1904472,
        "iops" : 476118.226321,
        "runtime" : 70475,
        "total_ios" : 33554432,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "clat_ns" : {
          "min" : 1778,
          "max" : 71068,
          "mean" : 1894.879822,
          "stddev" : 149.956185,
          "N" : 33554432,
          "percentile" : {
            "1.000000" : 1816,
            "5.000000" : 1816,
            "10.000000" : 1832,
            "20.000000" : 1832,
            "30.000000" : 1848,
            "40.000000" : 1848,
            "50.000000" : 1864,
            "60.000000" : 1864,
            "70.000000" : 1880,
            "80.000000" : 1896,
            "90.000000" : 1944,
            "95.000000" : 2192,
            "99.000000" : 2256,
            "99.500000" : 2288,
            "99.900000" : 4080,
            "99.950000" : 4512,
            "99.990000" : 5408
          }
        },
        "lat_ns" : {
          "min" : 1812,
          "max" : 71102,
          "mean" : 1929.203544,
          "stddev" : 150.831850,
          "N" : 33554432
        },
        "bw_min" : 1823696,
        "bw_max" : 1925208,
        "bw_agg" : 100.000000,
        "bw_mean" : 1904624.685714,
        "bw_dev" : 19260.493657,
        "bw_samples" : 140,
        "iops_min" : 455924,
        "iops_max" : 481302,
        "iops_mean" : 476156.171429,
        "iops_stddev" : 4815.123414,
        "iops_samples" : 140
      },
      "trim" : {
        "io_bytes" : 0,
        "io_kbytes" : 0,
        "bw_bytes" : 0,
        "bw" : 0,
        "iops" : 0.000000,
        "runtime" : 0,
        "total_ios" : 0,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "clat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "lat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "bw_min" : 0,
        "bw_max" : 0,
        "bw_agg" : 0.000000,
        "bw_mean" : 0.000000,
        "bw_dev" : 0.000000,
        "bw_samples" : 0,
        "iops_min" : 0,
        "iops_max" : 0,
        "iops_mean" : 0.000000,
        "iops_stddev" : 0.000000,
        "iops_samples" : 0
      },
      "sync" : {
        "total_ios" : 0,
        "lat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        }
      },
      "job_runtime" : 70474,
      "usr_cpu" : 15.249596,
      "sys_cpu" : 84.560263,
      "ctx" : 177,
      "majf" : 0,
      "minf" : 11,
      "iodepth_level" : {
        "1" : 100.000000,
        "2" : 0.000000,
        "4" : 0.000000,
        "8" : 0.000000,
        "16" : 0.000000,
        "32" : 0.000000,
        ">=64" : 0.000000
      },
      "iodepth_submit" : {
        "0" : 0.000000,
        "4" : 100.000000,
        "8" : 0.000000,
        "16" : 0.000000,
        "32" : 0.000000,
        "64" : 0.000000,
        ">=64" : 0.000000
      },
      "iodepth_complete" : {
        "0" : 0.000000,
        "4" : 100.000000,
        "8" : 0.000000,
        "16" : 0.000000,
        "32" : 0.000000,
        "64" : 0.000000,
        ">=64" : 0.000000
      },
      "latency_ns" : {
        "2" : 0.000000,
        "4" : 0.000000,
        "10" : 0.000000,
        "20" : 0.000000,
        "50" : 0.000000,
        "100" : 0.000000,
        "250" : 0.000000,
        "500" : 0.000000,
        "750" : 0.000000,
        "1000" : 0.000000
      },
      "latency_us" : {
        "2" : 91.456613,
        "4" : 8.435845,
        "10" : 0.107190,
        "20" : 0.010000,
        "50" : 0.010000,
        "100" : 0.010000,
        "250" : 0.000000,
        "500" : 0.000000,
        "750" : 0.000000,
        "1000" : 0.000000
      },
      "latency_ms" : {
        "2" : 0.000000,
        "4" : 0.000000,
        "10" : 0.000000,
        "20" : 0.000000,
        "50" : 0.000000,
        "100" : 0.000000,
        "250" : 0.000000,
        "500" : 0.000000,
        "750" : 0.000000,
        "1000" : 0.000000,
        "2000" : 0.000000,
        ">=2000" : 0.000000
      },
      "latency_depth" : 32,
      "latency_target" : 0,
      "latency_percentile" : 100.000000,
      "latency_window" : 0
    }
  ],
  "disk_util" : [
    {
      "name" : "sdb",
      "read_ios" : 0,
      "write_ios" : 18,
      "read_merges" : 0,
      "write_merges" : 6,
      "read_ticks" : 0,
      "write_ticks" : 5662,
      "in_queue" : 6085,
      "util" : 1.103687
    }
  ]
}

[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux