Re: [linus:master] [block] e70c301fae: stress-ng.aiol.ops_per_sec 49.6% regression

Niklas Cassel <cassel@xxxxxxxxxx> · Thu, 16 Jan 2025 11:04:36 +0100

On Thu, Jan 16, 2025 at 02:37:08PM +0800, Oliver Sang wrote:
> On Wed, Jan 15, 2025 at 12:42:33PM +0100, Niklas Cassel wrote:
> > 
> > Looking closer at the raw number for stress-ng + none scheduler, in your
> > other email, it seems clear that the raw values from the stress-ng workload
> > can vary quite a lot. In the long run, I wonder if we perhaps can find a
> > workload that has less variation. E.g. fio test for IOPS and fio test for
> > throughput. But perhaps such workloads are already part of lkp-tests?
> 
> yes, we have fio tests [1].
> as in [2], we get it from https://github.com/axboe/fio
> not sure if it's just the fio you mentioned?

Yes, that's the one :)

> 
> our framework is basically automatic. bot merged repo/branches it monitors
> into so-called hourly kernel, then if found performance difference with base,
> bisect will be triggered to capture which commit causes the change.
> 
> due to resource constraint, we cannot allot all testsuites (we have around 80)
> to all platforms, and there are other various reasons which could cause us to
> miss some performance differences.
> 
> if you have interests, could you help check those fio-basic-*.yaml files under
> [3]? if you can spot out the correct case, we could do more tests to check
> e70c301fae and its parent. thanks!
> 
> [1] https://github.com/intel/lkp-tests/tree/master/programs/fio
> [2] https://github.com/intel/lkp-tests/blob/master/programs/fio/pkg/PKGBUILD
> [3] https://github.com/intel/lkp-tests/tree/master/jobs

I'm probably not the best qualified person to review this. It would be nice if
e.g. Jens himself (or other block layer folks) could have a look at these.

What I can see is:
https://github.com/intel/lkp-tests/blob/master/jobs/fio-basic-local-disk.yaml

seems to do:
    - randrw

but only for SSDs, not HDDs, and only on ext4.

https://github.com/intel/lkp-tests/blob/master/jobs/fio-basic-1hdd-write.yaml

does test ext4, btrfs, and xfs,
but it does not do randrw.

What are the thresholds for these tests counting as a regression?
Are you comparing BW, or IOPS, or both?
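
To make the question concrete, here is a rough sketch of the kind of check I
have in mind. It is not how lkp-tests decides anything (I have not read that
code); the function names and the 5% default threshold are made up for
illustration. The idea is to flag a drop only when it exceeds both a fixed
threshold and the run-to-run noise of the base kernel, which matters given how
much the stress-ng numbers vary.

```python
from statistics import mean, stdev


def percent_change(base_runs, new_runs):
    """Relative change of the mean in percent (negative means slower)."""
    base = mean(base_runs)
    return (mean(new_runs) - base) / base * 100.0


def is_regression(base_runs, new_runs, threshold_pct=5.0):
    """Flag a drop that exceeds both the threshold and the base noise.

    threshold_pct is a hypothetical default, not a value from lkp-tests.
    Noise is the coefficient of variation of the base runs, in percent.
    """
    change = percent_change(base_runs, new_runs)
    noise_pct = stdev(base_runs) / mean(base_runs) * 100.0
    return change < -max(threshold_pct, noise_pct)
```

The same function could be applied to BW and IOPS separately, each value being
one run of the same job on the same machine.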

Looking at:
https://github.com/intel/lkp-tests/blob/master/programs/fio/parse

It seems to produce points for:
bw_MBps
iops
total_ios
clat_mean_ns
clat_stddev
slat_mean_us
slat_stddev
and more.

So it does seem to compare BW, IOPS, total IOs, which is what I was looking
for.
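
For reference, the same kind of numbers can be pulled straight out of fio's
JSON output (fio --output-format=json). The sketch below is not the lkp-tests
parse script; the field names (bw_bytes, iops, total_ios, clat_ns) are what I
remember from fio 3.x JSON reports, so treat them as an assumption and check
them against your fio version.

```python
import json


def summarize_fio(json_text):
    """Return one dict of metrics per job and per I/O direction.

    Assumes the fio 3.x JSON layout: a top-level "jobs" list whose
    entries contain "read" and "write" sub-objects.
    """
    report = json.loads(json_text)
    rows = []
    for job in report.get("jobs", []):
        for direction in ("read", "write"):
            stats = job.get(direction, {})
            if not stats.get("total_ios"):
                continue  # this direction was not exercised by the job
            clat = stats.get("clat_ns", {})
            rows.append({
                "job": job.get("jobname"),
                "direction": direction,
                "bw_MBps": stats.get("bw_bytes", 0) / 1e6,
                "iops": stats.get("iops"),
                "total_ios": stats.get("total_ios"),
                "clat_mean_ns": clat.get("mean"),
                "clat_stddev": clat.get("stddev"),
            })
    return rows
```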

Possibly even too much, as enabling too much logging will actually affect
the results, since you need to write way more output to the logs.

But again, Jens (and other block layer folks) are the experts.

Kind regards,
Niklas