On Wed, Jan 22, 2025 at 03:12:11PM +1100, Dave Chinner wrote:
> On Tue, Jan 21, 2025 at 07:49:44PM -0800, Darrick J. Wong wrote:
> > On Tue, Jan 21, 2025 at 03:57:23PM +1100, Dave Chinner wrote:
> > > On Thu, Jan 16, 2025 at 03:28:33PM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > >
> > > > Prior to commit 8973af00ec21, in the absence of an explicit
> > > > SOAK_DURATION, this test would run 2500 fsstress operations on each
> > > > of its ten trips through the loop body.  On the author's machines,
> > > > this kept the runtime to about 30s total.  Oddly, this was changed
> > > > to 30s per loop body, with no specific justification, in the middle
> > > > of an fsstress process management change.
> > >
> > > I'm pretty sure that was because when you run g/650 on a machine
> > > with 64p, the number of ops performed on the filesystem is
> > > nr_cpus * 2500 * nr_loops.
> >
> > Where does that happen?
> >
> > Oh, heh.  -n is the number of ops *per process*.
>
> Yeah, I just noticed another case of this:
>
> Ten slowest tests - runtime in seconds:
> generic/750    559
> generic/311    486
> .....
>
> generic/750 does:
>
> nr_cpus=$((LOAD_FACTOR * 4))
> nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
> fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)
>
> So the total work actually scales quadratically with the load factor:
>
> Load factor    nr_cpus    nr_ops    total ops
> 1              4          100k      400k
> 2              8          200k      1.6M
> 3              12         300k      3.6M
> 4              16         400k      6.4M
>
> and so on.
>
> I suspect that there are other similar cpu scaling issues
> lurking across the many fsstress tests...
>
> > > > On the author's machine, this explodes the runtime from ~30s to
> > > > 420s.  Put things back the way they were.
> > >
> > > Yeah, OK, that's exactly what keep_running() does - duration
> > > overrides nr_ops.
> > >
> > > Ok, so keeping or reverting the change will simply make different
> > > people unhappy because of the excessive runtime the test has at
> > > either end of the CPU count spectrum - what's the best way to go
> > > about providing the desired min(nr_ops, max loop time) behaviour?
> > > Do we simply cap the maximum process count to keep the number of
> > > ops down to something reasonable (e.g. 16), or something else?
> >
> > How about running fsstress with --duration=3 if SOAK_DURATION isn't
> > set?  That should keep the runtime to 30 seconds or so even on larger
> > machines:
> >
> > 	if [ -n "$SOAK_DURATION" ]; then
> > 		test "$SOAK_DURATION" -lt 10 && SOAK_DURATION=10
> > 		fsstress_args+=(--duration="$((SOAK_DURATION / 10))")
> > 	else
> > 		# run for 3s per iteration max for a default runtime of ~30s.
> > 		fsstress_args+=(--duration=3)
> > 	fi
>
> Yeah, that works for me.
>
> As a rainy day project, perhaps we should look to convert all the
> fsstress invocations to be time-bound rather than running a specific
> number of ops, i.e. hard code nr_ops=<some huge number> in
> _run_fsstress_bg() and the tests only need to define parallelism and
> runtime.

I /think/ the only ones that do that are generic/1220, generic/476,
generic/642, and generic/750.  I could drop the nr_cpus term from the
nr_ops calculation.

> This would make the test runtimes more deterministic across machines
> with vastly different capabilities, and would largely make "test xyz
> is slow on my test machine" reports go away.
>
> Thoughts?

I'm fine with _run_fsstress injecting --duration=30 if no other
duration argument is passed in.

--D

> -Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
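
As a rough illustration of the --duration=30 injection suggested above,
here's a minimal sketch of how a wrapper like _run_fsstress_bg could add a
default runtime bound when the caller doesn't supply one.  The helper body,
and the $FSSTRESS_PROG and _FSSTRESS_PID names, are assumptions modelled on
common/fsstress conventions, not the actual implementation:

	_run_fsstress_bg()
	{
		local args=("$@")
		local arg
		local has_duration=""

		# Did the caller already bound the runtime?
		for arg in "${args[@]}"; do
			case "$arg" in
			--duration*)
				has_duration=1
				;;
			esac
		done

		# No explicit runtime given: bound the run by time rather
		# than by op count so the default cost is roughly the same
		# on any machine.
		if [ -z "$has_duration" ]; then
			args+=(--duration=30)
		fi

		$FSSTRESS_PROG "${args[@]}" &
		_FSSTRESS_PID=$!
	}

With something along those lines in place, tests would only need to pick a
parallelism value (plus a huge -n) and could rely on the wrapper to cap the
runtime.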