On Tue, Jan 21, 2025 at 07:49:44PM -0800, Darrick J. Wong wrote:
> On Tue, Jan 21, 2025 at 03:57:23PM +1100, Dave Chinner wrote:
> > On Thu, Jan 16, 2025 at 03:28:33PM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > >
> > > Prior to commit 8973af00ec21, in the absence of an explicit
> > > SOAK_DURATION, this test would run 2500 fsstress operations each of
> > > ten times through the loop body. On the author's machines, this kept
> > > the runtime to about 30s total. Oddly, this was changed to 30s per
> > > loop body with no specific justification in the middle of an fsstress
> > > process management change.
> >
> > I'm pretty sure that was because when you run g/650 on a machine
> > with 64p, the number of ops performed on the filesystem is
> > nr_cpus * 2500 * nr_loops.
>
> Where does that happen?
>
> Oh, heh. -n is the number of ops *per process*.

Yeah, I just noticed another case of this:

Ten slowest tests - runtime in seconds:
generic/750	559
generic/311	486
.....

generic/750 does:

	nr_cpus=$((LOAD_FACTOR * 4))
	nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
	fsstress_args=(-w -d $SCRATCH_MNT -n $nr_ops -p $nr_cpus)

Because -n is per process, the load factor increase is actually
quadratic:

Load factor	nr_cpus	nr_ops	total ops
1		4	100k	400k
2		8	200k	1.6M
3		12	300k	3.6M
4		16	400k	6.4M

and so on. I suspect that there are other similar CPU scaling issues
lurking across the many fsstress tests...

> > > On the author's machine, this explodes the runtime from ~30s to 420s.
> > > Put things back the way they were.
> >
> > Yeah, OK, that's exactly what keep_running() does - duration
> > overrides nr_ops.
> >
> > Ok, so keeping or reverting the change will simply make different
> > people unhappy because of the excessive runtime the test has at
> > either end of the CPU count spectrum - what's the best way to go
> > about providing the desired min(nr_ops, max loop time) behaviour?
> > Do we simply cap the maximum process count to keep the number of ops
> > down to something reasonable (e.g. 16), or something else?
>
> How about running fsstress with --duration=3 if SOAK_DURATION isn't set?
> That should keep the runtime to 30 seconds or so even on larger
> machines:
>
> 	if [ -n "$SOAK_DURATION" ]; then
> 		test "$SOAK_DURATION" -lt 10 && SOAK_DURATION=10
> 		fsstress_args+=(--duration="$((SOAK_DURATION / 10))")
> 	else
> 		# run for 3s per iteration max for a default runtime of ~30s.
> 		fsstress_args+=(--duration=3)
> 	fi

Yeah, that works for me.

As a rainy day project, perhaps we should look at converting all the
fsstress invocations to be time bound rather than running a specific
number of ops, i.e. hard code nr_ops=<some huge number> in
_run_fsstress_bg() and have the tests only define parallelism and
runtime. This would make the test runtimes more deterministic across
machines with vastly different capabilities, and it would largely make
"test xyz is slow on my test machine" reports go away.

Thoughts?

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
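[Editor's note: the generic/750 op-count table quoted above can be
reproduced with a quick shell calculation. The formulas are copied from
the test as quoted in the mail; TIME_FACTOR is assumed to be 1.]

```shell
# Reproduce the generic/750 scaling table.  nr_ops is per process, so
# the filesystem-wide total is nr_ops * nr_cpus -- quadratic in
# LOAD_FACTOR, since both factors grow linearly with it.
TIME_FACTOR=1
for LOAD_FACTOR in 1 2 3 4; do
	nr_cpus=$((LOAD_FACTOR * 4))
	nr_ops=$((25000 * nr_cpus * TIME_FACTOR))
	total=$((nr_ops * nr_cpus))
	echo "LOAD_FACTOR=$LOAD_FACTOR nr_cpus=$nr_cpus nr_ops=$nr_ops total=$total"
done
```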
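[Editor's note: for illustration, Darrick's SOAK_DURATION snippet above
could be factored into a small helper. The function name
fsstress_duration_args is hypothetical, not an existing fstests helper;
the logic follows the quoted snippet, emitting a per-iteration duration
of one tenth of SOAK_DURATION (clamped to at least 1s) or 3s by
default.]

```shell
# Hypothetical helper wrapping the quoted SOAK_DURATION logic.
# Prints the --duration argument to pass to each fsstress iteration.
fsstress_duration_args()
{
	local soak="$1"

	if [ -n "$soak" ]; then
		# Clamp so each of the ~10 iterations runs at least 1s.
		[ "$soak" -lt 10 ] && soak=10
		echo "--duration=$((soak / 10))"
	else
		# 3s per iteration for a default runtime of ~30s.
		echo "--duration=3"
	fi
}

fsstress_duration_args ""	# prints --duration=3
fsstress_duration_args 300	# prints --duration=30
```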
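[Editor's note: a rough sketch of Dave's rainy-day proposal, for
concreteness. The helper name fsstress_time_bound_args is made up here;
the real _run_fsstress_bg in fstests' common/ code has a different
shape, and the huge -n value is only there so that --duration, not the
op count, is what terminates the run.]

```shell
# Hypothetical sketch: build time-bound fsstress arguments so tests
# only specify parallelism and runtime, per the proposal above.
fsstress_time_bound_args()
{
	local nr_procs="$1" runtime="$2"

	# Effectively-unbounded op count; --duration does the stopping.
	echo "-p $nr_procs -n 100000000 --duration=$runtime"
}

fsstress_time_bound_args 4 30	# prints -p 4 -n 100000000 --duration=30
```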