Re: [PATCH 10/23] mkfs: don't hardcode log size

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Tue, 21 Jan 2025 19:36:25 -0800

On Tue, Jan 21, 2025 at 07:44:30AM -0500, Theodore Ts'o wrote:
> On Tue, Jan 21, 2025 at 02:58:25PM +1100, Dave Chinner wrote:
> > > +# Are there mkfs options to try to improve concurrency?
> > > +_scratch_mkfs_concurrency_options()
> > > +{
> > > +	local nr_cpus="$(( $1 * LOAD_FACTOR ))"
> > 
> > caller does not need to pass a number of CPUs. This function can
> > simply do:
> > 
> > 	local nr_cpus=$(getconf _NPROCESSORS_CONF)
> > 
> > And that will set concurrency to be "optimal" for the number of CPUs
> > in the machine the test is going to run on. That way tests don't
> > need to hard code some number that is going to be too large for
> > small systems and too small for large systems...
> 
> Hmm, but is this the right thing if you are using check-parallel?  If
> you are running multiple tests that are all running some kind of load
> or stress-testing antagonist at the same time, then having 3x to 5x
> the number of necessary antagonist threads is going to unnecessarily
> slow down the test run, which goes against the original goal of what
> we were hoping to achieve with check-parallel.

<shrug> Maybe a more appropriate thing to do is:

	local nr_cpus=$(grep '^Cpus_allowed:' /proc/self/status | hweight)

So check-parallel users could (if they see such problems) constrain the
parallelism through cpu pinning.  I think getconf _NPROCESSORS_CONF is
probably fine for now.
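
(To be clear, "hweight" above is shorthand for "count the set bits", not
a command I know to exist.  A rough sketch of what I mean is below.  The
helper names hweight_hex and _allowed_cpus are made up for illustration
and are not part of fstests; I'm assuming the usual /proc/self/status
format, where Cpus_allowed is a comma-separated hex mask.)

```shell
# Hypothetical helper: population count of a comma-separated hex mask,
# e.g. "00000003,0000000f" -> 6.  Each hex digit is rewritten as one '1'
# character per set bit, then the '1' characters are counted.
hweight_hex()
{
	printf '%s' "$1" | tr -d ',\n' | \
		sed -e 's/[1248]/1/g' \
		    -e 's/[3569aAcC]/11/g' \
		    -e 's/[7bBdDeE]/111/g' \
		    -e 's/[fF]/1111/g' | \
		tr -cd '1' | wc -c | tr -d ' '
}

# Hypothetical helper: number of CPUs this process is allowed to run on.
# Match "Cpus_allowed:" exactly so the "Cpus_allowed_list:" line is
# skipped.  awk reads its own /proc/self/status here, but it inherits
# the affinity mask from the calling shell.
_allowed_cpus()
{
	hweight_hex "$(awk '/^Cpus_allowed:/ { print $2 }' /proc/self/status)"
}
```

On a machine with at least four CPUs, running _allowed_cpus under
"taskset -c 0-3" should report 4 rather than the full CPU count.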

(The other day I /did/ see some program in either util-linux or
coreutils that told you the number of "available" cpus based on checking
the affinity mask and whatever cgroups constraints are applied.  I can't
find it now, alas...)
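
(For the affinity-mask half at least, coreutils nproc might do: as far
as I know it prints the number of processing units available to the
current process, and "nproc --all" prints the installed count.  I have
not checked what it does about cgroup constraints.  A sketch with a
made-up helper name, _available_cpus, which is not part of fstests:)

```shell
# Hypothetical helper: prefer nproc, which reports the processing units
# available to the current process; fall back to the online CPU count
# if nproc is not installed.
_available_cpus()
{
	if command -v nproc >/dev/null 2>&1; then
		nproc
	else
		getconf _NPROCESSORS_ONLN
	fi
}
```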

> How many tests are you currently able to run in parallel today, and
> what's the ultimate goal?  We could have some kind of antagonist load
> which is shared across multiple tests, but it's not clear to me that
> it's worth the complexity.  (And note that it's not just fs and cpu
> load antagonists; there could also be memory stress antagonists, where
> having multiple antagonists could lead to OOM kills...)

On the other hand, perhaps having random antagonistic processes from
other ./check instances is exactly the kind of stress testing that we
want to shake out weirder bugs?  It's clear from Dave's RFC that the
generic/650 cpu hotplug shenanigans had some effect. ;)

--D

> 							- Ted
>