On Tue, Jan 21, 2025 at 07:44:30AM -0500, Theodore Ts'o wrote:
> On Tue, Jan 21, 2025 at 02:58:25PM +1100, Dave Chinner wrote:
> > > +# Are there mkfs options to try to improve concurrency?
> > > +_scratch_mkfs_concurrency_options()
> > > +{
> > > +    local nr_cpus="$(( $1 * LOAD_FACTOR ))"
> >
> > caller does not need to pass a number of CPUs. This function can
> > simply do:
> >
> >     local nr_cpus=$(getconf _NPROCESSORS_CONF)
> >
> > And that will set concurrency to be "optimal" for the number of CPUs
> > in the machine the test is going to run on. That way tests don't
> > need to hard code some number that is going to be too large for
> > small systems and too small for large systems...
>
> Hmm, but is this the right thing if you are using check-parallel?  If
> you are running multiple tests that are all running some kind of load
> or stress-testing antagonist at the same time, then having 3x to 5x
> the number of necessary antagonist threads is going to unnecessarily
> slow down the test run, which goes against the original goal of what
> we were hoping to achieve with check-parallel.

<shrug> Maybe a more appropriate thing to do is:

    local nr_cpus=$(grep Cpus_allowed /proc/self/status | hweight)

So a check-parallel could (if they see such problems) constrain the
parallelism through cpu pinning.  I think getconf _NPROCESSORS_CONF is
probably fine for now.

(The other day I /did/ see some program in either util-linux or
coreutils that told you the number of "available" cpus based on checking
the affinity mask and whatever cgroups constraints are applied.  I can't
find it now, alas...)

> How many tests are you currently able to run in parallel today, and
> what's the ultimate goal?  We could have some kind of antagonist load
> which is shared across multiple tests, but it's not clear to me that
> it's worth the complexity.  (And note that it's not just fs and cpu
> load antagonists; there could also be memory stress antagonists, where
> having multiple antagonists could lead to OOM kills...)

On the other hand, perhaps having random antagonistic processes from
other ./check instances is exactly the kind of stress testing that we
want to shake out weirder bugs?  It's clear from Dave's RFC that the
generic/650 cpu hotplug shenanigans had some effect. ;)

--D

> - Ted
>
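
For reference, a minimal sketch of the Cpus_allowed idea above, assuming
Linux /proc and a POSIX awk.  As far as I know "hweight" is not a
standard shell utility, so the hypothetical helper below counts the set
bits of the hex mask itself.  The name count_allowed_cpus and its
optional file argument are illustrative only and not part of fstests.

```shell
# Hypothetical helper: print how many CPUs the calling process may run on,
# by counting set bits in the Cpus_allowed hex mask from /proc/self/status.
# An alternate status file can be passed as $1 (an assumption, for testing).
count_allowed_cpus() {
    awk '/^Cpus_allowed:/ {
        # The mask is comma-separated 32-bit hex words, e.g. "00000000,000000ff".
        mask = tolower($2)
        gsub(/,/, "", mask)
        n = 0
        for (i = 1; i <= length(mask); i++) {
            # Convert one hex digit to its value (0..15), then count its bits.
            d = index("0123456789abcdef", substr(mask, i, 1)) - 1
            while (d > 0) {
                n += d % 2
                d = int(d / 2)
            }
        }
        print n
    }' "${1:-/proc/self/status}"
}
```

With that, the assignment would read nr_cpus=$(count_allowed_cpus).  The
pattern ends in a colon so it does not also match the Cpus_allowed_list
line.  This only sees the affinity mask, not cgroup CPU quotas.  coreutils'
nproc also reports the processors available to the calling process; whether
that is the program mentioned above is a guess.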