On Tue, Nov 19, 2024 at 07:45:20AM -0800, Darrick J. Wong wrote:
> On Mon, Nov 18, 2024 at 10:13:23PM -0800, Christoph Hellwig wrote:
> > On Tue, Nov 19, 2024 at 12:45:05PM +1100, Dave Chinner wrote:
> > > Question for you: Does your $here directory contain a .git subdir?
> > >
> > > One of the causes of long runtime for me has been that $here might
> > > only contain 30MB of files, but the .git subdir balloons to several
> > > hundred MB over time, resulting in really long runtimes because
> > > it's copying GBs of data from the .git subdir.
> > 
> > Or the results/ directory when run in a persistent test VM like the
> > one for quick runs on my laptop. I currently need to persistently
> > purge that for just this test.

Yeah, I use persistent VMs and that's why the .git dir grows...

> > > --- a/tests/generic/251
> > > +++ b/tests/generic/251
> > > @@ -175,9 +175,12 @@ nproc=20
> > >  # Copy $here to the scratch fs and make copies of the replica. The fstests
> > >  # output (and hence $seqres.full) could be in $here, so we need to snapshot
> > >  # $here before computing file checksums.
> > > +#
> > > +# Use $here/* as the files to copy so we avoid any .git directory that might
> > > +# be much, much larger than the rest of the fstests source tree we are copying.
> > >  content=$SCRATCH_MNT/orig
> > >  mkdir -p $content
> > > -cp -axT $here/ $content/
> > > +cp -ax $here/* $content/
> > 
> > Maybe we just need a way to generate more predictable file system
> > content?
> 
> How about running fsstress for ~50000ops or so, to generate some test
> files and directory tree?

Do we even need to do that?

It's a set of small files distributed over a few directories. There
are few large files in the mix, so we could just create a heap of
1-4 block files across a dozen or so directories and get the same
sort of data set to copy.

And given this observation: if we are generating the data set in the
first place, why use cp to copy it every time? Why not just have each
thread generate the data set on the fly?

# Create a directory structure with numdirs directories and numfiles
# files per directory. Files are 0-3 blocks in length; space is
# allocated by fallocate to avoid needing to write data. Files are
# created concurrently across directories to create the data set as
# fast as possible.
create_files()
{
	local numdirs=$1
	local numfiles=$2
	local basedir=$3

	for ((i=0; i<$numdirs; i++)); do
		mkdir -p $basedir/$i
		for ((j=0; j<$numfiles; j++)); do
			local len=$((RANDOM % 4))
			if ((len)); then
				$XFS_IO_PROG -fc "falloc 0 ${len}b" \
					$basedir/$i/$j
			else
				# fallocate rejects zero-length requests;
				# a zero block file is just an empty file.
				touch $basedir/$i/$j
			fi
		done &
	done
	wait
}
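
Usage would end up something like this - the dir/file counts below
are just a guess at matching the size of the current data set, and
the sketch is completely untested:

	# Generate the data set directly instead of copying $here.
	# 16 dirs of 250 small files - the numbers are arbitrary.
	content=$SCRATCH_MNT/orig
	mkdir -p $content
	create_files 16 250 $content

Each copy thread could then do the same thing in its own
subdirectory rather than replicating a master copy around the
scratch fs.

-Dave
-- 
Dave Chinner
david@xxxxxxxxxxxxx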