On Wed, Nov 20, 2024 at 08:04:43AM +1100, Dave Chinner wrote:
> On Tue, Nov 19, 2024 at 07:45:20AM -0800, Darrick J. Wong wrote:
> > On Mon, Nov 18, 2024 at 10:13:23PM -0800, Christoph Hellwig wrote:
> > > On Tue, Nov 19, 2024 at 12:45:05PM +1100, Dave Chinner wrote:
> > > > Question for you: Does your $here directory contain a .git subdir?
> > > >
> > > > One of the causes of long runtime for me has been that $here might
> > > > only contain 30MB of files, but the .git subdir balloons to several
> > > > hundred MB over time, resulting in really long runtimes because it's
> > > > copying GBs of data from the .git subdir.
> > >
> > > Or the results/ directory when run in a persistent test VM like the
> > > one for quick runs on my laptop. I currently need to persistently
> > > purge that for just this test.
>
> Yeah, I use persistent VMs and that's why the .git dir grows...
>
> > > > --- a/tests/generic/251
> > > > +++ b/tests/generic/251
> > > > @@ -175,9 +175,12 @@ nproc=20
> > > >  # Copy $here to the scratch fs and make coipes of the replica. The fstests
> > > >  # output (and hence $seqres.full) could be in $here, so we need to snapshot
> > > >  # $here before computing file checksums.
> > > > +#
> > > > +# $here/* as the files to copy so we avoid any .git directory that might be
> > > > +# much, much larger than the rest of the fstests source tree we are copying.
> > > >  content=$SCRATCH_MNT/orig
> > > >  mkdir -p $content
> > > > -cp -axT $here/ $content/
> > > > +cp -ax $here/* $content/
> > >
> > > Maybe we just need a way to generate more predictable file system
> > > content?
> >
> > How about running fsstress for ~50000 ops or so, to generate some test
> > files and a directory tree?
>
> Do we even need to do that? It's a set of small files distributed
> over a few directories. There are few large files in the mix, so we
> could just create a heap of 1-4 block files across a dozen or so
> directories and get the same sort of data set to copy.
>
> And given this observation, if we are generating the data set in the
> first place, why use cp to copy it every time? Why not just have
> each thread generate the data set on the fly?

run_process compares the copies to the original to try to discover
places where written blocks got discarded, so they actually do need to
be copies.

/me suspects that this test is kinda bogus if the block device doesn't
set discard_zeroes_data, because it won't trip on discard errors for
crappy sata ssds that don't actually clear the remapping tables until
minutes later.

--D

> # create a directory structure with numdirs directories and numfiles
> # files per directory. Files are 0-3 blocks in length, space is
> # allocated by fallocate to avoid needing to write data. Files are
> # created concurrently across directories to create the data set as
> # fast as possible.
> create_files()
> {
>         local numdirs=$1
>         local numfiles=$2
>         local basedir=$3
>
>         for ((i=0; i<$numdirs; i++)); do
>                 mkdir -p $basedir/$i
>                 for ((j=0; j<$numfiles; j++)); do
>                         local len=$((RANDOM % 4))
>                         $XFS_IO_PROG -fc "falloc 0 ${len}b" $basedir/$i/$j
>                 done &
>         done
>         wait
> }
>
> -Dave
>
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
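
A minimal sketch, assuming fstests conventions ($SCRATCH_MNT, $tmp) and the
create_files() helper quoted above, of how a generated data set could slot
into the existing copy-and-compare flow; the directory/file counts and the
checksum step are illustrative guesses, not the actual generic/251 code:

# Generate a predictable source tree instead of copying $here, then
# record its checksums so run_process can still compare the copies
# against the original. The counts below are arbitrary.
content=$SCRATCH_MNT/orig
mkdir -p $content
create_files 12 200 $content
find $content -type f | xargs md5sum > $tmp.checksums

Each run_process thread would still cp this tree, which keeps the
copies-vs-original discard comparison described above intact.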