Re: [PATCH 1/3] populate: fix horrible performance due to excessive forking




On Tue, Jan 10, 2023 at 10:02:37PM -0800, Darrick J. Wong wrote:
> On Wed, Jan 11, 2023 at 09:49:04AM +1100, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > 
> > xfs/155 is taking close on 4 minutes to populate the filesystem,
> > and most of that is because the populate functions are coded without
> > consideration of performance.
> > 
> > Most of the operations can be executed in parallel as they operate on
> > separate files or in separate directories.
> > 
> > Creating a zero length file in a shell script can be very fast if we
> > do the creation within the shell, but running touch, xfs_io or some
> > other process to create the file is extremely slow - performance is
> > limited by the process creation/destruction rate, not the filesystem
> > create rate. Same goes for unlinking files.
> > 
> > We can use 'echo -n > $file' to create or truncate an existing file
> > to zero length from within the shell. This is much, much faster than
> > calling touch.
> > 
> > For removing lots of files, there is no shell builtin to do this
> > without forking, but we can easily build a file list and pipe it
> > to 'xargs rm -f' to execute rm with as many files as possible in one
> > execution.
> > 
> > Doing this removes approximately 50,000 process create/destroy cycles
> > when populating the filesystem, reducing system time from ~200s to
> > ~35s. Along with running operations in parallel, this brings the
> > population time down from ~235s to less than 45s.
> 
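
For anyone skimming the thread, the two tricks described above can be sketched roughly as follows. This is a minimal bash illustration, not the code from the patch: the helper names create_files and remove_odd_files are hypothetical, and it uses NUL-separated names with 'xargs -0' (available in GNU and BSD xargs) where the patch uses space-separated names.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only; they are not part of the patch.

# Create files 00000000..nr in dir using only shell builtins, so no
# process is forked per file.
create_files() {
	local dir="$1" nr="$2" d fname
	mkdir -p "$dir"
	for ((d = 0; d <= nr; d++)); do
		printf -v fname '%s/%08d' "$dir" "$d"	# builtin, no subshell
		echo -n > "$fname"			# create/truncate via redirection
	done
}

# Remove every second file. The names are streamed to xargs, which runs
# rm with as many arguments as fit on one command line.
remove_odd_files() {
	local dir="$1" nr="$2" d
	for ((d = 1; d <= nr; d += 2)); do
		printf '%s/%08d\0' "$dir" "$d"
	done | xargs -0 rm -f
}

# Small demonstration in a throwaway directory.
demo_dir="$(mktemp -d)/files"
create_files "$demo_dir" 9	# creates 00000000 .. 00000009
remove_odd_files "$demo_dir" 9	# removes 00000001, 00000003, ...
```

The only external processes here are mkdir, mktemp and a handful of xargs/rm invocations, regardless of how many files are created.
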
> Hmm.  I took the nerdsnipe bait and came up with my own approach.  I
> replaced the shell loops with a perl script.  I didn't parallelize
> anything, but the perl script cut the runtime down to about 35s.
> 
> > The long tail of that 45s runtime is the btree format attribute
> > tree create. That executes setfattr a very large number of times,
> > taking 44s to run and consuming 36s of system time mostly just
> > creating and destroying thousands of setfattr process contexts.
> > There's no easy shell coding solution to that issue, so that's for
> > another rainy day.
> 
> ...well it's pouring on the west coast here, so I'll post my solution
> that uses setfattr --restore tomorrow when I get it back from QA.
> Granted, I hadn't found a solution to the removexattr stuff yet, so I
> might keep working on that.
> 
> (removexattr looks like a pain in perl though...)
> 
> Anyway it's late now, I'll look at the diff tomorrow.

...or Thursday now, since I decided to reply to the online fsck design
doc review comments, which took most of the workday.  I managed to bang
out a python script (perl doesn't support setxattr!) that cut the xattr
overhead down to nearly zero.
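
The script itself is not part of this message. As a rough illustration of the underlying idea (set many attributes from one process instead of forking setfattr per attribute), here is a bash sketch built on the 'setfattr --restore' option mentioned earlier in the thread. The helper names gen_attr_dump and set_attrs_bulk and the attribute names and values are made up for the example. It assumes the attr package is installed and that the target filesystem supports user xattrs.

```shell
#!/bin/bash
# Hypothetical helpers for illustration only.

# Emit a listing in the format produced by 'getfattr --dump' for
# attributes user.00000000 .. user.<nr> on the given file. Only shell
# builtins are used, so nothing is forked per attribute.
gen_attr_dump() {
	local file="$1" nr="$2" d
	printf '# file: %s\n' "$file"
	for ((d = 0; d <= nr; d++)); do
		printf 'user.%08d="%s"\n' "$d" "value_$d"
	done
	printf '\n'
}

# Apply the whole listing with a single setfattr process reading from
# stdin. The file name in the listing is resolved by setfattr, so a
# relative path is taken relative to the current directory.
set_attrs_bulk() {
	local file="$1" nr="$2"
	gen_attr_dump "$file" "$nr" | setfattr --restore=-
}
```

Something like 'touch f; set_attrs_bulk f 1000' would then cost one setfattr invocation rather than a thousand. This does not cover removing attributes, which setfattr only does one name at a time with -x.
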

--D

> --D
> 
> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> > ---
> >  common/populate | 179 ++++++++++++++++++++++++++++--------------------
> >  1 file changed, 104 insertions(+), 75 deletions(-)
> > 
> > diff --git a/common/populate b/common/populate
> > index 44b4af166..9b60fa5c1 100644
> > --- a/common/populate
> > +++ b/common/populate
> > @@ -52,23 +52,64 @@ __populate_fragment_file() {
> >  	test -f "${fname}" && $here/src/punch-alternating "${fname}"
> >  }
> >  
> > -# Create a large directory
> > -__populate_create_dir() {
> > -	name="$1"
> > -	nr="$2"
> > -	missing="$3"
> > +# Create a specified number of files or until the maximum extent count is
> > +# reached. If the extent count is reached, return the number of files created.
> > +# This is optimised for speed - do not add anything that executes a separate
> > +# process in every loop as this will slow it down by a factor of at least 5.
> > +__populate_create_nfiles() {
> > +	local name="$1"
> > +	local nr="$2"
> > +	local max_nextents="$3"
> > +	local d=0
> >  
> >  	mkdir -p "${name}"
> > -	seq 0 "${nr}" | while read d; do
> > -		creat=mkdir
> > -		test "$((d % 20))" -eq 0 && creat=touch
> > -		$creat "${name}/$(printf "%.08d" "$d")"
> > +	for d in `seq 0 "${nr}"`; do
> > +		local fname=""
> > +		printf -v fname "${name}/%.08d" "$d"
> > +
> > +		if [ "$((d % 20))" -eq 0 ]; then
> > +			mkdir ${fname}
> > +		else
> > +			echo -n > ${fname}
> > +		fi
> > +
> > +		if [ "${max_nextents}" -eq 0 ]; then
> > +			continue
> > +		fi
> > +		if [ "$((d % 40))" -ne 0 ]; then
> > +			continue
> > +		fi
> > +
> > +		local nextents="$(_xfs_get_fsxattr nextents $name)"
> > +		if [ "${nextents}" -gt "${max_nextents}" ]; then
> > +			echo ${d}
> > +			break
> > +		fi
> >  	done
> > +}
> > +
> > +# remove every second file in the given directory. This is optimised for speed -
> > +# do not add anything that executes a separate process in each loop as this will
> > +# slow it down by at least a factor of 10.
> > +__populate_remove_nfiles() {
> > +	local name="$1"
> > +	local nr="$2"
> > +	local d=1
> > +
> > +	for d in `seq 1 2 "${nr}"`; do
> > +		printf "${name}/%.08d " "$d"
> > +	done | xargs rm -f
> > +}
> >  
> > +# Create a large directory
> > +__populate_create_dir() {
> > +	local name="$1"
> > +	local nr="$2"
> > +	local missing="$3"
> > +
> > +	__populate_create_nfiles "${name}" "${nr}" 0
> >  	test -z "${missing}" && return
> > -	seq 1 2 "${nr}" | while read d; do
> > -		rm -rf "${name}/$(printf "%.08d" "$d")"
> > -	done
> > +	__populate_remove_nfiles "${name}" "${nr}"
> >  }
> >  
> >  # Create a large directory and ensure that it's a btree format
> > @@ -82,31 +123,18 @@ __populate_xfs_create_btree_dir() {
> >  	# watch for when the extent count exceeds the space after the
> >  	# inode core.
> >  	local max_nextents="$(((isize - icore_size) / 16))"
> > -	local nr=0
> > -
> > -	mkdir -p "${name}"
> > -	while true; do
> > -		local creat=mkdir
> > -		test "$((nr % 20))" -eq 0 && creat=touch
> > -		$creat "${name}/$(printf "%.08d" "$nr")"
> > -		if [ "$((nr % 40))" -eq 0 ]; then
> > -			local nextents="$(_xfs_get_fsxattr nextents $name)"
> > -			[ $nextents -gt $max_nextents ] && break
> > -		fi
> > -		nr=$((nr+1))
> > -	done
> > +	local nr=100000
> >  
> > +	nr=$(__populate_create_nfiles "${name}" "${nr}" "${max_nextents}")
> >  	test -z "${missing}" && return
> > -	seq 1 2 "${nr}" | while read d; do
> > -		rm -rf "${name}/$(printf "%.08d" "$d")"
> > -	done
> > +	__populate_remove_nfiles "${name}" "${nr}"
> >  }
> >  
> >  # Add a bunch of attrs to a file
> >  __populate_create_attr() {
> > -	name="$1"
> > -	nr="$2"
> > -	missing="$3"
> > +	local name="$1"
> > +	local nr="$2"
> > +	local missing="$3"
> >  
> >  	touch "${name}"
> >  	seq 0 "${nr}" | while read d; do
> > @@ -121,17 +149,18 @@ __populate_create_attr() {
> >  
> >  # Fill up some percentage of the remaining free space
> >  __populate_fill_fs() {
> > -	dir="$1"
> > -	pct="$2"
> > +	local dir="$1"
> > +	local pct="$2"
> > +	local nr=0
> >  	test -z "${pct}" && pct=60
> >  
> >  	mkdir -p "${dir}/test/1"
> >  	cp -pRdu "${dir}"/S_IFREG* "${dir}/test/1/"
> >  
> > -	SRC_SZ="$(du -ks "${dir}/test/1" | cut -f 1)"
> > -	FS_SZ="$(( $(stat -f "${dir}" -c '%a * %S') / 1024 ))"
> > +	local SRC_SZ="$(du -ks "${dir}/test/1" | cut -f 1)"
> > +	local FS_SZ="$(( $(stat -f "${dir}" -c '%a * %S') / 1024 ))"
> >  
> > -	NR="$(( (FS_SZ * ${pct} / 100) / SRC_SZ ))"
> > +	local NR="$(( (FS_SZ * ${pct} / 100) / SRC_SZ ))"
> >  
> >  	echo "FILL FS"
> >  	echo "src_sz $SRC_SZ fs_sz $FS_SZ nr $NR"
> > @@ -220,45 +249,45 @@ _scratch_xfs_populate() {
> >  	# Data:
> >  
> >  	# Fill up the root inode chunk
> > -	echo "+ fill root ino chunk"
> > +	( echo "+ fill root ino chunk"
> >  	seq 1 64 | while read f; do
> > -		$XFS_IO_PROG -f -c "truncate 0" "${SCRATCH_MNT}/dummy${f}"
> > -	done
> > +		echo -n > "${SCRATCH_MNT}/dummy${f}"
> > +	done ) &
> >  
> >  	# Regular files
> >  	# - FMT_EXTENTS
> >  	echo "+ extents file"
> > -	__populate_create_file $blksz "${SCRATCH_MNT}/S_IFREG.FMT_EXTENTS"
> > +	__populate_create_file $blksz "${SCRATCH_MNT}/S_IFREG.FMT_EXTENTS" &
> >  
> >  	# - FMT_BTREE
> >  	echo "+ btree extents file"
> >  	nr="$((blksz * 2 / 16))"
> > -	__populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/S_IFREG.FMT_BTREE"
> > +	__populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/S_IFREG.FMT_BTREE" &
> >  
> >  	# Directories
> >  	# - INLINE
> > -	echo "+ inline dir"
> > -	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_INLINE" 1
> > +	 echo "+ inline dir"
> > +	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_INLINE" 1 "" &
> >  
> >  	# - BLOCK
> >  	echo "+ block dir"
> > -	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BLOCK" "$((dblksz / 40))"
> > +	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BLOCK" "$((dblksz / 40))" "" &
> >  
> >  	# - LEAF
> >  	echo "+ leaf dir"
> > -	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_LEAF" "$((dblksz / 12))"
> > +	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_LEAF" "$((dblksz / 12))" "" &
> >  
> >  	# - LEAFN
> >  	echo "+ leafn dir"
> > -	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_LEAFN" "$(( ((dblksz - leaf_hdr_size) / 8) - 3 ))"
> > +	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_LEAFN" "$(( ((dblksz - leaf_hdr_size) / 8) - 3 ))" "" &
> >  
> >  	# - NODE
> >  	echo "+ node dir"
> > -	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_NODE" "$((16 * dblksz / 40))" true
> > +	__populate_create_dir "${SCRATCH_MNT}/S_IFDIR.FMT_NODE" "$((16 * dblksz / 40))" true &
> >  
> >  	# - BTREE
> >  	echo "+ btree dir"
> > -	__populate_xfs_create_btree_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BTREE" "$isize" true
> > +	__populate_xfs_create_btree_dir "${SCRATCH_MNT}/S_IFDIR.FMT_BTREE" "$isize" true &
> >  
> >  	# Symlinks
> >  	# - FMT_LOCAL
> > @@ -280,20 +309,20 @@ _scratch_xfs_populate() {
> >  
> >  	# Attribute formats
> >  	# LOCAL
> > -	echo "+ local attr"
> > -	__populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_LOCAL" 1
> > +	 echo "+ local attr"
> > +	__populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_LOCAL" 1 "" &
> >  
> >  	# LEAF
> > -	echo "+ leaf attr"
> > -	__populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_LEAF" "$((blksz / 40))"
> > +	 echo "+ leaf attr"
> > +	__populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_LEAF" "$((blksz / 40))" "" &
> >  
> >  	# NODE
> >  	echo "+ node attr"
> > -	__populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_NODE" "$((8 * blksz / 40))"
> > +	__populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_NODE" "$((8 * blksz / 40))" "" &
> >  
> >  	# BTREE
> >  	echo "+ btree attr"
> > -	__populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_BTREE" "$((64 * blksz / 40))" true
> > +	__populate_create_attr "${SCRATCH_MNT}/ATTR.FMT_BTREE" "$((64 * blksz / 40))" true &
> >  
> >  	# trusted namespace
> >  	touch ${SCRATCH_MNT}/ATTR.TRUSTED
> > @@ -321,68 +350,68 @@ _scratch_xfs_populate() {
> >  	rm -rf "${SCRATCH_MNT}/attrvalfile"
> >  
> >  	# Make an unused inode
> > -	echo "+ empty file"
> > +	( echo "+ empty file"
> >  	touch "${SCRATCH_MNT}/unused"
> >  	$XFS_IO_PROG -f -c 'fsync' "${SCRATCH_MNT}/unused"
> > -	rm -rf "${SCRATCH_MNT}/unused"
> > +	rm -rf "${SCRATCH_MNT}/unused" ) &
> >  
> >  	# Free space btree
> >  	echo "+ freesp btree"
> >  	nr="$((blksz * 2 / 8))"
> > -	__populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/BNOBT"
> > +	__populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/BNOBT" &
> >  
> >  	# Inode btree
> > -	echo "+ inobt btree"
> > +	( echo "+ inobt btree"
> >  	local ino_per_rec=64
> >  	local rec_per_btblock=16
> >  	local nr="$(( 2 * (blksz / rec_per_btblock) * ino_per_rec ))"
> >  	local dir="${SCRATCH_MNT}/INOBT"
> > -	mkdir -p "${dir}"
> > -	seq 0 "${nr}" | while read f; do
> > -		touch "${dir}/${f}"
> > -	done
> > -
> > -	seq 0 2 "${nr}" | while read f; do
> > -		rm -f "${dir}/${f}"
> > -	done
> > +	__populate_create_dir "${SCRATCH_MNT}/INOBT" "${nr}" true
> > +	) &
> >  
> >  	# Reverse-mapping btree
> >  	is_rmapbt="$(_xfs_has_feature "$SCRATCH_MNT" rmapbt -v)"
> >  	if [ $is_rmapbt -gt 0 ]; then
> > -		echo "+ rmapbt btree"
> > +		( echo "+ rmapbt btree"
> >  		nr="$((blksz * 2 / 24))"
> >  		__populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/RMAPBT"
> > +		) &
> >  	fi
> >  
> >  	# Realtime Reverse-mapping btree
> >  	is_rt="$(_xfs_get_rtextents "$SCRATCH_MNT")"
> >  	if [ $is_rmapbt -gt 0 ] && [ $is_rt -gt 0 ]; then
> > -		echo "+ rtrmapbt btree"
> > +		( echo "+ rtrmapbt btree"
> >  		nr="$((blksz * 2 / 32))"
> >  		$XFS_IO_PROG -R -f -c 'truncate 0' "${SCRATCH_MNT}/RTRMAPBT"
> >  		__populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/RTRMAPBT"
> > +		) &
> >  	fi
> >  
> >  	# Reference-count btree
> >  	is_reflink="$(_xfs_has_feature "$SCRATCH_MNT" reflink -v)"
> >  	if [ $is_reflink -gt 0 ]; then
> > -		echo "+ reflink btree"
> > +		( echo "+ reflink btree"
> >  		nr="$((blksz * 2 / 12))"
> >  		__populate_create_file $((blksz * nr)) "${SCRATCH_MNT}/REFCOUNTBT"
> >  		cp --reflink=always "${SCRATCH_MNT}/REFCOUNTBT" "${SCRATCH_MNT}/REFCOUNTBT2"
> > +		) &
> >  	fi
> >  
> >  	# Copy some real files (xfs tests, I guess...)
> >  	echo "+ real files"
> >  	test $fill -ne 0 && __populate_fill_fs "${SCRATCH_MNT}" 5
> >  
> > -	# Make sure we get all the fragmentation we asked for
> > -	__populate_fragment_file "${SCRATCH_MNT}/S_IFREG.FMT_BTREE"
> > -	__populate_fragment_file "${SCRATCH_MNT}/BNOBT"
> > -	__populate_fragment_file "${SCRATCH_MNT}/RMAPBT"
> > -	__populate_fragment_file "${SCRATCH_MNT}/RTRMAPBT"
> > -	__populate_fragment_file "${SCRATCH_MNT}/REFCOUNTBT"
> > +	# Wait for all file creation to complete before we start fragmenting
> > +	# the files as needed.
> > +	wait
> > +	__populate_fragment_file "${SCRATCH_MNT}/S_IFREG.FMT_BTREE" &
> > +	__populate_fragment_file "${SCRATCH_MNT}/BNOBT" &
> > +	__populate_fragment_file "${SCRATCH_MNT}/RMAPBT" &
> > +	__populate_fragment_file "${SCRATCH_MNT}/RTRMAPBT" &
> > +	__populate_fragment_file "${SCRATCH_MNT}/REFCOUNTBT" &
> >  
> > +	wait
> >  	umount "${SCRATCH_MNT}"
> >  }
> >  
> > -- 
> > 2.38.1
> > 


